Data Augmentation: Part 2

Sebastien Wong 08 Nov 2016 2 minute read

continued from part 1

Data Augmentation with SMOTE

What about the case when we don't know how to perturb the data to ensure that label information is preserved? Here the Synthetic Minority Over-sampling Technique (SMOTE) can be used. Imagine every sample as a point in a multi-dimensional graph, where each dimension of the graph is one of the features; this is commonly referred to as feature space. Select two random samples from the same class, imagine drawing a line between them in feature space, and then create a new synthetic sample at some random distance along that line. It's easiest to visualise this in two dimensions!
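The interpolation step above can be sketched in a few lines of NumPy. Note this is a simplified illustration: classic SMOTE restricts the second sample to one of the first sample's k nearest neighbours, whereas this sketch pairs any two samples from the class; the function name and signature are my own, not from any particular library.

```python
import numpy as np

def smote_like(X_class, n_new, rng=None):
    """Create n_new synthetic samples for one class by interpolating
    between random pairs of real samples in feature space.

    X_class : (n, d) array of real samples, all from the same class.
    Returns an (n_new, d) array of synthetic samples, each lying at a
    random position on the line between two real samples.
    """
    rng = np.random.default_rng(rng)
    n = len(X_class)
    a = rng.integers(n, size=n_new)        # first endpoint of each line
    b = rng.integers(n, size=n_new)        # second endpoint
    gap = rng.random((n_new, 1))           # random position along the line
    return X_class[a] + gap * (X_class[b] - X_class[a])
```

Because every synthetic point is a convex combination of two real points, the synthetic samples always lie inside the convex hull of the class in feature space.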

There are some existing implementations of SMOTE freely available on the internet. Here are a couple of sources:

  • MATLAB – SMOTE by Manohar
  • R – SMOTE is part of the DMwR package

Instead of just using SMOTE to create additional samples of a single minority class, we are going to increase the abundance of every class. Furthermore, instead of perturbing the raw input data, we will first transform the data into features (using convolutional filters) [1], and then apply SMOTE to create the additional samples.
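A minimal sketch of this pipeline follows: map each image to a feature vector using fixed convolutional filters, then apply the SMOTE-style interpolation in that feature space. The function names, the pure-NumPy convolution, and the random-pair pairing (rather than nearest-neighbour pairing, as in classic SMOTE) are simplifying assumptions for illustration, not the implementation from [1].

```python
import numpy as np

def conv2d_valid(img, kern):
    """'Valid'-mode 2-D filter response, computed naively in NumPy.
    (Technically cross-correlation, which is equivalent for our purpose.)"""
    kh, kw = kern.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kern)
    return out

def features_then_smote(images, filters, n_new, rng=None):
    """Transform images into feature vectors via fixed convolutional
    filters, then interpolate random pairs of feature vectors (the
    SMOTE step) to create n_new synthetic feature-space samples."""
    rng = np.random.default_rng(rng)
    feats = np.stack([
        np.concatenate([conv2d_valid(im, f).ravel() for f in filters])
        for im in images
    ])
    a = rng.integers(len(feats), size=n_new)
    b = rng.integers(len(feats), size=n_new)
    gap = rng.random((n_new, 1))
    return feats, feats[a] + gap * (feats[b] - feats[a])
```

The synthetic samples live in feature space, not image space, so the downstream classifier must also be trained on these features rather than on raw pixels.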

Experiment

Let's perform our experiment again, but this time we will use the synthetically created samples to augment the training data set. Starting with 500 real samples per class, we will use Data Warping and SMOTE to iteratively increase the number of samples all the way up to 5,000 samples per class. Then we can compare the improvement in classifier performance of Data Warping (using elastic distortions) with that of SMOTE.
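For reference, an elastic distortion in the spirit of Simard et al. displaces every pixel by a smoothed random field. The sketch below is a loose approximation with illustrative parameter names: a repeated box blur stands in for the usual Gaussian smoothing, edges wrap around, and resampling is nearest-neighbour rather than bilinear.

```python
import numpy as np

def elastic_warp(img, alpha=8.0, sigma=3, rng=None):
    """Apply an elastic distortion to a 2-D image.

    alpha scales the pixel displacement; sigma controls how many
    box-blur passes smooth the random displacement field (a crude
    stand-in for Gaussian smoothing).
    """
    rng = np.random.default_rng(rng)
    H, W = img.shape

    def smooth(field):
        # Repeated separable box blur; np.roll wraps at the edges.
        for _ in range(sigma):
            field = (np.roll(field, 1, 0) + field + np.roll(field, -1, 0)) / 3
            field = (np.roll(field, 1, 1) + field + np.roll(field, -1, 1)) / 3
        return field

    dx = smooth(rng.uniform(-1, 1, (H, W))) * alpha
    dy = smooth(rng.uniform(-1, 1, (H, W))) * alpha

    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Nearest-neighbour resampling at the displaced coordinates.
    src_y = np.clip(np.rint(ys + dy), 0, H - 1).astype(int)
    src_x = np.clip(np.rint(xs + dx), 0, W - 1).astype(int)
    return img[src_y, src_x]
```

Applying this repeatedly with fresh random fields yields as many label-preserving variants of each training image as needed.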

Figure 3 shows the results for the CNN classifier. We get good improvement in classifier performance using Data Warping, but it's not quite as good as using real samples. We get only modest improvement in classifier performance using SMOTE.

Figure 3. CNN Error % vs Number of Training Samples.


Figure 4 shows the results for the SVM classifier. Here we get some improvement in performance using Elastic Data Warping, but nowhere near as good as using real samples. We see no improvement, and even a slight degradation, in performance using SMOTE.

Figure 4. SVM Error % vs Number of Training Samples.

Figure 5 shows the results for the ELM classifier. Here the results are mixed. A large number of synthetic Elastic Data Warping samples is required to provide a modest improvement in test-set error %, whereas a small number of SMOTE samples provides a small improvement in test-set error %; increasing the number of SMOTE samples further degrades performance.


Figure 5. ELM, Error % vs the Number of Training Samples.


Conclusion

For problems where the classifier is overfitting the data, the best way to improve classifier performance is to collect more data. However, data augmentation using synthetic samples is possible and can give good results. Data Warping (such as elastic deformation) will give good results if label-preserving transformations of the data are known. Otherwise, the SMOTE algorithm can be used to generate synthetic samples.

Convolutional Neural Networks (CNNs) are very amenable to data-augmentation techniques. They are my first choice for classification problems with spatial data.

References

[1] Wong, Sebastien C., Adam Gatt, Victor Stamatescu, and Mark D. McDonnell. (2016) "Understanding data augmentation for classification: when to warp?" arXiv preprint arXiv:1609.08764.