Description
Image augmentation is an important aspect of training a deep learning model, as it can improve the model's performance. However, for a model of a given size, adding augmentations may yield no gain in performance while lengthening training, wasting potentially limited time and resources. Because of this, we tested the importance of both image augmentation and training-dataset size using Zoobot, a pre-existing, state-of-the-art deep learning model that classifies several morphological features of galaxy images.
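To illustrate the kind of augmentation pipeline being varied, here is a minimal sketch using torchvision transforms. This is an assumed, generic setup for clarity, not Zoobot's actual training code; the specific transforms and parameters are illustrative choices.

# Minimal sketch of varying image augmentation, assuming torchvision;
# this is illustrative and not Zoobot's actual training pipeline.
from torchvision import transforms

# Baseline: no augmentation, just resize and convert to a tensor.
baseline = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
])

# With augmentation: random flips and rotations are natural choices for
# galaxy images, whose morphology labels are invariant under these symmetries.
augmented = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=180),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])

Comparing models trained with pipelines like these, of differing lengths and on training sets of differing sizes, is the kind of experiment described above.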
We find that, in general, increasing the number of augmentations and the size of the training data does improve model performance, although the gains are often minimal (a 2-3% improvement). Models trained with some level of image augmentation typically perform similarly, suggesting that the exact selection of augmentations matters less than having some augmentation process at all. Due to limited model capacity, the different models often converge on a performance ceiling they are unable to surpass; because more complex questions require more data, the models are less likely to converge for those questions. This research demonstrates the careful balancing required between model capacity and data diversity, and shows that we often don't need to throw everything but the kitchen sink into a deep learning model to achieve good results. In an era when sustainability is paramount, astronomers might consider how to minimise computing resource use going forward.