Health awareness is increasing worldwide. Being aware of one's eating habits and choosing food wisely are key factors for a healthy
life. The food images used to build a food image recognition system are typically captured with mobile devices and often contain
non-food objects such as people and cutlery, as well as food decoration styles; we refer to these as noisy food images. Such noise
degrades the performance of the recognition system. Convolutional neural network (CNN) architectures have been proposed to address
this issue and achieve good performance. In this study, we propose a ResNet50-LSTM network to improve the food image recognition
system. The state-of-the-art ResNet50 architecture is employed to extract robust features from food images, and these features serve
as the input to a one-dimensional convolution (Conv1D) combined with a long short-term memory (LSTM) network, called Conv1D-LSTM.
The output of the LSTM is then passed to a global average pooling layer before a softmax function produces a probability distribution
over the food classes. During training of the CNN model, mixed data augmentation techniques were applied, which increased accuracy by
0.6%. The results show that the ResNet50+Conv1D-LSTM network outperforms previous works on the Food-101 dataset, with its best
configuration achieving an accuracy of 90.87%.
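The data flow of the classification head described above can be illustrated with a minimal NumPy forward-pass sketch. It assumes that the ResNet50 backbone outputs a spatial feature map (7×7×2048 for a 224×224 input) and that this map is flattened into a sequence of 49 feature vectors before the Conv1D layer; how the authors actually arranged the features is not stated here. The filter count, kernel size, LSTM hidden size, random weights, the dense layer before the softmax, and all function names (`conv1d`, `lstm`, `classify`, `init_params`) are hypothetical illustrations, not the paper's configuration or trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """'Same'-padded 1D convolution followed by ReLU.
    x: (T, C_in), w: (K, C_in, C_out) with odd K, b: (C_out,) -> (T, C_out)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad along the time axis only
    steps = x.shape[0]
    # Each output step contracts a window of K input steps with the kernel.
    out = np.stack([np.tensordot(xp[t:t + k], w, axes=([0, 1], [0, 1]))
                    for t in range(steps)]) + b
    return np.maximum(out, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm(x, W, U, b, hidden):
    """Standard LSTM that returns the hidden state at every time step.
    x: (T, D), W: (D, 4H), U: (H, 4H), b: (4H,) -> (T, H)."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x_t in x:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:hidden])                  # input gate
        f = sigmoid(z[hidden:2 * hidden])        # forget gate
        g = np.tanh(z[2 * hidden:3 * hidden])    # candidate cell state
        o = sigmoid(z[3 * hidden:])              # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def classify(feature_map, params):
    """feature_map: (H, W, C) backbone output -> class probabilities."""
    seq = feature_map.reshape(-1, feature_map.shape[-1])  # (H*W, C) treated as a sequence
    x = conv1d(seq, params["conv_w"], params["conv_b"])
    x = lstm(x, params["lstm_W"], params["lstm_U"], params["lstm_b"], params["hidden"])
    pooled = x.mean(axis=0)  # global average pooling over the time steps
    # Hypothetical dense layer mapping the pooled vector to one logit per class.
    logits = pooled @ params["dense_W"] + params["dense_b"]
    return softmax(logits)

def init_params(channels=2048, filters=64, kernel=3, hidden=32, classes=101):
    """Random, untrained weights; sizes are illustrative assumptions."""
    s = 0.01
    return {
        "conv_w": rng.normal(0, s, (kernel, channels, filters)),
        "conv_b": np.zeros(filters),
        "lstm_W": rng.normal(0, s, (filters, 4 * hidden)),
        "lstm_U": rng.normal(0, s, (hidden, 4 * hidden)),
        "lstm_b": np.zeros(4 * hidden),
        "hidden": hidden,
        "dense_W": rng.normal(0, s, (hidden, classes)),
        "dense_b": np.zeros(classes),
    }

# Random stand-in for a ResNet50 feature map of one image (no real backbone is run).
features = rng.normal(size=(7, 7, 2048))
probs = classify(features, init_params())
print(probs.shape, round(float(probs.sum()), 6))
```

Because the LSTM returns its hidden state at every step, global average pooling can summarize the whole sequence rather than only the final state; the 101 output classes match the number of categories in Food-101.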
Keywords
Food image recognition, Deep feature extraction method, Long short-term memory, Convolutional neural network, Spatial temporal features