Computer Science and Engineering – Artificial Intelligence, Indira Gandhi Delhi Technical University for Women, Delhi, India
Email: vidhi048btcsai20@igdtuw.ac.in (V.K.)
Manuscript received June 12, 2024; revised July 28, 2024; accepted August 12, 2024; published September 14, 2024
Abstract—Image caption generation is an important research area at the intersection of computer vision and natural language processing. This paper compares two popular Convolutional Neural Network (CNN) architectures, DenseNet201 and ResNet50, for feature extraction in the caption generation task. The study analyzes the impact of these architectures on the quality of generated captions by examining their learning curves and Bilingual Evaluation Understudy (BLEU) scores. The results show that the choice of CNN architecture significantly affects the performance of the captioning model: DenseNet201 and ResNet50 exhibit different learning curves and BLEU scores, indicating that the former is more effective at capturing high-level features, while the latter is better suited to capturing local features. These findings can help in developing more accurate and efficient image captioning models.
Keywords—image caption generator, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), DenseNet201, ResNet50
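The sketch below illustrates the kind of pipeline the abstract describes, not the authors' exact implementation: both backbones are used headless as fixed feature extractors for a caption decoder, and a generated caption is scored with BLEU. The image size, dummy data, and helper function are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's exact code): comparing
# DenseNet201 and ResNet50 as feature extractors via tf.keras.applications,
# plus an illustrative BLEU-1 computation with NLTK.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import densenet, resnet50
from nltk.translate.bleu_score import sentence_bleu

# Headless backbones (include_top=False) with global average pooling, so
# each image yields one feature vector: DenseNet201 -> 1920 dims,
# ResNet50 -> 2048 dims. These vectors would feed an LSTM caption decoder.
densenet_extractor = tf.keras.applications.DenseNet201(
    include_top=False, weights="imagenet", pooling="avg")
resnet_extractor = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", pooling="avg")

def extract_features(images, extractor, preprocess):
    """Run a batch of raw RGB images (N, H, W, 3) through a CNN backbone."""
    x = preprocess(images.astype("float32"))
    return extractor.predict(x, verbose=0)

# Dummy batch of 224x224 images; real use would load dataset images.
batch = np.random.randint(0, 256, size=(4, 224, 224, 3))
dense_feats = extract_features(batch, densenet_extractor,
                               densenet.preprocess_input)
res_feats = extract_features(batch, resnet_extractor,
                             resnet50.preprocess_input)
print(dense_feats.shape, res_feats.shape)  # (4, 1920) (4, 2048)

# BLEU-1: one hypothetical generated caption against one reference.
reference = [["a", "dog", "runs", "on", "the", "grass"]]
hypothesis = ["a", "dog", "runs", "in", "the", "grass"]
print(sentence_bleu(reference, hypothesis, weights=(1, 0, 0, 0)))
```

Because the two backbones produce feature vectors of different sizes, the decoder's input projection layer is the only part that would change when swapping extractors, which makes this a clean controlled comparison.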
Cite: Vidhi Khubchandani, "Image Caption Generator Using DenseNet201 and ResNet50," International Journal of Future Computer and Communication, vol. 13, no. 3, pp. 55-59, 2024.
Copyright © 2024 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.