Automated Video Title Generation for Mobile Learning Resources: A Deep Learning Approach with Educational Context Awareness
Abstract
With the rapid growth of mobile learning platforms, short educational videos have emerged as a critical resource for learners. However, manually writing concise, pedagogically meaningful titles for these videos remains a time-consuming challenge. To address this issue, this study proposes a deep learning framework for automated video title generation in educational contexts. The framework integrates Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and natural language processing (NLP) techniques, with explicit awareness of pedagogical relevance. The proposed approach operates in three stages: (1) extracting key frames from input videos with an optimized shot detection algorithm, (2) analyzing these frames with CNN models to derive semantic representations of the visual content, and (3) decoding those representations with an LSTM network to generate descriptive text. The output is then refined with the TextRank algorithm to ensure conciseness and contextual coherence. Experimental results demonstrate that the framework generates high-quality video titles that are both educationally informative and contextually engaging, outperforming baseline methods in terms of alignment with curriculum standards and learner-centric search intent.
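To make the three-stage pipeline concrete, the sketch below outlines one possible realisation in Python. It is illustrative rather than the authors' implementation: simple inter-frame differencing in OpenCV stands in for the optimized shot detection step, a pretrained torchvision ResNet-34 plays the role of the CNN encoder, and the decoder, vocabulary size, thresholds, and all names (extract_key_frames, FrameEncoder, TitleDecoder) are hypothetical placeholders; the TextRank refinement stage is not shown.

```python
# Minimal sketch of the three-stage title-generation pipeline described in the
# abstract. Assumptions: frame differencing approximates shot detection, a
# pretrained ResNet-34 provides the visual semantics, and a single-layer LSTM
# maps the frame-feature sequence to per-step vocabulary logits.
import cv2
import torch
import torch.nn as nn
from torchvision import models, transforms


def extract_key_frames(video_path, diff_threshold=30.0, max_frames=16):
    """Stage 1: keep frames whose mean inter-frame difference exceeds a threshold."""
    cap = cv2.VideoCapture(video_path)
    key_frames, prev_gray = [], None
    while len(key_frames) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is None or cv2.absdiff(gray, prev_gray).mean() > diff_threshold:
            key_frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        prev_gray = gray
    cap.release()
    return key_frames


class FrameEncoder(nn.Module):
    """Stage 2: CNN backbone that maps each key frame to a semantic feature vector."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop classifier head
        self.preprocess = transforms.Compose([
            transforms.ToTensor(),
            transforms.Resize((224, 224), antialias=True),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])

    def forward(self, frames):
        batch = torch.stack([self.preprocess(f) for f in frames])
        return self.features(batch).flatten(1)  # (num_frames, 512)


class TitleDecoder(nn.Module):
    """Stage 3: LSTM over the frame-feature sequence; a full model would also
    feed back token embeddings, which is omitted here for brevity."""
    def __init__(self, vocab_size, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frame_feats):
        out, _ = self.lstm(frame_feats.unsqueeze(0))  # (1, num_frames, hidden_dim)
        return self.proj(out)                          # per-step vocabulary logits


# Hypothetical usage (file path and vocabulary size are placeholders):
# frames = extract_key_frames("lesson_clip.mp4")
# with torch.no_grad():
#     feats = FrameEncoder()(frames)
#     logits = TitleDecoder(vocab_size=5000)(feats)
```

In a complete system, the decoder's per-step logits would be turned into candidate sentences (for example by greedy or beam-search decoding), and a TextRank pass over those candidates would select and compress the most central phrasing into the final concise title.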
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.