Automated Video Title Generation for Mobile Learning Resources: A Deep Learning Approach with Educational Context Awareness
Abstract
With the rapid growth of mobile learning platforms, short educational videos have emerged as a critical resource for learners. However, manually writing concise, pedagogically meaningful titles for these videos remains a time-consuming challenge. To address this issue, this study proposes a deep learning framework for automated video title generation in educational contexts. The framework integrates Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and natural language processing (NLP) techniques, with explicit awareness of pedagogical relevance. The proposed approach operates in three stages: (1) extracting key frames from input videos with an optimized shot detection algorithm, (2) analyzing these frames with CNN models to derive semantic representations of the visual content, and (3) decoding those representations with an LSTM network to generate descriptive text. The output is then refined with the TextRank algorithm to ensure conciseness and contextual coherence. Experimental results demonstrate that the framework generates high-quality video titles that are both educationally informative and contextually engaging, outperforming baseline methods in terms of alignment with curriculum standards and learner-centric search intent.
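To make the three-stage pipeline concrete, the sketch below outlines one possible realisation in Python. It is illustrative rather than the authors' implementation: simple inter-frame differencing in OpenCV stands in for the optimized shot detection step, a pretrained torchvision ResNet-34 plays the role of the CNN encoder, and the decoder, vocabulary size, thresholds, and all names (extract_key_frames, FrameEncoder, TitleDecoder) are hypothetical placeholders; the TextRank refinement stage is not shown.

```python
# Minimal sketch of the three-stage title-generation pipeline described in the
# abstract. Assumptions: frame differencing approximates shot detection, a
# pretrained ResNet-34 provides the visual semantics, and a single-layer LSTM
# maps the frame-feature sequence to per-step vocabulary logits.
import cv2
import torch
import torch.nn as nn
from torchvision import models, transforms


def extract_key_frames(video_path, diff_threshold=30.0, max_frames=16):
    """Stage 1: keep frames whose mean inter-frame difference exceeds a threshold."""
    cap = cv2.VideoCapture(video_path)
    key_frames, prev_gray = [], None
    while len(key_frames) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is None or cv2.absdiff(gray, prev_gray).mean() > diff_threshold:
            key_frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        prev_gray = gray
    cap.release()
    return key_frames


class FrameEncoder(nn.Module):
    """Stage 2: CNN backbone that maps each key frame to a semantic feature vector."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop classifier head
        self.preprocess = transforms.Compose([
            transforms.ToTensor(),
            transforms.Resize((224, 224), antialias=True),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])

    def forward(self, frames):
        batch = torch.stack([self.preprocess(f) for f in frames])
        return self.features(batch).flatten(1)  # (num_frames, 512)


class TitleDecoder(nn.Module):
    """Stage 3: LSTM over the frame-feature sequence; a full model would also
    feed back token embeddings, which is omitted here for brevity."""
    def __init__(self, vocab_size, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frame_feats):
        out, _ = self.lstm(frame_feats.unsqueeze(0))  # (1, num_frames, hidden_dim)
        return self.proj(out)                          # per-step vocabulary logits


# Hypothetical usage (file path and vocabulary size are placeholders):
# frames = extract_key_frames("lesson_clip.mp4")
# with torch.no_grad():
#     feats = FrameEncoder()(frames)
#     logits = TitleDecoder(vocab_size=5000)(feats)
```

In a complete system, the decoder's per-step logits would be turned into candidate sentences (for example by greedy or beam-search decoding), and a TextRank pass over those candidates would select and compress the most central phrasing into the final concise title.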
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.