Language models for generating programming questions with varying difficulty levels

Authors

DOI:

https://doi.org/10.31637/epsir-2024-760

Keywords:

Large Language Models, ChatGPT, Question Generation, Adaptation, Gamification, Python, Difficulty, Pedagogy

Abstract

Introduction: This study explores the potential of Large Language Models (LLMs), specifically ChatGPT-4, for generating Python programming questions with varying degrees of difficulty. This ability could significantly enhance adaptive educational applications. Methodology: Experiments were conducted with ChatGPT-4 and human participants to evaluate its ability to generate questions on various programming topics and difficulty levels. Results: The results reveal a moderate positive correlation between the difficulty ratings assigned by ChatGPT-4 and the perceived difficulty ratings given by participants. ChatGPT-4 proves effective at generating questions that span a wide range of difficulty levels. Discussion: The study highlights ChatGPT-4's potential for use in adaptive educational applications that accommodate different learning competencies and needs. Conclusions: This study presents a prototype of a gamified educational application for teaching Python, which uses ChatGPT to automatically generate questions of varying difficulty levels. Future studies should conduct more exhaustive experiments, explore other programming languages, and address more complex programming concepts.
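The abstract describes prompting ChatGPT-4 to produce Python questions at a requested difficulty. A minimal sketch of how such a generation step might look, assuming the OpenAI Chat Completions API, a 1–10 difficulty scale, and illustrative prompt wording (none of these are necessarily the authors' exact protocol):

```python
def build_question_prompt(topic: str, difficulty: int) -> str:
    """Compose a prompt requesting one multiple-choice Python question.

    The template and the 1-10 scale are illustrative assumptions,
    not the study's exact prompt.
    """
    if not 1 <= difficulty <= 10:
        raise ValueError("difficulty must be between 1 and 10")
    return (
        f"Generate one multiple-choice Python question about {topic} "
        f"with difficulty {difficulty} on a scale of 1 (trivial) to 10 "
        f"(expert). Provide four answer options labeled A-D and state "
        f"the correct option."
    )


def generate_question(topic: str, difficulty: int, client) -> str:
    """Send the prompt to a chat model and return the generated question.

    `client` is an openai.OpenAI instance; the model name below is
    an assumption.
    """
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "user",
             "content": build_question_prompt(topic, difficulty)},
        ],
    )
    return response.choices[0].message.content
```

In the setting the paper describes, a gamified application could call such a function with a difficulty level adapted to the learner's recent performance.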

Author Biographies

Christian Lopez, Lafayette College & Universidad Nacional Pedro Henríquez Ureña (UNPHU)

He is an Assistant Professor of Computer Science with an affiliation in Mechanical Engineering at Lafayette College. His research interests center on the design and optimization of intelligent decision-support systems and persuasive technologies that augment human competencies. In other words, he designs and builds systems that help people make better decisions and improve their task performance by integrating technologies and methods from science and engineering, such as machine learning and virtual reality. In some cases, these systems must also be able to motivate individuals; for this, persuasive technologies such as gamification are used.

Miles Morrison, Lafayette College

Miles Morrison is pursuing a bachelor's degree in Engineering with a focus on Robotics at Lafayette College in Easton, PA, and is expected to graduate in 2026. He intends to pursue a graduate degree after completing his bachelor's degree at Lafayette College to deepen his expertise. This is his first official contribution to a research paper, and he will likely contribute to more in the future. His research and professional interests include applications of artificial intelligence, robotics, digital automation, and systems optimization.

Matthew Deacon, Lafayette College

Matthew Deacon is pursuing a bachelor's degree in Mechanical Engineering with a minor in Economics at Lafayette College in Easton, PA, and is expected to graduate in 2026. He intends to pursue an MBA after completing his bachelor's degree. In the summer of 2021, Matthew completed work on stroke data for Professor Guillermo Goldsztein of Georgia Tech as part of the Horizon Inspires Academic Data Science and Machine Learning course. He also completed an online course called 'Programming for Everybody - Getting Started with Python' through the University of Michigan. Matthew's professional interests include using engineering to innovate and create new products, applications, or technologies.

References

Aguinis, H., Villamor, I., & Ramani, R. S. (2021). MTurk Research: Review and Recommendations. Journal of Management, 47(4), 823–837. https://doi.org/10.1177/0149206320969787

Ahmad, A., Zeshan, F., Khan, M. S., Marriam, R., Ali, A., & Samreen, A. (2020). The Impact of Gamification on Learning Outcomes of Computer Science Majors. ACM Transactions on Computing Education, 20(2). https://doi.org/10.1145/3383456

Albán Bedoya, I., & Ocaña-Garzón, M. (2022). Educational Programming as a Strategy for the Development of Logical-Mathematical Thinking. Lecture Notes in Networks and Systems, 405 LNNS, 309–323. https://doi.org/10.1007/978-3-030-96043-8_24

Amatriain, X. (2024). Prompt Design and Engineering: Introduction and Advanced Methods. 1–26. http://arxiv.org/abs/2401.14423

Amazon. (2018). Amazon Mechanical Turk. https://www.mturk.com/

API Reference - OpenAI API. (n.d.). Retrieved December 10, 2023, from https://platform.openai.com/docs/api-reference/chat

Baudisch, P., Beaudouin-Lafon, M., & Mackay, W. (Eds.). (2013). CHI 2013 Changing Perspectives: Extended abstracts of the 31st Annual CHI Conference on Human Factors in Computing Systems, 27 April – 2 May 2013, Paris, France. ACM.

Bennani, S., Maalel, A., & Ben Ghezala, H. (2022). Adaptive gamification in E-learning: A literature review and future challenges. Computer Applications in Engineering Education, 30(2), 628–642. https://doi.org/10.1002/cae.22477

Biancini, G., Ferrato, A., & Limongelli, C. (2024). Multiple-Choice Question Generation Using Large Language Models: Methodology and Educator Insights. Adjunct Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization, 584–590. https://doi.org/10.1145/3631700.3665233

Busheska, A., & Lopez, C. (2022). Exploring the perceived complexity of 3D shapes: Towards a spatial visualization VR application. Proceedings of the IDETC-CIE 2022, 1–9. https://doi.org/10.1115/DETC2022-91212

Caruccio, L., Cirillo, S., Polese, G., Solimando, G., Sundaramurthy, S., & Tortora, G. (2024). Claude 2.0 large language model: Tackling a real-world classification problem with a new iterative prompt engineering approach. Intelligent Systems with Applications, 21. https://doi.org/10.1016/j.iswa.2024.200336

Chang, Y., Wang, X., Wang, J., Wu, Y., Yang, L., Zhu, K., Chen, H., Yi, X., Wang, C., Wang, Y., Ye, W., Zhang, Y., Chang, Y., Yu, P. S., Yang, Q., & Xie, X. (2024). A Survey on Evaluation of Large Language Models. ACM Transactions on Intelligent Systems and Technology, 15(3). https://doi.org/10.1145/3641289

Chen, B., Zhang, Z., Langrené, N., & Zhu, S. (2023). Unleashing the potential of prompt engineering in Large Language Models: A comprehensive review. http://arxiv.org/abs/2310.14735

Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. de O., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., Ray, A., Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., … Zaremba, W. (2021). Evaluating Large Language Models Trained on Code. http://arxiv.org/abs/2107.03374

Chen, O., Paas, F., & Sweller, J. (2023). A Cognitive Load Theory Approach to Defining and Measuring Task Complexity Through Element Interactivity. Educational Psychology Review, 35(2). https://doi.org/10.1007/s10648-023-09782-w

Davis, J., Van Bulck, L., Durieux, B., & Lindvall, C. (2023). The temperature feature of ChatGPT: Modifying creativity for clinical research. JMIR Human Factors, 11. https://doi.org/10.2196/53559

Deterding, S., Dixon, D., Khaled, R., & Nacke, L. (2011). From game design elements to gamefulness: Defining “gamification.” Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments, MindTrek 2011, 9–15. https://doi.org/10.1145/2181037.2181040

Doughty, J., Wan, Z., Bompelli, A., Qayum, J., Wang, T., Zhang, J., Zheng, Y., Doyle, A., Sridhar, P., Agarwal, A., Bogart, C., Keylor, E., Kultur, C., Savelka, J., & Sakr, M. (2024). A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education. ACM International Conference Proceeding Series, 114–123. https://doi.org/10.1145/3636243.3636256

Ekin, S. (2023). Prompt Engineering For ChatGPT: A Quick Guide To Techniques, Tips, And Best Practices. https://doi.org/10.36227/techrxiv.22683919

Flegal, K. E., Ragland, J. D., & Ranganath, C. (2019). Adaptive task difficulty influences neural plasticity and transfer of training. NeuroImage, 188, 111–121. https://doi.org/10.1016/j.neuroimage.2018.12.003

Gemini Team, Anil, R., Borgeaud, S., Alayrac, J.-B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A. M., Hauth, A., Millican, K., Silver, D., Johnson, M., Antonoglou, I., Schrittwieser, J., Glaese, A., Chen, J., Pitler, E., Lillicrap, T., Lazaridou, A., … Vinyals, O. (2023). Gemini: A Family of Highly Capable Multimodal Models. http://arxiv.org/abs/2312.11805

Gomes, A., Ke, W., Lam, C. T., Teixeira, A., Correia, F., Marcelino, M., & Mendes, A. (2019). Understanding loops: A visual methodology. 2019 IEEE International Conference on Engineering, Technology and Education (TALE), 1–7. https://doi.org/10.1109/TALE48000.2019.9225951

Hou, X., Zhao, Y., Liu, Y., Yang, Z., Wang, K., Li, L., Luo, X., Lo, D., Grundy, J., & Wang, H. (2023). Large Language Models for Software Engineering: A Systematic Literature Review. http://arxiv.org/abs/2308.10620

Huotari, K., & Hamari, J. (2017). A definition for gamification: Anchoring gamification in the service marketing literature. Electronic Markets, 27(1), 21–31. https://doi.org/10.1007/s12525-015-0212-z

Ihantola, P., & Petersen, A. (2019). Code Complexity in Introductory Programming Courses. Proceedings of the 52nd Hawaii International Conference on System Sciences, 1–9. https://doi.org/10.24251/HICSS.2019.924

Jones, K., Harland, J., Reid, J., & Bartlett, R. (2009). Relationship between examination questions and Bloom’s taxonomy. Proceedings - Frontiers in Education Conference, 1–6. https://doi.org/10.1109/FIE.2009.5350598

Lee, U., Jung, H., Jeon, Y., Sohn, Y., Hwang, W., Moon, J., & Kim, H. (2023). Few-shot is enough: Exploring ChatGPT prompt engineering method for automatic question generation in English education. Education and Information Technologies, 1–33. https://doi.org/10.1007/s10639-023-12249-8

Lei, H., Cui, Y., & Zhou, W. (2018). Relationships between student engagement and academic achievement: A meta-analysis. Social Behavior and Personality, 46(3), 517–528. https://doi.org/10.2224/sbp.7054

Liu, J., Xia, C. S., Wang, Y., & Zhang, L. (2023). Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation. http://arxiv.org/abs/2305.01210

Lu, L., Neale, N., Line, N. D., & Bonn, M. (2022). Improving Data Quality Using Amazon Mechanical Turk Through Platform Setup. Cornell Hospitality Quarterly, 63(2), 231–246. https://doi.org/10.1177/19389655211025475

Mcshane, L., & Lopez, C. (2023). Perceived complexity of 3D shapes for spatial visualization tasks: Humans vs generative models. Proceedings of the ASME IDETC-CIE 2023, 1–10. https://doi.org/10.1115/DETC2023-115081

Oliveira, W., Hamari, J., Shi, L., Toda, A. M., Rodrigues, L., Palomino, P. T., & Isotani, S. (2023). Tailored gamification in education: A literature review and future agenda. Education and Information Technologies, 28(1), 373–406. https://doi.org/10.1007/s10639-022-11122-4

OpenAI, Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., Avila, R., Babuschkin, I., Balaji, S., Balcom, V., Baltescu, P., Bao, H., Bavarian, M., Belgum, J., … Zoph, B. (2023). GPT-4 Technical Report. http://arxiv.org/abs/2303.08774

Ortolan, P. (2023). Optimizing Prompt Engineering for Improved Generative AI Content [Undergraduate thesis, Universidad Pontificia Comillas]. http://hdl.handle.net/11531/80629

Saleem, A. N., Noori, N. M., & Ozdamli, F. (2022). Gamification Applications in E-learning: A Literature Review. Technology, Knowledge and Learning, 27(1), 139–159. https://doi.org/10.1007/s10758-020-09487-x

Sarsa, S., Denny, P., Hellas, A., & Leinonen, J. (2022). Automatic Generation of Programming Exercises and Code Explanations Using Large Language Models. ICER 2022 - Proceedings of the 2022 ACM Conference on International Computing Education Research, 1, 27–43. https://doi.org/10.1145/3501385.3543957

Scherer, R., Siddiq, F., & Sánchez-Scherer, B. (2021). Some Evidence on the Cognitive Benefits of Learning to Code. Frontiers in Psychology, 12. https://doi.org/10.3389/fpsyg.2021.559424

Shankar, S., Zamfirescu-Pereira, J. D., Hartmann, B., Parameswaran, A. G., & Arawjo, I. (2024). Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences. https://doi.org/10.1145/3654777.3676450

Shieh, J. (2023). Best practices for prompt engineering with the OpenAI API. OpenAI Help Center. https://bit.ly/4cSZyg6

Shin, E., & Ramanathan, M. (2023). Evaluation of prompt engineering strategies for pharmacokinetic data analysis with the ChatGPT large language model. Journal of Pharmacokinetics and Pharmacodynamics, 51. https://doi.org/10.1007/s10928-023-09892-6

Sinclair, J., Butler, M., Morgan, M., & Kalvala, S. (2015). Student Engagement in computer science. Annual Conference on Innovation and Technology in Computer Science Education, ITiCSE, 2015-June, 242–247. https://doi.org/10.1145/2729094.2742586

Sweller, J. (1988). Cognitive Load During Problem Solving: Effects on Learning. Cognitive Science, 12(2), 257–285. https://doi.org/10.1207/s15516709cog1202_4

Velásquez-Henao, J. D., Franco-Cardona, C. J., & Cadavid-Higuita, L. (2023). Prompt Engineering: A methodology for optimizing interactions with AI-Language Models in the field of engineering. DYNA, 90(230), 1–9. https://doi.org/10.15446/dyna.v90n230.111700

Wang, S., Xu, T., Li, H., Zhang, C., Liang, J., Tang, J., Yu, P. S., & Wen, Q. (2024). Large Language Models for Education: A Survey and Outlook. http://arxiv.org/abs/2403.18105

Yazidi, A., Abolpour Mofrad, A., Goodwin, M., Hammer, H. L., & Arntzen, E. (2020). Balanced difficulty task finder: An adaptive recommendation method for learning tasks based on the concept of state of flow. Cognitive Neurodynamics, 14(5), 675–687. https://doi.org/10.1007/s11571-020-09624-3

Zhan, Z., He, L., Tong, Y., Liang, X., Guo, S., & Lan, X. (2022). The effectiveness of gamification in programming education: Evidence from a meta-analysis. Computers and Education: Artificial Intelligence, 3. https://doi.org/10.1016/j.caeai.2022.100096

Zhang, R., Guo, J., Chen, L., Fan, Y., & Cheng, X. (2022). A Review on Question Generation from Natural Language Text. ACM Transactions on Information Systems, 40(1). https://doi.org/10.1145/3468889

Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2022). Large Language Models Are Human-Level Prompt Engineers. http://arxiv.org/abs/2211.01910

Zu, T., Munsell, J., & Rebello, N. S. (2021). Subjective Measure of Cognitive Load Depends on Participants’ Content Knowledge Level. Frontiers in Education, 6, 647097. https://doi.org/10.3389/feduc.2021.647097


Published

2024-09-12

How to Cite

Lopez, C., Morrison, M., & Deacon, M. (2024). Language models for generating programming questions with varying difficulty levels. European Public & Social Innovation Review, 9, 1–19. https://doi.org/10.31637/epsir-2024-760

Issue

Section

Cover Articles

Funding data