Challenges in using ChatGPT to code student's mistakes

Ableitinger, Christoph; Dorner, Christian

doi:10.3389/feduc.2025.1632548

PERSPECTIVE article

Front. Educ.

Sec. STEM Education

Volume 10 - 2025 | doi: 10.3389/feduc.2025.1632548

This article is part of the Research Topic "Bridging Barriers: Technology Integration in Mathematics Education"

Provisionally accepted

Christoph Ableitinger¹*

Christian Dorner²

¹University of Vienna, Vienna, Austria
²University of Teacher Education Styria, Graz, Styria, Austria

The final, formatted version of the article will be published soon.

The rapid advancements in artificial intelligence (AI) have sparked interest in its application within mathematics education, particularly in automating the coding and grading of student solutions. This study investigates the potential of ChatGPT, specifically the GPT-4 Turbo model, to assess student solutions to procedural mathematics tasks, focusing on its ability to identify correctness and categorise errors into two domains: "knowledge of the procedure" and "arithmetic/algebraic skills". The research is motivated by the need to reduce the time-intensive nature of coding and grading and to explore AI's reliability in this context. The study employed a two-phase approach using a dataset of handwritten student solutions of a system of linear equations: first, ChatGPT was trained using student solutions that were rewritten by one of the authors to ensure consistency in handwriting style; its performance was then tested with additional solutions, also in the same handwriting. The findings reveal significant challenges, including frequent errors in handwriting recognition, misinterpretation of mathematical symbols, and inconsistencies in the categorisation of mistakes. Despite iterative feedback and prompt adjustments, ChatGPT's performance remained inconsistent, with only partial success in accurately coding solutions. The study concludes that while ChatGPT shows promise as a coding aid, its current limitations, particularly in recognising handwritten inputs and maintaining consistency, highlight the need for improvement. These findings contribute to the growing discourse on AI's role in education, emphasising the importance of improving AI tools for practical classroom and research applications.

Keywords: artificial intelligence, mathematics, procedural task, character recognition, student solutions, coding, grading (educational)

Received: 21 May 2025; Accepted: 17 Jul 2025.

Copyright: © 2025 Ableitinger and Dorner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Christoph Ableitinger, University of Vienna, Vienna, Austria

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.