ISSN: 2632-6779 (Print) | 2633-6898 (Online)


Austin Pack
Steven Carter
Brigham Young University-Hawaii, USA
Alex Barrett
Florida State University, USA
Juan Escalante
Brigham Young University-Hawaii, USA
Mark Wolfersberger
Brigham Young University, USA
Abstract
This study examined the defensibility of using GPT-4 for automated essay scoring through a Many-Facet Rasch Model analysis. Forty English for academic purposes student essays were rated by GPT-4 and four trained educators to assess nuances in rubric application, severity, leniency, and bias. Findings suggest that while GPT-4 tended to avoid extreme scores, exhibiting a moderate central tendency effect, it showed a high level of consistency in its scoring behavior. This study contributes to understanding the possibilities and limitations of using Generative AI tools to score essays and provides insights into the use of AI tools in assessing writing.
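As a rough illustration of the scoring setup the abstract describes, the sketch below shows one way an essay might be submitted to GPT-4 together with a rubric using the OpenAI Python client. The rubric text, prompt wording, scoring scale, and model settings are assumptions for illustration only; they are not the prompt or rubric used in the study.

```python
# Minimal sketch: prompting GPT-4 to rate one essay against a rubric.
# The rubric, prompt, and scale below are hypothetical, not the study's materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = """Score the essay from 1 (lowest) to 6 (highest) on each criterion:
1. Task fulfilment
2. Organization and coherence
3. Grammar and vocabulary
"""

def score_essay(essay_text: str) -> str:
    """Ask GPT-4 to rate a single essay and return its raw response text."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # reduce run-to-run variation in scores
        messages=[
            {"role": "system",
             "content": "You are an experienced rater of English for academic purposes essays."},
            {"role": "user",
             "content": f"{RUBRIC}\nEssay:\n{essay_text}\n\nReturn one score per criterion."},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(score_essay("Technology has changed the way students learn. ..."))
```

In practice, the model's scores for each essay would then be entered alongside the human raters' scores into Many-Facet Rasch Model software for the severity, leniency, and bias analysis described above.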
Keywords
Generative AI, ChatGPT, artificial intelligence, automated essay scoring, assessment, education