
An Exploratory Evaluation of GPT-4's Consistency as an English Essay Rater: A Many-Facet Rasch Model Analysis of AI versus Human Rating Patterns


Austin Pack
Steven Carter
Brigham Young University-Hawaii, USA

Alex Barrett
Florida State University, USA

Juan Escalante
Brigham Young University-Hawaii, USA

Mark Wolfersberger
Brigham Young University, USA

Abstract

This study examined the defensibility of using GPT-4 for automated essay scoring through a Many-Facet Rasch Model (MFRM) analysis. Forty English for Academic Purposes (EAP) student essays were rated by GPT-4 and four trained educators to assess nuances in rubric application, severity, leniency, and bias. Findings suggest that while GPT-4 tended to avoid extreme scores, exhibiting a moderate central tendency in its ratings, it nevertheless showed a high level of consistency in its scoring behavior. This study contributes to understanding the capabilities and limitations of generative AI tools for scoring essays and offers insights into the use of AI tools in assessing writing.
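For readers unfamiliar with the method, the Many-Facet Rasch Model used in analyses of this kind is conventionally written as follows (this is the standard formulation associated with Linacre's many-facet Rasch measurement; the exact parameterization used in the article may differ):

\[ \log\!\left(\frac{P_{njik}}{P_{nji(k-1)}}\right) = B_n - C_j - D_i - F_k \]

where P_{njik} is the probability of essay n receiving rating category k rather than k-1 from rater j on criterion i; B_n is the proficiency of the essay writer, C_j the severity of the rater (whether GPT-4 or a human educator), D_i the difficulty of the rubric criterion, and F_k the difficulty of the step from category k-1 to k. The severity, leniency, consistency, and central tendency patterns examined in the study correspond to the rater facet C_j together with its associated fit statistics and category-use patterns.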


Keywords

Generative AI, ChatGPT, artificial intelligence, automated essay scoring, assessment, education