
GEC-Info

Repository to collect and categorize Grammatical Error Correction papers.

GEC Information

Policy

This repository aims to collect and categorize GEC (Grammatical Error Correction) papers.
Unlike NLP-progress, GEC-Info does not consider performance on benchmarks.
- Authors and conferences are also not considered.
The papers are limited to refereed papers in international conferences for now.
- This is not the case for survey papers.

Contributing

Pull requests that add papers are welcome. Please make commits that change only the lines for the added papers (and watch out for unrelated changes introduced by auto-formatting).
You can also open an issue to request that a paper be added.

It can also be viewed on GitHub Pages

Overview

Surveys

Title	Year	Page
“Automated Grammatical Error Correction: A Comprehensive Review”	2017	[paper]
“A Comprehensive Survey of Grammar Error Correction”	2020	[paper]
“Recent Trends in the Use of Deep Learning Models for Grammar Error Handling”	2020	[paper]
“Grammatical Error Correction: A Survey of the State of the Art”	2022	[paper]

Shared Tasks

Name	Year	Paper	Note
HOO 2011	2011	[paper]	[website]
HOO 2012	2012	[paper]	[website]
CoNLL-2013	2013	[paper]	[website]
CoNLL-2014	2014	[paper]	[website] [system outputs]
BEA-2019	2019	[paper]	[website] [system outputs]

Libraries

Name	Year	Paper	Note
UnifiedGEC	2025	UnifiedGEC: Integrating Grammatical Error Correction Approaches for Multi-languages with a Unified Framework	[code]
gec-metrics	2025	gec-metrics: A Unified Library for Grammatical Error Correction Evaluation	[code]

Datasets

For Training (Real Data)

Name	Year	Paper	Note
EFCamDat	2014	[Automatic Linguistic Annotation of Large Scale L2 Databases: The EF-Cambridge Open Language Database (EFCamDat)] [The EF Cambridge Open Language Database (efcamdat) Information for Users]	[download v2]
GitHub Typo Corpus	2019	[GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors]	[download]
W&I+LOCNESS on BEA2019 Shared Task	2019	[Developing an Automated Writing Placement System for ESL Learners]	[direct download]
FCE	2011	[A New Dataset and Method for Automatically Grading ESOL Texts]	[direct download]
NUCLE	2013	[Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English]	[download]
ICNALE	2013	[The ICNALE and Sophisticated Contrastive Interlanguage Analysis of Asian Learners of English]	[download]
Lang-8	2011	[Mining Revision Log of Language Learning SNS for Automated Japanese Error Correction of Second Language Learners]	[website] [download: Fill this form] Related tools are available; see [Other Tools] for details.

For Training (Pseudo/Synthetic Data)

Name	Year	Paper	Note
PIE-synthetic	2019	[Parallel Iterative Edit Models for Local Sequence Transduction]	[download]
OmniGEC	2025	Introducing OmniGEC: A Silver Multilingual Dataset for Grammatical Error Correction	[HF datasets] [code] Covers Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Slovene, Swedish, and Ukrainian.

For Evaluation

Name	Year	Paper	Note
KJ	2011	[Creating a manually error-tagged and shallow-parsed learner corpus]	[download]
CoNLL-2013	2013	[The CoNLL-2013 Shared Task on Grammatical Error Correction]	[direct download]
CoNLL-2014	2014	[The CoNLL-2014 Shared Task on Grammatical Error Correction]	[direct download]
10 additional annotations for the CoNLL14	2015	[How Far are We from Fully Automatic High Quality Grammatical Error Correction?]	[direct download]
8 additional annotations for the CoNLL14	2016	[Reassessing the Goals of Grammatical Error Correction: Fluency Instead of Grammaticality]	[download]
JFLEG	2017	[JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction]	[download]
GMEG-Data	2019	[Enabling Robust Grammatical Error Correction in New Domains: Data Sets, Metrics, and Analyses]	[code]
CWEB	2020	[Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses]	[download]
ErAConD	2021	[ErAConD: Error Annotated Conversational Dialog Dataset for Grammatical Error Correction]	[data] A training set is also included.
RobustGEC	2023	RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation	[code]
CSW Lang-8 Dataset	2024	Grammatical Error Correction for Code-Switched Sentences by Learners of English	[code/data]
CTSEG	2025	Targeted Syntactic Evaluation for Grammatical Error Correction	[data]

Performance measures

Reference-based

Name	Year	Paper	Note
M^2 Scorer	2012	[Better Evaluation for Grammatical Error Correction]	[code] It is often used to evaluate CoNLL-2013 and CoNLL-2014.
GLEU	2015	[Ground Truth for Grammatical Error Correction Metrics] [GLEU Without Tuning]	[code] It is often used to evaluate JFLEG.
I-measure	2015	[Towards a standard evaluation method for grammatical error detection and correction]	[code] The code supports only Python 2.x.
ERRANT	2016	[Automatic Extraction of Learner Errors in ESL Sentences Using Linguistically Enhanced Alignments] [Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction]	[code] It is often used to evaluate BEA-2019.
GMEG-Metric	2019	[Enabling Robust Grammatical Error Correction in New Domains: Data Sets, Metrics, and Analyses]	[code] Ridge regression using existing metrics (e.g. ERRANT, GLEU) as features.
GoToScorer	2019	[Taking the Correction Difficulty into Account in Grammatical Error Correction Evaluation]	[code] It evaluates systems while taking error correction difficulty into account.
PT-M2	2022	Revisiting Grammatical Error Correction Evaluation and Beyond	[code]
CLEME	2023	CLEME: Debiasing Multi-reference Evaluation for Grammatical Error Correction	[code]
GREEN	2024	n-gram F-score for Evaluating Grammatical Error Correction	[code]
	2025	Refined Evaluation for End-to-End Grammatical Error Correction Using an Alignment-Based Approach	[website]
CLEME2.0	2025	CLEME2.0: Towards Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction	[code]
ERRANT extended to multiple languages	2025	Multilingual Grammatical Error Annotation: Combining Language-Agnostic Framework with Language-Specific Flexibility	[code]

Reference-free

Keywords / Overview	Year	Paper	Note
Scoring by counting the errors	2016	[There’s No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction]	[code]
Fluency + grammaticality + meaning preservation	2017	[Reference-based Metrics can be Replaced with Reference-less Metrics in Evaluating Grammatical Error Correction Systems]
USim	2018	[Reference-less Measure of Faithfulness for Grammatical Error Correction]	[code]
SOME	2020	[SOME: Reference-less Sub-Metrics Optimized for Manual Evaluations of Grammatical Error Correction]	[code]
Scribendi Score	2021	[Is this the end of the gold standard? A straightforward reference-less grammatical error correction metric]	[Unofficial code]
IMPARA	2022	IMPARA: Impact-Based Metric for GEC Using Parallel Data	[code]
	2024	Large Language Models Are State-of-the-Art Evaluator for Grammatical Error Correction
IMPARA-GED	2025	IMPARA-GED: Grammatical Error Detection is Boosting Reference-free Grammatical Error Quality Estimator	[Official model]
	2025	LLM-based post-editing as reference-free GEC evaluation

Meta-evaluation

Keywords / Overview	Year	Paper	Note
Re-rank the CoNLL14 systems by human evaluation	2015	Human Evaluation of Grammatical Error Correction Systems	[code]
Reassess M^2, I-measure, and GLEU by comparing them with human evaluation	2018	[A Reassessment of Reference-Based Grammatical Error Correction Metrics]	[code]
MAEGE	2018	Automatic Metric Validation for Grammatical Error Correction	[code]
SEEDA	2024	Revisiting Meta-evaluation for Grammatical Error Correction	[code]

Quality Estimation

Keywords / Overview	Year	Paper	Note
	2022	Proficiency Matters Quality Estimation in Grammatical Error Correction

Models

Before Neural (E.g., SMT)

Keywords / Overview	Year	Paper	Note
	2006	Correcting ESL Errors Using Phrasal SMT Techniques
	2009	Using First and Second Language Models to Correct Preposition Errors in Second Language Authoring
	2010	Generating Confusion Sets for Context-Sensitive Error Correction
	2011	Correcting Semantic Collocation Errors with L1-induced Paraphrases
	2012	Tense and Aspect Error Correction for ESL Learners Using Global Context
	2012	Exploring Grammatical Error Correction with Not-So-Crummy Machine Translation
	2014	Grammatical error correction using hybrid systems and type filtering	CoNLL2014: CAMB
	2014	The AMU System in the CoNLL-2014 Shared Task: Grammatical Error Correction by Data-Intensive and Feature-Rich Statistical Machine Translation	CoNLL2014: AMU
	2014	The Illinois-Columbia System in the CoNLL-2014 Shared Task	CoNLL2014: CUUI
	2014	RACAI GEC – A hybrid approach to Grammatical Error Correction	CoNLL2014: RAC
	2014	Grammatical Error Detection Using Tagger Disagreement	CoNLL2014: UFC
	2014	CoNLL 2014 Shared Task: Grammatical Error Correction with a Syntactic N-gram Language Model from a Big Corpora	CoNLL2014: IPN
	2014	Tuning a Grammar Correction System for Increased Precision	CoNLL2014: IITB
	2014	POSTECH Grammatical Error Correction System in the CoNLL-2014 Shared Task	CoNLL2014: POST
	2014	Grammatical Error Detection and Correction using a Single Maximum Entropy Model	CoNLL2014: SJTU
	2014	Factored Statistical Machine Translation for Grammatical Error Correction	CoNLL2014: UMC
	2014	NTHU at the CoNLL-2014 Shared Task	CoNLL2014: NTHU
	2014	A Unified Framework for Grammar Error Correction	CoNLL2014: PKU
	2016	Exploiting N-Best Hypotheses to Improve an SMT Approach to Grammatical Error Correction
	2016	Adapting Grammatical Error Correction Based on the Native Language of Writers with Neural Network Joint Models
Phrase-based SMT	2016	[Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction]	[code]
Word-level SMT enhanced NNJMs + char-based SMT	2017	[Connecting the Dots: Towards Human-Level Grammatical Error Correction]	[code]
SMEG	2017	[Systematically Adapting Machine Translation for Grammatical Error Correction]	[code]

Encoder-Decoder

Tagging / Non-autoregressive

Keywords / Overview	Year	Paper	Note
LSTM tagger for word choice task	2019	[Choosing the Right Word: Using Bidirectional LSTM Tagger for Writing Support Systems]	[code]
PIE	2019	[Parallel Iterative Edit Models for Local Sequence Transduction]	[code]
LaserTagger	2019	[Encode, Tag, Realize: High-Precision Text Editing]	[code]
GECToR	2020	[GECToR – Grammatical Error Correction: Tag, Not Rewrite]	[code]
Seq2Edits	2020	[Seq2Edits: Sequence Transduction Using Span-level Edit Operations]	[code]
GAN-like sequence labeling	2021	[Grammatical Error Correction as GAN-like Sequence Labeling]
GECToR Large	2022	Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction	[code] [Author’s Master Thesis]
	2021	Efficient Grammatical Error Correction with Hierarchical Error Detections and Correction	[code]
	2022	Type-Driven Multi-Turn Corrections for Grammatical Error Correction	[code]
	2023	An Extended Sequence Tagging Vocabulary for Grammatical Error Correction	[code]

Large Language Model

Ensembles / Post-processing

Keywords / Overview	Year	Paper	Note
Use MEMT	2014	System Combination for Grammatical Error Correction
	2016	Grammatical Error Correction: Machine Translation and Classifiers
	2019	[Learning to combine Grammatical Error Corrections]	[code]
Diversity-Driven Combination (DDC)	2021	[Diversity-Driven Combination for Grammatical Error Correction]	[code]
Select a system for each error type with IP	2021	[System Combination for Grammatical Error Correction Based on Integer Programming]	[code]
	2022	Frustratingly Easy System Combination for Grammatical Error Correction	[code]
EditScorer	2022	Improved grammatical error correction by ranking elementary edits	[code]
GRECO	2023	System Combination via Quality Estimation for Grammatical Error Correction	[code]
	2024	Improving Grammatical Error Correction by Correction Acceptability Discrimination

Strategies

This includes methods such as decoding techniques and approaches that modify the loss function while keeping the model architecture unchanged.

Keywords / Overview	Year	Paper	Note
	2012	A Beam-Search Decoder for Grammatical Error Correction
	2016	Discriminative Reranking for Grammatical Error Correction with Statistical Machine Translation
	2016	Candidate re-ranking for SMT-based grammatical error correction
Several methods adapted from low-resource neural MT	2018	[Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task]	[code]
Iterative decoding	2018	[Weakly Supervised Grammatical Error Correction using Iterative Decoding]
	2019	Controlling Grammatical Error Correction Using Word Edit Rate
Add adversarial examples continually	2020	[Improving Grammatical Error Correction Models with Purpose-Built Adversarial Examples]
Cross-lingual Transfer Learning	2020	[Cross-lingual Transfer Learning for Grammatical Error Correction]
Data Weighted Training Strategies	2020	[Data Weighted Training Strategies for Grammatical Error Correction]
Align-and-Predict Decoding	2022	Adjusting the Precision-Recall Trade-Off with Align-and-Predict Decoding for Grammatical Error Correction	[code]
	2023	Mitigating Exposure Bias in Grammatical Error Correction with Data Augmentation and Reweighting	[code]
BTR	2023	Bidirectional Transformer Reranker for Grammatical Error Correction	[code]
	2023	Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule
MainGEC	2023	Grammatical Error Correction via Mixed-Grained Weighted Training
	2023	Improving Seq2Seq Grammatical Error Correction via Decoding Interventions	[code]
	2024	Multi-pass Decoding for Grammatical Error Correction

Data Augmentation

Keywords / Overview	Year	Paper	Note
Make artificial errors in a probabilistic manner	2014	[Generating artificial errors for grammatical error correction]
Back translation	2016	[Improving Neural Machine Translation Models with Monolingual Data]
SMT based MT + pattern extraction	2017	[Artificial Error Generation with Machine Translation and Syntactic Patterns]
Diverse back translation with noisy beam search	2018	[Noising and Denoising Natural Language: Diverse Backtranslation for Grammar Correction]
MAGEC	2019	[Minimally-Augmented Grammatical Error Correction]	Experiments in a supervised setting are also reported
DirectNoise	2019	[Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data]	The method was first called “DirectNoise” by [kiyono+ 2019]?
Substituting words using confusion sets	2019	[Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data]	[synthetic data] BEA-2019: UEDIN-MS
Error+Context Dictionary	2019	[Improving Precision of Grammatical Error Correction with a Cheat Sheet]	BEA-2019: Buffalo
Use Google Translate for making pseudo data	2019	[(Almost) Unsupervised Grammatical Error Correction using a Synthetic Comparable Corpus]	BEA-2019: TMU in Low Resource
Inverted Spellchecker + Patterns+POS	2019	[A Comparative Study of Synthetic Data Generation Methods for Grammatical Error Correction]
Methods for erroneous data generation	2019	[Erroneous data generation for Grammatical Error Correction]	BEA-2019: Shuyao
Wikipedia revision & Wikipedia round-trip translation	2019	[Corpora Generation for Grammatical Error Correction]
Create confusion sets by edit distance, word embeddings, spell-breaking	2019	[Minimally-Augmented Grammatical Error Correction]	Experiments in a supervised setting are also reported
Explore methods for generating pseudo data, seed corpora, and training settings	2019	[An Empirical Study of Incorporating Pseudo Data into Grammatical Error Correction]	[code]
	2020	[Massive Exploration of Pseudo Data for Grammatical Error Correction]
Control error rates and error types by rule-based corruption and filtered back-translation	2020	[Controllable Data Synthesis Method for Grammatical Error Correction]
Use machine translation pairs	2020	[Improving Grammatical Error Correction with Machine Translation Pairs]
Edit latent representation	2020	[Improving Grammatical Error Correction with Data Augmentation by Editing Latent Representation]
Consider learner’s error tendency	2020	[Grammatical Error Correction Using Pseudo Learner Corpus Considering Learner’s Error Tendency]
Tagged corruption	2021	[Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models]	[code]
Use 188 modules	2021	[Various Errors Improve Neural Grammatical Error Correction]	[code]
Use real error patterns and linguistic knowledge	2021	[Data Augmentation of Incorporating Real Error Patterns and Linguistic Knowledge for Grammatical Error Correction]
Divide non-English sentence into chunks → translate to English for each of them → concatenate	2021	[Grammatical Error Generation Based on Translated Fragments]
	2023	Grammatical Error Correction through Round-Trip Machine Translation
TransGEC	2023	TransGEC: Improving Grammatical Error Correction with Translationese	[code]
Focus on gender bias	2023	Gender-Inclusive Grammatical Error Correction through Augmentation	[code]
	2023	Training for Grammatical Error Correction Without Human-Annotated L2 Learners’ Corpora
MixEdit	2023	MixEdit: Revisiting Data Augmentation and Beyond for Grammatical Error Correction	[code]
	2024	Synthetic Data Generation for Low-resource Grammatical Error Correction with Tagged Corruption Models
	2024	Improving Grammatical Error Correction via Contextual Data Augmentation	[code]
	2024	To Err Is Human, but Llamas Can Learn It Too	[code]
	2025	Large Language Models are Good Annotators for Type-aware Data Augmentation in Grammatical Error Correction	[code]
	2025	Leveraging Multilingual Models for Robust Grammatical Error Correction Across Low-Resource Languages
	2025	Low-Resource Grammatical Error Correction: Selective Data Augmentation with Round-Trip Machine Translation	[code] Experiments include Russian and Ukrainian.

Data Cleaning

Keywords / Overview	Year	Paper	Note
A Self-Refinement Strategy for Noise Reduction	2020	[A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction]
cLang8 (Cleaned Lang-8)	2021	[A Simple Recipe for Multilingual Grammatical Error Correction]	[code]

Analyses

Keywords / Overview	Year	Paper	Note
	2011	Algorithm Selection and Model Adaptation for ESL Correction Tasks
	2012	The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writings
	2015	[How Far are We from Fully Automatic High Quality Grammatical Error Correction?]
Human annotation focused on fluency	2016	[Reassessing the Goals of Grammatical Error Correction: Fluency Instead of Grammaticality]	[code]
	2017	[GEC into the future: Where are we going and how do we get there?]
	2018	[Inherent Biases in Reference-based Evaluation for Grammatical Error Correction]	[code]
	2018	[Assessing Grammatical Correctness in Language Learning]
Quality estimation (and re-ranking using estimated score)	2018	[Neural Quality Estimation of Grammatical Error Correction]	[code]
Evaluate four systems (SMT, CNN, LSTM, Transformer) for six corpora (CoNLL13&14, FCE, JFLEG, KJ, ICNALE)	2019	[Cross-Corpora Evaluation and Analysis of Grammatical Error Correction Models — Is Single-Corpus Evaluation Enough?]
Compare CNN, Transformer, PRPN, ON-LSTM as back-translation models	2019	[The Unbearable Weight of Generating Artificial Errors for Grammatical Error Correction]
GEC for post-processing	2019	Automatic Grammatical Error Correction for Sequence-to-sequence Text Generation: An Empirical Study
CGOP	2020	[Comparison of the Evaluation Metrics for Neural Grammatical Error Correction With Overcorrection]	A metric that considers overcorrection
Create new gold data by post-editing system outputs	2021	[How Good (really) are Grammatical Error Correction Systems?]
Explore whether models have grammatical knowledge with Known-setting and Unknown-setting	2021	[Do Grammatical Error Correction Models Realize Grammatical Generalization?]
Compare CNN, LSTM, transformer or combinations of them as BT models	2021	[Comparison of Grammatical Error Correction Using Back-Translation Models]
	2022	Uncertainty Determines the Adequacy of the Mode and the Tractability of Decoding in Sequence-to-Sequence Models
	2022	Grammatical Error Correction: Are We There Yet?
	2022	Grammatical Error Correction Systems for Automated Assessment: Are They Susceptible to Universal Adversarial Attacks?	[code]
	2023	ChatBack: Investigating Methods of Providing Grammatical Error Feedback in a GUI-based Language Learning Chatbot
	2023	A Closer Look at k-Nearest Neighbors Grammatical Error Correction
	2023	Grammatical Error Correction for Sentence-level Assessment in Language Learning
	2023	Evaluation Metrics in the Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks
	2024	[Pillars of Grammatical Error Correction: Comprehensive Inspection Of Contemporary Approaches In The Era of Large Language Models]	[code]
	2024	Likelihood-based Mitigation of Evaluation Bias in Large Language Models
	2025	Rethinking Evaluation Metrics for Grammatical Error Correction: Why Use a Different Evaluation Process than Human?	[code]
	2025	Improving Explainability of Sentence-level Metrics via Edit-level Attribution for Grammatical Error Correction	[code]

Spoken Domain

Keywords / Overview	Year	Paper
	2019	AUTOMATIC GRAMMATICAL ERROR DETECTION OF NON-NATIVE SPOKEN LEARNER ENGLISH
	2020	Grammatical error detection in transcriptions of spoken English
Disfluency detection (DD) model	2020	Spoken Language ‘Grammatical Error Correction’
	2022	On Assessing and Developing Spoken ‘Grammatical Error Correction’ Systems

Applications

Name	Year	Paper	Note
GECko+		[GECko+: a Grammatical and Discourse Error Correction Tool]	[website] [code] An English writing assistance tool that corrects grammatical errors and re-orders sentences automatically.
MiSS	2021	[MiSS: An Assistant for Multi-Style Simultaneous Translation]	[website] [demo video]
ALLECS	2023	ALLECS: A Lightweight Language Error Correction System	[website] [code]
	2023	Doolittle: Benchmarks and Corpora for Academic Writing Formalization	[code]
	2025	Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore	[code]

Projects

Name	Website
GramFormer	[GitHub]

Other Tools

Name	Code	Note
Lang8-NAIST-extractor	[code]	Scripts for extracting error-correct pairs from the Lang-8 Corpus.
M2Converter	[code]	Scripts for converting m2 file into source file and target file.
EFCamDat-Preprocess	[code]

Other materials

Name	Paper	Note
NLP-progress		[website] Performance rankings on some datasets.
A Crash Course in Automatic Grammatical Error Correction	[paper]	[materials] A GEC tutorial at COLING 2020.
Chunngai/gec-papers		[github] The papers appear to have been compiled around 2019-2020.

Grammatical Error Detection

Keywords / Overview	Year	Paper	Note
	2003	Automatic Error Detection in the Japanese Learners’ English Spoken Data
	2006	Detecting errors in English article usage by non-native speakers
	2008	The Ups and Downs of Preposition Error Detection in ESL Writing
	2010	Evaluating performance of grammatical error detection to maximize learning effect
A weighted measure according to crowdsourcing results (for GED)	2011	[They Can Help: Using Crowdsourcing to Improve the Evaluation of Grammatical Error Detection Systems]
	2014	Detecting Learner Errors in the Choice of Content Words Using Compositional Distributional Semantics
	2016	Compositional Sequence Labeling Models for Error Detection in Learner Writing
	2017	Grammatical Error Detection Using Error- and Grammaticality-Specific Word Embeddings	[code]
	2018	[Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection]	[code]
Bi-LSTM with contextual word embeddings	2019	[Context is Key: Grammatical Error Detection with Contextual Word Representations]
Multi-head and multi-layer attention	2019	[Multi-Head Multi-Layer Attention to Deep Language Representations for Grammatical Error Detection]
	2021	[Exploring the Capacity of a Large-scale Masked Language Model to Recognize Grammatical Errors]
	2022	Probing for targeted syntactic knowledge through grammatical error detection	[code]
	2024	Zero-shot Cross-Lingual Transfer for Synthetic Data Generation in Grammatical Error Detection	[data]
	2025	Oddballness: universal anomaly detection with language models	[code]

Feedback Comment Generation

Keywords / Overview	Year	Paper	Note
	2014	[Correcting Preposition Errors in Learner English Using Error Case Frames and Feedback Messages]
English grammar checker with feedback in Japanese	2018	[Grammatical Error Checker for Japanese Learners of English]	This is not feedback comment generation research per se, but it is classified here for now
	2019	[Toward a Task of Feedback Comment Generation for Writing Learning]
	2020	[Creating Corpora for Research in Feedback Comment Generation]
	2021	[Shared Task on Feedback Comment Generation for Language Learners]
	2023	Template-guided Grammatical Error Feedback Comment Generation

Explainable Grammatical Error Correction

Studies to explain the reasons for and intentions of error correction.

Keywords / Overview	Year	Paper	Note
EXPECT	2023	Enhancing Grammatical Error Correction Systems with Explanations	[code]
XGEC dataset	2024	Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction	[data]
GEE	2024	GEE! Grammar Error Explanation with Large Language Models	[code]

Document-level Revision

Keywords / Overview	Year	Paper	Note
TETRA	2024	Towards Automated Document Revision: Grammatical Error Correction, Fluency Edits, and Beyond	[code]

Other Languages

Arabic

Keywords / Overview	Year	Paper	Note
Arabic Learner Corpus	2013	[Arabic Learner Corpus v1: A New Resource for Arabic Language Research]	[website]
QALB	2014	[Large Scale Arabic Error Annotation: Guidelines and Framework]	[QALB Project Website]
QALB 2014 Shared Task	2014	[The First QALB Shared Task on Automatic Text Correction for Arabic]	[website]
QALB 2015 Shared Task	2015	[The Second QALB Shared Task on Automatic Text Correction for Arabic]
ARETA	2021	[Automatic Error Type Annotation for Arabic]	[code]
	2023	Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation	[code]
	2023	Beyond English: Evaluating LLMs for Arabic Grammatical Error Correction	[code]
	2025	ARWI: Arabic Write and Improve	[website]
	2025	Enhancing Text Editing for Grammatical Error Correction: Arabic as a Case Study	[code]

Bangla

Keywords / Overview	Year	Paper	Note
	2021	[Development of Bangla Spell and Grammar Checkers: Resource Creation and Evaluation]
	2025	Leveraging LLMs for Bangla Grammar Error Correction: Error Categorization, Synthetic Data, and Model Evaluation	[code]

Chinese

Keywords / Overview	Year	Paper	Note
	2013	Chinese Spelling Checker Based on Statistical Machine Translation
	2014	Chinese Word Ordering Errors Detection and Correction for Non-Native Chinese Language Learners
	2015	Improving Chinese Grammatical Error Correction with Corpus Augmentation and Hierarchical Phrase-based Statistical Machine Translation
NLPCC-2018 Shared Task	2018	[Overview of the NLPCC 2018 Shared Task: Grammatical Error Correction]	[data]
Two-stage: Spell checker → seq2seq	2019	[A Two-Stage Model for Chinese Grammatical Error Correction]
CNN-based seq2seq	2019	[Chinese Grammatical Error Correction Based on Convolutional Sequence to Sequence Model]
MaskGEC	2020	[MaskGEC: Improving Neural Grammatical Error Correction via Dynamic Masking]
	2020	[Chinese Grammatical Error Detection Based on BERT Model]
	2020	[BERT Enhanced Neural Machine Translation and Sequence Tagging Model for Chinese Grammatical Error Diagnosis]
	2020	[Heterogeneous Recycle Generation for Chinese Grammatical Error Correction]
NLPTEA-2020 Shared Task	2020	[Overview of NLPTEA-2020 Shared Task for Chinese Grammatical Error Diagnosis]
Tail-to-Tail Non-Autoregressive Sequence Prediction	2021	[Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction]
	2021	“Is Whole Word Masking Always Better for Chinese BERT?”: Probing on Chinese Grammatical Error Correction
	2022	Pre-Training-Based Grammatical Error Correction Model for the Written Language of Chinese Hearing Impaired Students
	2022	MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction	[code]
	2022	Improving Chinese Grammatical Error Detection via Data augmentation by Conditional Error Generation	[code]
	2022	String Editing Based Chinese Grammatical Error Diagnosis
CLG	2022	Linguistic Rules-Based Corpus Generation for Native Chinese Grammatical Error Correction	[code]
	2022	From Spelling to Grammar: A New Framework for Chinese Grammatical Error Correction
FCGEC	2022	FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction	[code]
	2023	Are Pre-trained Language Models Useful for Model Ensemble in Chinese Grammatical Error Correction?	[code]
	2023	Focal Training and Tagger Decouple for Grammatical Error Correction
NaSGEC	2023	NaSGEC: a Multi-Domain Chinese Grammatical Error Correction Dataset from Native Speaker Texts	[code]
TLM	2023	TLM: Token-Level Masking for Transformers	[code]
	2024	LM-Combiner: A Contextual Rewriting Model for Chinese Grammatical Error Correction	[code]
Alirector	2024	Alirector: Alignment-Enhanced Chinese Grammatical Error Corrector	[code]
	2024	Towards Better Utilization of Multi-Reference Training Data for Chinese Grammatical Error Correction	[code]
	2024	Towards Explainable Chinese Native Learner Essay Fluency Assessment: Dataset, Tasks, and Method	[code]
	2025	Improving Automatic Grammatical Error Annotation for Chinese Through Linguistically-Informed Error Typology	[website]
	2025	A Chain-of-Task Framework for Instruction Tuning of LLMs Based on Chinese Grammatical Error Correction	[code]
VisCGEC	2025	VisCGEC: Benchmarking the Visual Chinese Grammatical Error Correction	[code]
	2025	Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction	[code]

Czech

Keywords / Overview	Year	Paper	Note
AKCES-GEC dataset	2019	[Grammatical Error Correction in Low-Resource Scenarios]	[data]
Grammar Error Correction Corpus for Czech (GECCC)	2022	Czech Grammar Error Correction with a Large and Diverse Corpus	[data]

Estonian

Keywords / Overview	Year	Paper	Note
	2025	Paragraph-level Error Correction and Explanation Generation: Case Study for Estonian	[data]

Finnish

Keywords / Overview	Year	Paper	Note
	2024	Correcting Challenging Finnish Learner Texts With Claude, GPT-3.5 and GPT-4 Large Language Models

Greek

Keywords / Overview	Year	Paper	Note
Greek Learner Corpus	2018	[Stand-off annotation in learner corpora: compiling the Greek Learner Corpus (GLC)]
ELERRANT	2021	[ELERRANT: Automatic Grammatical Error Type Classification for Greek]	[code]

German

Keywords / Overview	Year	Paper	Note
Falko-MERLIN dataset	2018	[Using Wikipedia Edits in Low Resource Grammatical Error Correction]	[data]

Hindi

Keywords / Overview	Year	Paper	Note
	2014	[Detection and correction of non word spelling errors in Hindi language]
HiWikiEd dataset	2020	[Generating Inflectional Errors for Grammatical Error Correction in Hindi]	[data]
Hi-GEC	2025	Hi-GEC: Hindi Grammar Error Correction in Low Resource Scenario	[code]

Icelandic

Keywords / Overview	Year	Paper	Note
Byte-level approach	2023	Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora	[code]

Japanese

Keywords / Overview	Year	Paper	Note
Character-level RNN-based seq2seq	2018	[Automatic Error Correction on Japanese Functional Expressions Using Character-based Neural Machine Translation]
Constructing retrieval system for Japanese GEC	2019	[Grammatical-Error-Aware Incorrect Example Retrieval System for Learners of Japanese as a Second Language]
TMU Evaluation Corpus for Japanese Learners	2020	[Construction of an Evaluation Corpus for Grammatical Error Correction for Learners of Japanese as a Second Language]	[data: Fill this form]
Non-Autoregressive approach	2020	[Non-Autoregressive Grammatical Error Correction Toward a Writing Support System]
	2022	Construction of a Quality Estimation Dataset for Automatic Evaluation of Japanese Grammatical Error Correction

Korean

Keywords / Overview	Year	Paper	Note
KAGAS	2023	Towards standardizing Korean Grammatical Error Correction: Datasets and Annotation	[code] [data request form]
	2024	Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models
	2025	Unified Automated Essay Scoring and Grammatical Error Correction

Lithuanian

Keywords / Overview	Year	Paper	Note
	2022	Towards Lithuanian grammatical error correction	[code]

Romanian

Keywords / Overview	Year	Paper	Note
	2020	[Neural Grammatical Error Correction for Romanian]	[code]

Russian

Keywords / Overview	Year	Paper	Note
RULEC-GEC dataset	2019	[Grammar Error Correction in Morphologically Rich Languages: The Case of Russian]	[data]
RU-Lang8 dataset	2021	[New Dataset and Strong Baselines for the Grammatical Error Correction of Russian]	[data]
Additional annotations for RULEC and RU-Lang8	2024	Multi-Reference Benchmarks for Russian Grammatical Error Correction	[RULEC] [RU-Lang8]
	2024	Universal Dependencies for Learner Russian	[code]
	2025	Grammatical Error Correction via Sequence Tagging for Russian	[code]
LORuGEC	2025	LLMs in alliance with Edit-based models: advancing In-Context Learning for Grammatical Error Correction by Specific Example Selection	[data] [code]

Spanish

Keywords / Overview	Year	Paper	Note
COWS-L2H	2020	[Developing NLP Tools with a New Corpus of Learner Spanish]	[data]

Swedish

Keywords / Overview	Year	Paper	Note
	2024	Evaluation of Really Good Grammatical Error Correction	[code]

Turkish

Ukrainian

Keywords / Overview	Year	Paper	Note
UA-GEC	2023	[UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language]	[data]
UNLP 2023 Shared Task	2023	The UNLP 2023 Shared Task on Grammatical Error Correction for Ukrainian
	2023	Comparative Study of Models Trained on Synthetic Data for Ukrainian Grammatical Error Correction	UNLP-2023: Pravopysnyk
	2023	A Low-Resource Approach to the Grammatical Error Correction of Ukrainian	UNLP-2023: QC-NLP
	2023	RedPenNet for Grammatical Error Correction: Outputs to Tokens, Attentions to Spans	UNLP-2023: WebSpellChecker