GEC Information
Policy
- This repository aims to collect and categorize GEC (Grammatical Error Correction) papers.
- Unlike NLP-progress, GEC-Info does not consider performance on benchmarks.
- Authors and conferences are also not considered.
- The papers are limited to refereed papers in international conferences for now.
- This restriction does not apply to survey papers.
Contributing
- Pull requests that add papers are welcome. Please make commits that change only the lines related to the added papers (and watch out for unrelated changes introduced by auto-formatting).
- You can also request the addition of papers by opening an issue.
This list can also be viewed on GitHub Pages.
Overview
- Surveys
- Shared Tasks
- Libraries
- Datasets
- Performance Measures
- Quality Estimation
- Models
- Ensembles / Post-processing
- Strategies
- Data Augmentation
- Analyses
- Other Tools
- Spoken Domain
- Applications
- Projects
- Other Materials
- Related Tasks
- Other Languages
Surveys
Title | Year | Page | Note |
---|---|---|---|
“Automated Grammatical Error Correction: A Comprehensive Review” | 2017 | [paper] | |
“A Comprehensive Survey of Grammar Error Correction” | 2020 | [paper] | |
“Recent Trends in the Use of Deep Learning Models for Grammar Error Handling” | 2020 | [paper] | |
“Grammatical Error Correction: A Survey of the State of the Art” | 2022 | [paper] |
Shared Tasks
Name | Year | Paper | Note |
---|---|---|---|
HOO 2011 | 2011 | [paper] | [website] |
HOO 2012 | 2012 | [paper] | [website] |
CoNLL-2013 | 2013 | [paper] | [website] |
CoNLL-2014 | 2014 | [paper] | [website] [system outputs] |
BEA-2019 | 2019 | [paper] | [website] [system outputs] |
Libraries
Name | Year | Paper | Note |
---|---|---|---|
UnifiedGEC | 2025 | UnifiedGEC: Integrating Grammatical Error Correction Approaches for Multi-languages with a Unified Framework | [code] |
gec-metrics | 2025 | gec-metrics: A Unified Library for Grammatical Error Correction Evaluation | [code] |
Datasets
For Training (Real Data)
For Training (Pseudo/Synthetic Data)
Name | Year | Paper | Note |
---|---|---|---|
PIE-synthetic | 2019 | [Parallel Iterative Edit Models for Local Sequence Transduction] | [download] |
OmniGEC | 2025 | Introducing OmniGEC: A Silver Multilingual Dataset for Grammatical Error Correction | [HF datasets], [code]. Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Slovene, Swedish, and Ukrainian |
For Evaluation
Performance Measures
Reference-based
Reference-free
Meta-evaluation
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
Re-rank the CoNLL14 systems by human evaluation | 2015 | Human Evaluation of Grammatical Error Correction Systems | [code] |
Reassess M^2, I-measure, and GLEU by comparing them with human evaluation | 2018 | [A Reassessment of Reference-Based Grammatical Error Correction Metrics] | [code] |
MAEGE | 2018 | Automatic Metric Validation for Grammatical Error Correction | [code] |
SEEDA | 2024 | Revisiting Meta-evaluation for Grammatical Error Correction | [code] |
Quality Estimation
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
| | 2022 | Proficiency Matters Quality Estimation in Grammatical Error Correction | |
Models
Before Neural (E.g., SMT)
Encoder-Decoder
|Keywords / Overview|Year|Paper|Note|
|:--|:--|:--|:--|
|First NMT-based approach|2016|[Grammatical error correction using neural machine translation]||
||2016|Neural Network Translation Models for Grammatical Error Correction||
|Neural reinforcement learning|2017|[Grammatical Error Correction with Neural Reinforcement Learning]|[code]|
|A nested attention (word and char attention)|2017|[A Nested Attention Neural Hybrid Model for Grammatical Error Correction]||
|Re-ranking N-best sentences (by SMT) with LSTM-based GED|2017|[Neural Sequence-Labelling Models for Grammatical Error Correction]||
|Hybrid SMT and NMT|2018|[Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation]||
|CNN-based Encoder-Decoder approach|2018|[A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction]|[code]|
|Fluency boosting learning|2018|[Fluency Boost Learning and Inference for Neural Grammatical Error Correction]|[code] [arXiv]|
|Copy-Augmented Architecture|2019|[Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data]|[code]|
|Consider a few previous sentences|2019|[Cross-Sentence Grammatical Error Correction]|[code]|
|Use sentence-level error detection|2019|[The AIP-Tohoku System at the BEA-2019 Shared Task]|BEA-2019: AIP-Tohoku|
|Four CNN + eight Transformer|2019|[The LAIX Systems in the BEA-2019 GEC Shared Task]|BEA-2019: LAIX|
|Combine Transformer+CNN with FST + Re-ranking|2019|[Neural and FST-based approaches to grammatical error correction]|BEA-2019: CAMB-CLED|
|Transformer seq2seq + BERT re-ranker|2019|[TMU Transformer System Using BERT for Re-ranking at BEA 2019 Grammatical Error Correction on Restricted Track]|BEA-2019: TMU|
|Apply noisy channel with BERT and GPT-2 as LM|2019|[Noisy Channel for Low Resource Grammatical Error Correction]|BEA-2019: Siteimprove|
|Use Finite State Transducers|2019|[Neural Grammatical Error Correction with Finite State Transducers]||
|BERT-fuse|2020|[Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction]|[code]|
|Adversarial approach (G:seq2seq D:sentence-pair classification)|2020|[Adversarial Grammatical Error Correction]||
|Erroneous span correction and detection|2020|[Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction]||
|Document-level approach|2020|[Document-level grammatical error correction]|[code]|
|Beam search considering copy probability|2020|[Generating Diverse Corrections with Local Beam Search for Grammatical Error Correction]||
|BART-based|2020|[Stronger Baselines for Grammatical Error Correction Using a Pretrained Encoder-Decoder Model]|[code]|
|VERNet|2021|[Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction]|[code]|
|Shallow Aggressive Decoding|2021|[Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding]|[code]|
|T5-based|2021|[A Simple Recipe for Multilingual Grammatical Error Correction]|[code]|
|Use multiclass GED for Transformer seq2seq and reranking|2021|[Multi-Class Grammatical Error Detection for Correction: A Tale of Two Systems]||
|GEC for writing improvement model adapted to the writer’s L1|2021|[Beyond Grammatical Error Correction: Improving L1-influenced research writing in English using pre-trained encoder-decoder models]|[code]|
|Contrastive Learning approach|2021|[Grammatical Error Correction with Contrastive Learning in Low Error Density Domains]|[code]|
|Sequence Span Rewriting|2021|[Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting]||
|Pretrain by DAE + sequential transfer learning|2019|[A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning]|[code] BEA-2019: Kakao&Brain|
|Dependent Self-Attention (DSA)|2021|[Grammatical Error Correction with Dependency Distance]||
|A GEC model using only 11.6MB|2021|An efficient system for grammatical error correction on mobile devices||
|LM-Critic|2021|LM-Critic: Language Models for Unsupervised Grammatical Error Correction|[code] Experiments in a supervised setting are also reported|
||2022|Interpretability for Language Learners Using Example-Based Grammatical Error Correction|[code]|
||2022|Position Offset Label Prediction for Grammatical Error Correction||
|SynGEC|2022|SynGEC: Syntax-Enhanced Grammatical Error Correction with a Tailored GEC-Oriented Parser|[code]|
|EdiT5|2022|EdiT5: Semi-Autoregressive Text Editing with T5 Warm-Start|[code]|
|GEC-DePenD|2023|GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding|[code]|
|TemplateGEC|2023|TemplateGEC: Improving Grammatical Error Correction with Detection Template|[code]|
|LET|2023|LET: Leveraging Error Type Information for Grammatical Error Correction||
||2023|Leveraging Denoised Abstract Meaning Representation for Grammatical Error Correction||
|Use speech information|2023|Improving Grammatical Error Correction with Multimodal Feature Integration|[code]|
||2023|Improving Autoregressive Grammatical Error Correction with Non-autoregressive Models||
||2023|Unsupervised Grammatical Error Correction Rivaling Supervised Methods|[code]|
||2024|No Error Left Behind: Multilingual Grammatical Error Correction with Pre-trained Translation Models||
|EDU Copy Mechanism|2024|Improving Copy-oriented Text Generation via EDU Copy Mechanism||
||2024|Efficient and Interpretable Grammatical Error Correction with Mixture of Experts|[code]|
||2025|InstructGEC: Enhancing Unsupervised Grammatical Error Correction with Instruction Tuning||
|CxGGEC|2025|CxGGEC: Construction-Guided Grammatical Error Correction||
Tagging / Non-autoregressive
|Keywords / Overview|Year|Paper|Note|
|:--|:--|:--|:--|
|LSTM tagger for word choice task|2019|[Choosing the Right Word: Using Bidirectional LSTM Tagger for Writing Support Systems]|[code]|
|PIE|2019|[Parallel Iterative Edit Models for Local Sequence Transduction]|[code]|
|LaserTagger|2019|[Encode, Tag, Realize: High-Precision Text Editing]|[code]|
|GECToR|2020|[GECToR – Grammatical Error Correction: Tag, Not Rewrite]|[code]|
|Seq2Edits|2020|[Seq2Edits: Sequence Transduction Using Span-level Edit Operations]|[code]|
|GAN-like sequence labeling|2021|[Grammatical Error Correction as GAN-like Sequence Labeling]||
|GECToR Large|2022|Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction|[code] [Author’s Master Thesis]|
||2021|Efficient Grammatical Error Correction with Hierarchical Error Detections and Correction|[code]|
||2022|Type-Driven Multi-Turn Corrections for Grammatical Error Correction|[code]|
||2023|An Extended Sequence Tagging Vocabulary for Grammatical Error Correction|[code]|
Large Language Model
|Keywords / Overview|Year|Paper|Note|
|:--|:--|:--|:--|
|5-gram LM based approach|2018|Language Model Based Grammatical Error Correction without Annotated Training Data|[code]|
|Use Finite State Transducers|2019|Neural Grammatical Error Correction with Finite State Transducers||
|Use LM (BERT, GPT-1,2)|2019|[The Unreasonable Effectiveness of Transformer Language Models in Grammatical Error Correction]||
||2023|Reducing Sequence Length by Predicting Edit Spans with Large Language Models||
||2023|Exploring Effectiveness of GPT-3 in Grammatical Error Correction: A Study on Performance and Controllability in Prompt-Based Methods||
||2024|Evaluating Prompting Strategies for Grammatical Error Correction Based on Language Proficiency||
||2024|GPT-3.5 for Grammatical Error Correction|Target languages: CZ, DE, EN, RU, SV, UA|
||2024|Ungrammatical-syntax-based In-context Example Selection for Grammatical Error Correction|[code]|
|mEdIT|2024|mEdIT: Multilingual Text Editing via Instruction Tuning|[code]|
|DeCoGLM|2024|Detection-Correction Structure via General Language Model for Grammatical Error Correction|[code]|
|For code-switched text|2024|LLM-based Code-Switched Text Generation for Grammatical Error Correction|[code]|
|EPO|2025|Edit-Wise Preference Optimization for Grammatical Error Correction||
||2024|Prompting open-source and commercial language models for grammatical error correction of English learner text|[code]|
||2025|Explanation based In-Context Demonstrations Retrieval for Multilingual Grammatical Error Correction|[code]|
||2025|Encode Errors: Representational Retrieval of In-Context Demonstrations for Multilingual Grammatical Error Correction|[code]|
||2025|Adapting LLMs for Minimal-edit Grammatical Error Correction|[code]|
Ensembles / Post-processing
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
Use MEMT | 2014 | System Combination for Grammatical Error Correction | |
| | 2016 | Grammatical Error Correction: Machine Translation and Classifiers | |
| | 2019 | [Learning to combine Grammatical Error Corrections] | [code] |
Diversity-Driven Combination (DDC) | 2021 | [Diversity-Driven Combination for Grammatical Error Correction] | [code] |
Select a system for each error type with IP | 2021 | [System Combination for Grammatical Error Correction Based on Integer Programming] | [code] |
| | 2022 | Frustratingly Easy System Combination for Grammatical Error Correction | [code] |
EditScorer | 2022 | Improved grammatical error correction by ranking elementary edits | [code] |
GRECO | 2023 | System Combination via Quality Estimation for Grammatical Error Correction | [code] |
| | 2024 | Improving Grammatical Error Correction by Correction Acceptability Discrimination | |
Strategies
This includes methods such as decoding techniques and approaches that modify the loss function while keeping the model architecture unchanged.
Data Augmentation
Data Cleaning
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
A Self-Refinement Strategy for Noise Reduction | 2020 | [A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction] | |
cLang8 (Cleaned Lang-8) | 2021 | [A Simple Recipe for Multilingual Grammatical Error Correction] | [code] |
Analyses
Spoken Domain
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
| | 2019 | Automatic Grammatical Error Detection of Non-native Spoken Learner English | |
| | 2020 | Grammatical error detection in transcriptions of spoken English | |
Disfluency detection (DD) model | 2020 | Spoken Language ‘Grammatical Error Correction’ | |
| | 2022 | On Assessing and Developing Spoken ‘Grammatical Error Correction’ Systems | |
Applications
Name | Year | Paper | Note |
---|---|---|---|
GECko++ | | [GECko+: a Grammatical and Discourse Error Correction Tool] | [website] [code] An English writing-assistance tool that automatically corrects grammatical errors and reorders sentences. |
MiSS | 2021 | [MiSS: An Assistant for Multi-Style Simultaneous Translation] | [website] [demo video] |
ALLECS | 2023 | ALLECS: A Lightweight Language Error Correction System | [website] [code] |
| | 2023 | Doolittle: Benchmarks and Corpora for Academic Writing Formalization | [code] |
| | 2025 | Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore | [code] |
Projects
Name | Website |
---|---|
GramFormer | [GitHub] |
Other Tools
Name | Code | Note |
---|---|---|
Lang8-NAIST-extractor | [code] | Scripts for extracting error-correct pairs from the Lang-8 Corpus. |
M2Converter | [code] | Scripts for converting m2 file into source file and target file. |
EFCamDat-Preprocess | [code] |
Other Materials
Name | Paper | Note |
---|---|---|
NLP-progress | | [website] Performance rankings on some datasets. |
A Crash Course in Automatic Grammatical Error Correction | [paper] | [materials] A tutorial on GEC at COLING 2020. |
Chunngai/gec-papers | | [github] A paper list that appears to have been compiled around 2019-2020. |
Related Tasks
Grammatical Error Detection
Feedback Comment Generation
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
| | 2014 | [Correcting Preposition Errors in Learner English Using Error Case Frames and Feedback Messages] | |
English grammar checker with feedback in Japanese | 2018 | [Grammatical Error Checker for Japanese Learners of English] | This is not strictly feedback comment generation research, but it is classified here for now. |
| | 2019 | [Toward a Task of Feedback Comment Generation for Writing Learning] | |
| | 2020 | [Creating Corpora for Research in Feedback Comment Generation] | |
| | 2021 | [Shared Task on Feedback Comment Generation for Language Learners] | |
| | 2023 | Template-guided Grammatical Error Feedback Comment Generation | |
Explainable Grammatical Error Correction
- Studies that explain the reasons for and intentions behind error corrections.
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
EXPECT | 2023 | Enhancing Grammatical Error Correction Systems with Explanations | [code] |
XGEC dataset | 2024 | Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction | [data] |
GEE | 2024 | GEE! Grammar Error Explanation with Large Language Models | [code] |
Document-level Revision
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
TETRA | 2024 | Towards Automated Document Revision: Grammatical Error Correction, Fluency Edits, and Beyond | [code] |
Other Languages
Arabic
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
Arabic Learner Corpus | 2013 | [Arabic Learner Corpus v1: A New Resource for Arabic Language Research] | [website] |
QALB | 2014 | [Large Scale Arabic Error Annotation: Guidelines and Framework] | [QALB Project Website] |
QALB 2014 Shared Task | 2014 | [The First QALB Shared Task on Automatic Text Correction for Arabic] | [website] |
QALB 2015 Shared Task | 2015 | [The Second QALB Shared Task on Automatic Text Correction for Arabic] | |
ARETA | 2021 | [Automatic Error Type Annotation for Arabic] | [code] |
| | 2023 | Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation | [code] |
| | 2023 | Beyond English: Evaluating LLMs for Arabic Grammatical Error Correction | [code] |
| | 2025 | ARWI: Arabic Write and Improve | [website] |
| | 2025 | Enhancing Text Editing for Grammatical Error Correction: Arabic as a Case Study | [code] |
Bangla
Chinese
Czech
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
AKCES-GEC dataset | 2019 | [Grammatical Error Correction in Low-Resource Scenarios] | [data] |
Grammar Error Correction Corpus for Czech (GECCC) | 2022 | Czech Grammar Error Correction with a Large and Diverse Corpus | [data] |
Estonian
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
| | 2025 | Paragraph-level Error Correction and Explanation Generation: Case Study for Estonian | [data] |
Finnish
|Keywords / Overview|Year|Paper|Note|
|:--|:--|:--|:--|
||2024|Correcting Challenging Finnish Learner Texts With Claude, GPT-3.5 and GPT-4 Large Language Models||
Greek
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
Greek Learner Corpus | 2018 | [Stand-off annotation in learner corpora: compiling the Greek Learner Corpus (GLC)] | |
ELERRANT | 2021 | [ELERRANT: Automatic Grammatical Error Type Classification for Greek] | [code] |
German
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
Falko-MERLIN dataset | 2018 | [Using Wikipedia Edits in Low Resource Grammatical Error Correction] | [data] |
Hindi
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
| | 2014 | [Detection and correction of non word spelling errors in Hindi language] | |
HiWikiEd dataset | 2020 | [Generating Inflectional Errors for Grammatical Error Correction in Hindi] | [data] |
Hi-GEC | 2025 | Hi-GEC: Hindi Grammar Error Correction in Low Resource Scenario | [code] |
Icelandic
|Keywords / Overview|Year|Paper|Note|
|:--|:--|:--|:--|
|Byte-level approach|2023|Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora|[code]|
Japanese
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
Character-level RNN-based seq2seq | 2018 | [Automatic Error Correction on Japanese Functional Expressions Using Character-based Neural Machine Translation] | |
Constructing retrieval system for Japanese GEC | 2019 | [Grammatical-Error-Aware Incorrect Example Retrieval System for Learners of Japanese as a Second Language] | |
TMU Evaluation Corpus for Japanese Learners | 2020 | [Construction of an Evaluation Corpus for Grammatical Error Correction for Learners of Japanese as a Second Language] | [data: Fill this form] |
Non-Autoregressive approach | 2020 | [Non-Autoregressive Grammatical Error Correction Toward a Writing Support System] | |
| | 2022 | Construction of a Quality Estimation Dataset for Automatic Evaluation of Japanese Grammatical Error Correction | |
Korean
|Keywords / Overview|Year|Paper|Note|
|:--|:--|:--|:--|
|KAGAS|2023|Towards standardizing Korean Grammatical Error Correction: Datasets and Annotation|[code] [data request form]|
||2024|Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models||
||2025|Unified Automated Essay Scoring and Grammatical Error Correction||
Lithuanian
|Keywords / Overview|Year|Paper|Note|
|:--|:--|:--|:--|
||2022|Towards Lithuanian grammatical error correction|[code]|
Romanian
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
| | 2020 | [Neural Grammatical Error Correction for Romanian] | [code] |
Russian
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
RULEC-GEC dataset | 2019 | [Grammar Error Correction in Morphologically Rich Languages: The Case of Russian] | [data] |
RU-Lang8 dataset | 2021 | [New Dataset and Strong Baselines for the Grammatical Error Correction of Russian] | [data] |
Additional annotations for RULEC and RU-Lang8 | 2024 | Multi-Reference Benchmarks for Russian Grammatical Error Correction | [RULEC] [RU-Lang8] |
| | 2024 | Universal Dependencies for Learner Russian | [code] |
| | 2025 | Grammatical Error Correction via Sequence Tagging for Russian | [code] |
LORuGEC | 2025 | LLMs in alliance with Edit-based models: advancing In-Context Learning for Grammatical Error Correction by Specific Example Selection | [data] [code] |
Spanish
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
COWS-L2H | 2020 | [Developing NLP Tools with a New Corpus of Learner Spanish] | [data] |
Swedish
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
| | 2024 | Evaluation of Really Good Grammatical Error Correction | [code] |
Turkish
|Keywords / Overview|Year|Paper|Note|
|:--|:--|:--|:--|
|ERRANT-TR|2023|Towards Automatic Grammatical Error Type Classification for Turkish|[code]|
|GECTurk WEB|2025|GECTurk WEB: An Explainable Online Platform for Turkish Grammatical Error Detection and Correction|[website]|
Ukrainian
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
UA-GEC | 2023 | [UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language] | [data] |
UNLP 2023 Shared Task | 2023 | The UNLP 2023 Shared Task on Grammatical Error Correction for Ukrainian | |
| | 2023 | Comparative Study of Models Trained on Synthetic Data for Ukrainian Grammatical Error Correction | UNLP-2023: Pravopysnyk |
| | 2023 | A Low-Resource Approach to the Grammatical Error Correction of Ukrainian | UNLP-2023: QC-NLP |
| | 2023 | RedPenNet for Grammatical Error Correction: Outputs to Tokens, Attentions to Spans | UNLP-2023: WebSpellChecker |