GEC Information
Policy
- This repository aims to collect and categorize GEC (Grammatical Error Correction) papers.
- Unlike NLP-progress, GEC-Info does not consider performance on benchmarks.
- Authors and conferences are also not considered.
- The papers are limited to refereed papers in international conferences for now.
- Survey papers are exempt from this restriction.
Contributing
- Pull requests that add papers are welcome. Please keep each commit limited to the lines that add papers, and watch out for unrelated changes introduced by auto-formatting.
- You can also request to add papers as an issue.
This list can also be viewed on GitHub Pages.
Overview
- Surveys
- Shared Tasks
- Libraries
- Datasets
- Performance Measures
- Quality Estimation
- Models
- Ensembles / Post-processing
- Strategies
- Data Augmentation
- Analyses
- Other Tools
- Spoken Domain
- Applications
- Projects
- Other Materials
- Related Tasks
- Other Languages
Surveys
| Title | Year | Page | Note |
|---|---|---|---|
| “Automated Grammatical Error Correction: A Comprehensive Review” | 2017 | [paper] | |
| “A Comprehensive Survey of Grammar Error Correction” | 2020 | [paper] | |
| “Recent Trends in the Use of Deep Learning Models for Grammar Error Handling” | 2020 | [paper] | |
| “Grammatical Error Correction: A Survey of the State of the Art” | 2022 | [paper] | |
Shared Tasks
| Name | Year | Paper | Note |
|---|---|---|---|
| HOO 2011 | 2011 | [paper] | [website] |
| HOO 2012 | 2012 | [paper] | [website] |
| CoNLL-2013 | 2013 | [paper] | [website] |
| CoNLL-2014 | 2014 | [paper] | [website] [system outputs] |
| BEA-2019 | 2019 | [paper] | [website] [system outputs] |
Libraries
| Name | Year | Paper | Note |
|---|---|---|---|
| UnifiedGEC | 2025 | UnifiedGEC: Integrating Grammatical Error Correction Approaches for Multi-languages with a Unified Framework | [code] |
| gec-metrics | 2025 | gec-metrics: A Unified Library for Grammatical Error Correction Evaluation | [code] |
Datasets
For Training (Real Data)
For Training (Pseudo/Synthetic Data)
| Name | Year | Paper | Note |
|---|---|---|---|
| PIE-synthetic | 2019 | Parallel Iterative Edit Models for Local Sequence Transduction | [download] |
| OmniGEC | 2025 | Introducing OmniGEC: A Silver Multilingual Dataset for Grammatical Error Correction | [HF datasets], [code]. Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Slovene, Swedish, and Ukrainian |
For Evaluation
Performance Measures
Reference-based
Reference-free
Meta-evaluation
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| Re-rank the CoNLL14 systems by human evaluation | 2015 | Human Evaluation of Grammatical Error Correction Systems | [code] |
| Reassess M^2, I-measure, and GLEU by comparing them with human evaluation | 2018 | A Reassessment of Reference-Based Grammatical Error Correction Metrics | [code] |
| MAEGE | 2018 | Automatic Metric Validation for Grammatical Error Correction | [code] |
| SEEDA | 2024 | Revisiting Meta-evaluation for Grammatical Error Correction | [code] |
Quality Estimation
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2022 | ProQE: Proficiency-wise Quality Estimation dataset for Grammatical Error Correction | |
Models
Before Neural (E.g., SMT)
Encoder-Decoder
Tagging / Non-autoregressive
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| LSTM tagger for word choice task | 2019 | Choosing the Right Word: Using Bidirectional LSTM Tagger for Writing Support Systems | [code] |
| PIE | 2019 | Parallel Iterative Edit Models for Local Sequence Transduction | [code] |
| LaserTagger | 2019 | Encode, Tag, Realize: High-Precision Text Editing | [code] |
| GECToR | 2020 | GECToR – Grammatical Error Correction: Tag, Not Rewrite | [code] |
| Seq2Edits | 2020 | Seq2Edits: Sequence Transduction Using Span-level Edit Operations | [code] |
| GAN-like sequence labeling | 2021 | Grammatical Error Correction as GAN-like Sequence Labeling | |
| GECToR Large | 2022 | Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction | [code] [Author’s Master Thesis] |
| | 2021 | Efficient Grammatical Error Correction with Hierarchical Error Detections and Correction | [code] |
| | 2022 | Type-Driven Multi-Turn Corrections for Grammatical Error Correction | [code] |
| | 2023 | An Extended Sequence Tagging Vocabulary for Grammatical Error Correction | [code] |
Large Language Model
Ensembles / Post-processing
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| Use MEMT | 2014 | System Combination for Grammatical Error Correction | |
| | 2016 | Grammatical Error Correction: Machine Translation and Classifiers | |
| | 2019 | Learning to combine Grammatical Error Corrections | [code] |
| Diversity-Driven Combination (DDC) | 2021 | Diversity-Driven Combination for Grammatical Error Correction | [code] |
| Select a system for each error type with IP | 2021 | System Combination for Grammatical Error Correction Based on Integer Programming | [code] |
| | 2022 | Frustratingly Easy System Combination for Grammatical Error Correction | [code] |
| EditScorer | 2022 | Improved grammatical error correction by ranking elementary edits | [code] |
| GRECO | 2023 | System Combination via Quality Estimation for Grammatical Error Correction | [code] |
| | 2024 | Improving Grammatical Error Correction by Correction Acceptability Discrimination | |
Strategies
This includes methods such as decoding techniques and approaches that modify the loss function while keeping the model architecture unchanged.
Data Augmentation
Data Cleaning
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| A Self-Refinement Strategy for Noise Reduction | 2020 | A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction | |
| cLang8 (Cleaned Lang-8) | 2021 | A Simple Recipe for Multilingual Grammatical Error Correction | [code] |
Analyses
Spoken Domain
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2019 | AUTOMATIC GRAMMATICAL ERROR DETECTION OF NON-NATIVE SPOKEN LEARNER ENGLISH | |
| | 2020 | Grammatical error detection in transcriptions of spoken English | |
| Disfluency detection (DD) model | 2020 | Spoken Language ‘Grammatical Error Correction’ | |
| | 2022 | On Assessing and Developing Spoken ‘Grammatical Error Correction’ Systems | |
Applications
| Name | Year | Paper | Note |
|---|---|---|---|
| GECko+ | | GECko+: a Grammatical and Discourse Error Correction Tool | [website] [code] An English assisting tool. Corrects grammatical errors and re-orders sentences automatically. |
| MiSS | 2021 | MiSS: An Assistant for Multi-Style Simultaneous Translation | [website] [demo video] |
| ALLECS | 2023 | ALLECS: A Lightweight Language Error Correction System | [website] [code] |
| | 2023 | Doolittle: Benchmarks and Corpora for Academic Writing Formalization | [code] |
| | 2025 | Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore | [code] |
Projects
| Name | Website |
|---|---|
| GramFormer | [GitHub] |
Other Tools
| Name | Code | Note |
|---|---|---|
| Lang8-NAIST-extractor | [code] | Scripts for extracting error-correct pairs from the Lang-8 Corpus. |
| M2Converter | [code] | Scripts for converting an M2 file into source and target files. |
| EFCamDat-Preprocess | [code] | |
Other Materials
| Name | Paper | Note |
|---|---|---|
| NLP-progress | | website. Performance rankings on some datasets. |
| A Crash Course in Automatic Grammatical Error Correction | [paper] | materials. The GEC tutorial at COLING 2020. |
| Chunngai/gec-papers | | github |
Related Tasks
Grammatical Error Detection
Feedback Comment Generation
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2014 | Correcting Preposition Errors in Learner English Using Error Case Frames and Feedback Messages | |
| English grammar checker with feedback in Japanese | 2018 | Grammatical Error Checker for Japanese Learners of English | Not strictly feedback comment generation research, but classified here for now. |
| | 2019 | Toward a Task of Feedback Comment Generation for Writing Learning | |
| | 2020 | Creating Corpora for Research in Feedback Comment Generation | |
| | 2021 | Shared Task on Feedback Comment Generation for Language Learners | |
| | 2023 | Template-guided Grammatical Error Feedback Comment Generation | |
Explainable Grammatical Error Correction
- Studies to explain the reasons for and intentions of error correction.
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| EXPECT | 2023 | Enhancing Grammatical Error Correction Systems with Explanations | [code] |
| XGEC dataset | 2024 | Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction | [data] |
| GEE | 2024 | GEE! Grammar Error Explanation with Large Language Models | [code] |
Document-level Revision
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| TETRA | 2024 | Towards Automated Document Revision: Grammatical Error Correction, Fluency Edits, and Beyond | [code] |
Other Languages
Arabic
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| Arabic Learner Corpus | 2013 | Arabic Learner Corpus v1: A New Resource for Arabic Language Research | [website] |
| QALB | 2014 | Large Scale Arabic Error Annotation: Guidelines and Framework | [QALB Project Website] |
| QALB 2014 Shared Task | 2014 | The First QALB Shared Task on Automatic Text Correction for Arabic | [website] |
| QALB 2015 Shared Task | 2015 | The Second QALB Shared Task on Automatic Text Correction for Arabic | |
| ARETA | 2021 | Automatic Error Type Annotation for Arabic | [code] |
| | 2023 | Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation | [code] |
| | 2023 | Beyond English: Evaluating LLMs for Arabic Grammatical Error Correction | |
| | 2025 | ARWI: Arabic Write and Improve | [website] |
| | 2025 | Enhancing Text Editing for Grammatical Error Correction: Arabic as a Case Study | [code] |
Bangla
Chinese
Czech
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| AKCES-GEC dataset | 2019 | Grammatical Error Correction in Low-Resource Scenarios | [data] |
| Grammar Error Correction Corpus for Czech (GECCC) | 2022 | Czech Grammar Error Correction with a Large and Diverse Corpus | [data] |
Estonian
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2025 | Paragraph-level Error Correction and Explanation Generation: Case Study for Estonian | [data] |
Finnish
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2024 | Correcting Challenging Finnish Learner Texts With Claude, GPT-3.5 and GPT-4 Large Language Models | |
Greek
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| Greek Learner Corpus | 2018 | Stand-off annotation in learner corpora: compiling the Greek Learner Corpus (GLC) | |
| ELERRANT | 2021 | ELERRANT: Automatic Grammatical Error Type Classification for Greek | [code] |
German
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| Falko-MERLIN dataset | 2018 | Using Wikipedia Edits in Low Resource Grammatical Error Correction | [data] |
Indian
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2014 | Detection and correction of non word spelling errors in Hindi language | |
| HiWikiEd dataset | 2020 | Generating Inflectional Errors for Grammatical Error Correction in Hindi | [data] |
| Hi-GEC | 2025 | Hi-GEC: Hindi Grammar Error Correction in Low Resource Scenario | [code] |
| IndiGEC | 2025 | IndiGEC: Multilingual Grammar Error Correction for Low-Resource Indian Languages | [code] |
Icelandic
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| Byte-level approach | 2023 | Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora | [code] |
Japanese
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| Character-level RNN-based seq2seq | 2018 | Automatic Error Correction on Japanese Functional Expressions Using Character-based Neural Machine Translation | |
| Constructing retrieval system for Japanese GEC | 2019 | Grammatical-Error-Aware Incorrect Example Retrieval System for Learners of Japanese as a Second Language | |
| TMU Evaluation Corpus for Japanese Learners | 2020 | Construction of an Evaluation Corpus for Grammatical Error Correction for Learners of Japanese as a Second Language | [data: Fill this form] |
| Non-Autoregressive approach | 2020 | Non-Autoregressive Grammatical Error Correction Toward a Writing Support System | |
| | 2022 | Construction of a Quality Estimation Dataset for Automatic Evaluation of Japanese Grammatical Error Correction | |
Korean
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| KAGAS | 2023 | Towards standardizing Korean Grammatical Error Correction: Datasets and Annotation | [code] [data request form] |
| | 2024 | Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models | |
| | 2025 | Unified Automated Essay Scoring and Grammatical Error Correction | |
Lithuanian
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2022 | Towards Lithuanian grammatical error correction | [code] |
Romanian
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2020 | Neural Grammatical Error Correction for Romanian | [code] |
Russian
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| RULEC-GEC dataset | 2019 | Grammar Error Correction in Morphologically Rich Languages: The Case of Russian | [data] |
| RU-Lang8 dataset | 2021 | New Dataset and Strong Baselines for the Grammatical Error Correction of Russian | [data] |
| Additional annotations for RULEC and RU-Lang8 | 2024 | Multi-Reference Benchmarks for Russian Grammatical Error Correction | [RULEC] [RU-Lang8] |
| | 2024 | Universal Dependencies for Learner Russian | [code] |
| | 2025 | Grammatical Error Correction via Sequence Tagging for Russian | [code] |
| LORuGEC | 2025 | LLMs in alliance with Edit-based models: advancing In-Context Learning for Grammatical Error Correction by Specific Example Selection | [data] [code] |
Spanish
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| COWS-L2H | 2020 | Developing NLP Tools with a New Corpus of Learner Spanish | [data] |
Swedish
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2024 | Evaluation of Really Good Grammatical Error Correction | code |
Turkish
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| ERRANT-TR | 2023 | Towards Automatic Grammatical Error Type Classification for Turkish | [code] |
| GECTurk WEB | 2025 | GECTurk WEB: An Explainable Online Platform for Turkish Grammatical Error Detection and Correction | [website] |
Ukrainian
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| UA-GEC | 2023 | UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language | [data] |
| UNLP 2023 Shared Task | 2023 | The UNLP 2023 Shared Task on Grammatical Error Correction for Ukrainian | |
| | 2023 | Comparative Study of Models Trained on Synthetic Data for Ukrainian Grammatical Error Correction | UNLP-2023: Pravopysnyk |
| | 2023 | A Low-Resource Approach to the Grammatical Error Correction of Ukrainian | UNLP-2023: QC-NLP |
| | 2023 | RedPenNet for Grammatical Error Correction: Outputs to Tokens, Attentions to Spans | UNLP-2023: WebSpellChecker |