NEW! 25/01/2019: Training data released!
CALL FOR PARTICIPATION
================================================================================
Building Educational Applications 2019 Shared Task:
Grammatical Error Correction
Florence, Italy
August 2, 2019
https://www.cl.cam.ac.uk/research/nl/bea2019st/
================================================================================
Grammatical error correction (GEC) is the task of automatically
correcting grammatical errors in text; e.g. [I follows his advices -> I
followed his advice]. It can be used not only to help language learners
improve their writing skills, but also to alert native speakers to
accidental mistakes or typos.
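As an illustration (not part of the task specification), corrections like
the example above are typically represented as token-level span edits, in
the spirit of the M2 annotation format used in previous shared tasks. The
sketch below is hypothetical helper code, not ERRANT itself; the
(start, end, correction) triples and the apply_edits function are
assumptions made for the example:

```python
# Minimal sketch: applying span-based corrections to a tokenised sentence.
# The (start, end, correction) triples loosely mirror M2-style annotation;
# apply_edits is a hypothetical helper, not part of ERRANT or the task.

def apply_edits(tokens, edits):
    """Apply (start, end, correction) edits to a list of tokens.

    Edits index the ORIGINAL token list, so they are applied
    right-to-left to keep earlier offsets valid.
    """
    corrected = list(tokens)
    for start, end, correction in sorted(edits, reverse=True):
        # An empty correction string deletes the span.
        corrected[start:end] = correction.split() if correction else []
    return corrected

source = "I follows his advices".split()
# "follows" -> "followed", "advices" -> "advice"
edits = [(1, 2, "followed"), (3, 4, "advice")]
print(" ".join(apply_edits(source, edits)))  # I followed his advice
```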
GEC gained significant attention in the Helping Our Own (HOO) and CoNLL
shared tasks between 2011 and 2014, but has since become more difficult
to evaluate given a lack of standardised experimental settings. In
particular, recent systems have been trained, tuned and tested on
different combinations of corpora using different metrics. One aim of
this shared task is therefore to once again provide a platform where
different approaches can be trained and tested under the same
conditions.
Another significant problem facing the field is that system performance
is still primarily benchmarked against the CoNLL-2014 test set, even
though this 5-year-old dataset only contains 50 essays on 2 different
topics written by 25 South-East Asian undergraduates in Singapore. This
means that systems have increasingly overfit to a very specific genre of
English and so do not generalise well to other domains. As a result,
this shared task introduces the Cambridge English Write & Improve (W&I)
corpus, a new error-annotated dataset that represents a much more
diverse cross-section of English language levels and domains. Write &
Improve is a web platform that assists non-native English students with
their writing (https://writeandimprove.com/).
Participating teams will be provided with training and development data
from the W&I corpus to build their systems. Depending on the chosen
track, supplementary data may also be used. System output will be
evaluated on a blind test set using ERRANT
(https://github.com/chrisjbryant/errant).
In addition to learner data, we will provide an annotated development
and test set extracted from the LOCNESS corpus, a collection of essays
written by native English students compiled by the Centre for English
Corpus Linguistics at the University of Louvain.
Tracks
------
There are 3 tracks in the BEA 2019 shared task. Each track controls the
amount of annotated data that can be used in a system. We place no
restrictions on the amount of unannotated data that can be used (e.g.
for language modelling).
* Restricted
In the restricted setting, participants may only use the following
annotated datasets: FCE, Lang-8 Corpus of Learner English, NUCLE, W&I
and LOCNESS.
Note that we restrict participants to the preprocessed Lang-8 Corpus
of Learner English rather than the raw, multilingual Lang-8 Learner
Corpus because participants would otherwise need to filter the raw
corpus themselves.
* Unrestricted
In the unrestricted setting, participants may use any and all
datasets, including those in the restricted setting.
* Unsupervised (or minimally supervised)
In the unsupervised setting, participants may not use any annotated
training data. Since current state-of-the-art systems rely on as much
training data as possible to reach the best performance, the goal of the
unsupervised track is to encourage research into systems that do not
rely on annotated training data. This track should be of particular
interest to researchers working with low-resource languages. However,
since we also expect this to be a challenging track, we will allow
participants to use the W&I+LOCNESS development set to develop their
systems.
Participation
-------------
In order to participate in the BEA 2019 Shared Task, teams are required
to submit their system output at any time between March 25 and March 29,
2019, 23:59 GMT. There is no explicit registration procedure. Further
details about the submission process will be provided soon.
Important Dates
---------------
Friday, Jan 25, 2019: New training data released
Monday, March 25, 2019: New test data released
Friday, March 29, 2019: System output submission deadline
Friday, April 12, 2019: System results announced
Friday, May 3, 2019: System paper submission deadline
Friday, May 17, 2019: Review deadline
Friday, May 24, 2019: Notification of acceptance
Friday, June 7, 2019: Camera-ready submission deadline
Friday, August 2, 2019: BEA-2019 Workshop (Florence, Italy)
Organisers
----------
Christopher Bryant, University of Cambridge
Mariano Felice, University of Cambridge
Øistein Andersen, University of Cambridge
Ted Briscoe, University of Cambridge
Contact
-------
Questions and queries about the shared task can be sent to
bea201...@gmail.com.
Further details can be found at
https://www.cl.cam.ac.uk/research/nl/bea2019st/
--
Kind regards,
Ekaterina Kochmar