NEW! 25/01/2019: Training data released!
CALL FOR PARTICIPATION
================================================================================
Building Educational Applications 2019 Shared Task:
Grammatical Error Correction
Florence, Italy
August 2, 2019
https://www.cl.cam.ac.uk/research/nl/bea2019st/
================================================================================
Grammatical error correction (GEC) is the task of automatically
correcting grammatical errors in text; e.g. [I follows his advices -> I
followed his advice]. It can be used not only to help language learners
improve their writing skills, but also to alert native speakers to
accidental mistakes or typos.
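As an illustration (not part of the task specification), corrections like
the example above are typically represented as token-level span edits, in
the spirit of the M2 annotation format used in previous shared tasks. The
sketch below is hypothetical helper code, not ERRANT itself; the
(start, end, correction) triples and the apply_edits function are
assumptions made for the example:

```python
# Minimal sketch: applying span-based corrections to a tokenised sentence.
# The (start, end, correction) triples loosely mirror M2-style annotation;
# apply_edits is a hypothetical helper, not part of ERRANT or the task.

def apply_edits(tokens, edits):
    """Apply (start, end, correction) edits to a list of tokens.

    Edits index the ORIGINAL token list, so they are applied
    right-to-left to keep earlier offsets valid.
    """
    corrected = list(tokens)
    for start, end, correction in sorted(edits, reverse=True):
        # An empty correction string deletes the span.
        corrected[start:end] = correction.split() if correction else []
    return corrected

source = "I follows his advices".split()
# "follows" -> "followed", "advices" -> "advice"
edits = [(1, 2, "followed"), (3, 4, "advice")]
print(" ".join(apply_edits(source, edits)))  # I followed his advice
```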
GEC gained significant attention in the Helping Our Own (HOO) and CoNLL
shared tasks between 2011 and 2014, but has since become more difficult
to evaluate given a lack of standardised experimental settings. In
particular, recent systems have been trained, tuned and tested on
different combinations of corpora using different metrics. One aim of
this shared task is therefore to once again provide a platform where
different approaches can be trained and tested under the same
conditions.
Another significant problem facing the field is that system performance
is still primarily benchmarked against the CoNLL-2014 test set, even
though this 5-year-old dataset only contains 50 essays on 2 different
topics written by 25 South-East Asian undergraduates in Singapore. This
means that systems have increasingly overfit to a very specific genre of
English and so do not generalise well to other domains. As a result,
this shared task introduces the Cambridge English Write & Improve (W&I)
corpus, a new error-annotated dataset that represents a much more
diverse cross-section of English language levels and domains. Write &
Improve is a web platform that assists non-native English students with
their writing (https://writeandimprove.com/).
Participating teams will be provided with training and development data
from the W&I corpus to build their systems. Depending on the chosen
track, supplementary data may also be used. System output will be
evaluated on a blind test set using ERRANT
(https://github.com/chrisjbryant/errant).
In addition to learner data, we will provide an annotated development
and test set extracted from the LOCNESS corpus, a collection of essays
written by native English students compiled by the Centre for English
Corpus Linguistics at the University of Louvain.
Tracks
------
There are 3 tracks in the BEA 2019 shared task. Each track controls the
amount of annotated data that can be used in a system. We place no
restrictions on the amount of unannotated data that can be used (e.g.
for language modelling).
* Restricted
In the restricted setting, participants may only use the following
annotated datasets: FCE, Lang-8 Corpus of Learner English, NUCLE, W&I
and LOCNESS.
Note that we restrict participants to the preprocessed Lang-8 Corpus
of Learner English rather than the raw, multilingual Lang-8 Learner
Corpus because participants would otherwise need to filter the raw
corpus themselves.
* Unrestricted
In the unrestricted setting, participants may use any and all
datasets, including those in the restricted setting.
* Unsupervised (or minimally supervised)
In the unsupervised setting, participants may not use any annotated
training data. Since current state-of-the-art systems rely on as much
training data as possible to reach the best performance, the goal of the
unsupervised track is to encourage research into systems that do not
rely on annotated training data. This track should be of particular
interest to researchers working with low-resource languages. However,
since we also expect this to be a challenging track, we will allow
participants to use the W&I+LOCNESS development set to develop their
systems.
Participation
-------------
In order to participate in the BEA 2019 Shared Task, teams are required
to submit their system output at any time between March 25 and March 29,
2019, 23:59 GMT. There is no explicit registration procedure. Further
details about the submission process will be provided soon.
Important Dates
---------------
Friday, Jan 25, 2019: New training data released
Monday, March 25, 2019: New test data released
Friday, March 29, 2019: System output submission deadline
Friday, April 12, 2019: System results announced
Friday, May 3, 2019: System paper submission deadline
Friday, May 17, 2019: Review deadline
Friday, May 24, 2019: Notification of acceptance
Friday, June 7, 2019: Camera-ready submission deadline
Friday, August 2, 2019: BEA-2019 Workshop (Florence, Italy)
Organisers
----------
Christopher Bryant, University of Cambridge
Mariano Felice, University of Cambridge
Øistein Andersen, University of Cambridge
Ted Briscoe, University of Cambridge
Contact
-------
Questions and queries about the shared task can be sent to
bea201...@gmail.com.
Further details can be found at
https://www.cl.cam.ac.uk/research/nl/bea2019st/
--
Kind regards,
Ekaterina Kochmar