🚀 Call for Participation: DISRPT 2025 Shared Task on Discourse Relation Parsing 
and Treebanking. 
🛎️ Training data has been released and submission is now open! 
https://softconf.com/emnlp2025/disrpt2025/
In conjunction with CODI-CRAC & EMNLP 2025 - Suzhou, China, Nov. 5-9.
This year, we are organizing the fourth edition of the DISRPT shared task on 
discourse processing across formalisms, for a variety of languages and genres, 
with three subtasks:
 
* Task 1: Discourse segmentation
* Task 2: Connective identification
* Task 3: Relation classification
 
We will provide training, development and test datasets from (almost) all 
available languages in RST / eRST, SDRT, PDTB, ISO 24617, and discourse 
dependencies, using a uniform format. Because different corpora, languages, and 
frameworks use different guidelines, the shared task will promote the design of 
flexible methods for dealing with various guidelines, and will help to push 
forward the discussion of converging standards for discourse units. We will 
evaluate segmentation and connective detection in two different scenarios: with 
and without gold syntax. An automatically parsed version is provided for all 
corpora without a gold parse. 
 
This year, the shared task will feature: 
 * The inclusion of more frameworks, with datasets from: RST / eRST, SDRT, 
   PDTB, ISO 24617, and discourse dependencies
 * The inclusion of new corpora and new languages, some of them kept a 
   surprise!
 * A unified set of labels for the discourse relations, to make evaluation 
   across datasets easier
 * A new constraint: only one multilingual model should be submitted per task, 
   and it should be small (4B parameters max)! This will make our replication 
   work easier, but more importantly, it will simplify using such a model and 
   test the robustness of your solution.
We’re excited to announce the release of the training data for the DISRPT 2025 
Shared Task! You can now access the data, format documentation, and tools on 
our GitHub 🔗 https://github.com/disrpt/sharedtask2025
The data covers five discourse frameworks (RST / eRST, PDTB, SDRT, and 
Discourse Dependencies) across 14 languages: Basque, Chinese, Czech, Dutch, 
English, Farsi, French, German, Italian, Portuguese, Russian, Spanish, Thai and 
Turkish.
We invite researchers and teams interested in participating to register now. 
Registered participants will be added to our mailing list and receive all 
future updates.
📅 The full testing data will be released on July 14, 2025 — stay tuned!
To join the mailing list and stay informed, please email us at:
📧 [email protected] 
Let us know you're interested — we’d love to have you on board!
**Important dates**
 
 * May 16 2025 – Sample data release
 * June 17 2025 – Training data release [NOW]
 * July 14 2025 – Test data release
 * August 1 2025 – System + paper submissions due
 * September 12 2025 – Notification of acceptance
 * September 19 2025 – Camera-ready papers
 * November 8-9 2025 – CODI at EMNLP
All deadlines are 11:59 pm UTC-12 (AoE, "Anywhere on Earth").
 
**Information:**
 
Contact the organizers: [email protected] 
Official website: https://sites.google.com/view/disrpt2025/
Google group for participants, please join us on: 
[email protected]
 
 
**Organization:**
 
Chloé Braud (CNRS - IRIT, University of Toulouse, France)
Chuyuan Li (University of British Columbia, Canada)
Janet Yang Liu (LMU Munich, Germany)
Philippe Muller (CNRS - University of Toulouse, France)
Amir Zeldes (Georgetown University, Washington DC, USA)
