CASE-2023 Shared Task - Task 2: Collecting and Geocoding Armed Clash Events
in Russo-Ukrainian  Conflict

================================================

The unprecedented quantity of easily accessible data on social, political,
and economic processes offers ground-breaking potential in guiding
data-driven analysis of socio political phenomena: Armed conflicts,
political movements, fights for economic and social rights, and various
related socio-political happenings are reported in news articles and social
media posts and recorded in curated databases. On the other hand, automatic
event detection from texts and  event geocoding has long been a challenge
for the natural language processing (NLP) community. It requires
sophisticated methods and resources, such as Machine Learning models,
linguistic rules and dictionaries, geographic gazetteers.

Task definition

The task Collecting and Geocoding Armed Clash Events in Russo-Ukrainian
Conflict  is being held as a sub-task of the 6th Workshop on Challenges and
Applications of Automated Extraction of Socio-political Events from Text
(CASE 2023). The task will use data from the Russo-Ukrainian  Conflict to
test the capabilities of event detection systems to extract, geocode and
de-duplicate armed clashes in news and social media postsл Evaluation will
be based on the  correlation between the spatio-temporal distribution and
number of the extracted events and those which are in the ground truth data
set.

We invite contributions from researchers in  NLP, ML, Deep Learning, and
AI. The call is directed also towards socio-political scientists,
researchers in conflict analysis and forecasting, peace studies, and
computational social science.

All participating teams will be able to publish their system description
paper in the workshop proceedings published by ACL. For more information on
the workshop,

please visit the Workshop website https://emw.ku.edu.tr/case-2023/
<https://emw.ku.edu.tr/case-2022/> and the conference website
https://ranlp.org/ranlp2023/.

================================================

   1.

   Data

Gold Standard and Text Input Data for the participant systems for the time
range 24.02.2022-24.08.2022 has been prepared and will be shared with the
applicants on the Task website.

1.1 Training Data

No training data are provided for this Task. The data utilized for CASE
2023 Task 1, which is described in Hürriyetoğlu, A. et al. (2022, 2020b),
can be used for training systems for this task (Task 2). Additionally data
can be used to build systems/models that can detect protest events in
tweets and news articles.


1.2 Input Data

The participant systems will be evaluated on raw data collections including
Telegram messages, the New York Times and Ukrainian-Russian official news
channels.

Namely, the data collections comprise:

• English  language social media massage and news corpus comprising.

48.007 Telegram Messages and The New York Times News about Ukraine.

• Ukrainian language social media collection comprising

102.135 Telegram Messages and Ukraine News Agency News.

• Russian language social media collection comprising

8.534 Telegram Message and Russian News Agency News

Further details on the text collections and sampling methods are provided
in the folders news and Social Media of the github repo for the Task (
https://github.com/zavavan/case2023_task2).


1.3  Gold Standard Data

The Russo-Ukrainian Conflict ground truth data primarily consists of data
coming from the Armed Conflict Location & Event Data Project (ACLED). We
will be adding alternative ground-truth datasets in order to prevent the
bias that may be introduced by using a single definition and interpretation
of an event. Full details on the manually curated data used as Gold
Standard for the correlation analysis will be disclosed at the end of the
evaluation period. Please check documentation on the folder gold_standard
of the Task github repo.

================================================



   1.

    Evaluation

The systems which participate in this shared task will be required to
detect news articles and Telegram posts which contain description of
ongoing armed clashes. The time and place of each armed clash should be
detected at date level (regarding the time) and precise geographic
coordinates (latitude and longitude). The systems should ideally extract
event times, based on multiple text reports.

In order to evaluate the ability of automatic event-coders to reproduce the
gold standard armed clash event dataset, we adapt two correlation methods
originally used in micro-level analysis of political violence by Hammond
and Weidmann (2014), based on aggregation of event counts uniform grid
geographical cells and 1-day time spans and apply a number of standard
correlation coefficients and error measures.

For each of the input text corpora in1.2, each participant may submit up to
3 different system responses. Each system response will consist of a csv
file with the following naming pattern:

“submission.<team-name>.<corpus>.<response-number>.csv”

where <corpus> is either “social_media” or “news”.

For instance: “submission.MyTeam.news.3.csv” for the 3rd submission of team
“MyTeam” on the news corpus.

Each system response file will have one line per event, where each line
will have the following format:

<id>,<City>,<Region>,<Country>,<Date>

where <id> is a numerical event identifier, <City>,<Region>,<Country> are
canonical English names of the City,State/Region and Country, respectively,
of the detected event location. While only the <country> attribute is
mandatory, systems are expected to assign a description of the event
location at the finest grained level possible, as otherwise geographical
coordinate conversion may penalize the correlation score on geographical
cell aggregation. <Date> is the assigned date of the event in the format
YYYY-MM-DD.

A sample system response file line:

0,Kharkiv,Kharkiv Oblast,Ukraine,2022-05-02

A sample system output file can be downloaded from the Task repo at:

https://github.com/zavavan/case2023_task2/blob/main/submission.myteam.news.3.csv


Important Dates (AoE time)

================================================

It is optional to use Task 1 systems. Participants may also use their own
systems, which are developed independently of Task 1.

Task 1 Training data available: May 1, 2023

Task 1 Test data available: May 15, 2023

Task 1 Evaluation period ends: June 30, 2023

Task 2 Sample Text  archive is available: May 22, 2023

Task 2 Text archive for evaluation is available: July 1, 2023

Task 2 Evaluation period starts: July 1, 2023

Task 2 Evaluation period  ends: July 24

System Description Paper submissions due: July 31, 2023

Notification to authors after review: August 7, 2023

Camera ready: August 25, 2023

Workshop period @ RANLP: Sep 7-8, 2023


Organization

================================================

   -

   Hristo Tanev (Joint Research Centre (JRC), European Commission, Italy)
   -

   Onur Uca, Sociology (Sociology, Mersin University, Turkey)
   -

   Vanni Zavarella (University of Cagliari, Italy)
   -

   Ali Hürriyetoğlu (KNAW Humanities Cluster DHLab, the Netherlands)

Please contact the organizers at  [email protected] or
[email protected]  for your questions.

5.References

Jesse Hammond and Nils B Weidmann. Using machine-coded event data for the
micro-level study of political violence. Research & Politics,
1(2):2053168014539924, 2014.

Hürriyetoğlu, A., Mutlu, O.,  Duruşan, F.,  Uca, O,.  Gürel, A.,S.,
Radford, B., Dai, Y., Hettiarachchi, H., Stoehr, N., Nomoto, T., Slavcheva,
M.,  Vargas, F.,  Javid, A.,  Beyhan, F., Yörük, E. (2022). Extended
Multilingual Protest News Detection Shared Task1,CASE2021 and 2022. arXiv
preprint arXiv:2211.11360. Url: https://arxiv.org/abs/2211.11360

Hürriyetoğlu, A., Yörük, E., Yüret, D., Mutlu, O., Yoltar, Ç., Duruşan, F.,
& Gürel, B. (2020b). Cross-context news corpus for protest events related
knowledge base construction. arXiv preprint arXiv:2008.00351. In Automated
Knowledge Base Construction (AKBC). URL:
https://www.akbc.ws/2020/papers/7NZkNhLCjp
_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]

Reply via email to