CASE-2023 Shared Task - Task 2: Collecting and Geocoding Armed Clash
Events in Russo-Ukrainian Conflict

================================================

The unprecedented quantity of easily accessible data on social, political,
and economic processes offers ground-breaking potential for guiding
data-driven analysis of socio-political phenomena: armed conflicts,
political movements, fights for economic and social rights, and various
related socio-political happenings are reported in news articles and social
media posts and recorded in curated databases. On the other hand, automatic
event detection from texts and event geocoding have long been a challenge
for the natural language processing (NLP) community. They require
sophisticated methods and resources, such as machine learning models,
linguistic rules and dictionaries, and geographic gazetteers.

Task definition

The task Collecting and Geocoding Armed Clash Events in Russo-Ukrainian
Conflict is being held as a sub-task of the 6th Workshop on Challenges
and Applications of Automated Extraction of Socio-political Events from
Text (CASE 2023). The task will use data from the Russo-Ukrainian Conflict
to test the capabilities of event detection systems to extract, geocode and
de-duplicate armed clashes in news and social media posts. Evaluation
will be based on the correlation between the spatio-temporal
distribution and number of the extracted events and those in
the ground truth data set.

We invite contributions from researchers in NLP, ML, Deep Learning, and
AI. The call is also directed towards socio-political scientists,
researchers in conflict analysis and forecasting, peace studies, and
computational social science.

All participating teams will be able to publish their system description
paper in the workshop proceedings published by ACL.
For more information on the workshop, please visit the Workshop website
https://emw.ku.edu.tr/case-2023/ and the conference website
https://ranlp.org/ranlp2023/.

================================================

1. Data

Gold Standard and Text Input Data for the participant systems for the time
range 24.02.2022-24.08.2022 have been prepared and will be shared with the
applicants on the Task website.

1.1 Training Data

No training data are provided for this Task. The data utilized for CASE
2023 Task 1, which is described in Hürriyetoğlu, A. et al. (2022, 2020b),
can be used for training systems for this task (Task 2). Additionally, these
data can be used to build systems/models that can detect protest events in
tweets and news articles.

1.2 Input Data

The participant systems will be evaluated on raw data collections
including Telegram messages, the New York Times and Ukrainian-Russian
official news channels.

Namely, the data collections comprise:

• English language social media message and news corpus comprising
  48,007 Telegram Messages and The New York Times News about Ukraine.

• Ukrainian language social media collection comprising
  102,135 Telegram Messages and Ukraine News Agency News.

• Russian language social media collection comprising
  8,534 Telegram Messages and Russian News Agency News.

Further details on the text collections and sampling methods are provided
in the folders news and Social Media of the GitHub repo for the Task
(https://github.com/zavavan/case2023_task2).

1.3 Gold Standard Data

The Russo-Ukrainian Conflict ground truth data primarily consists of data
coming from the Armed Conflict Location & Event Data Project (ACLED). We
will be adding alternative ground-truth datasets in order to prevent the
bias that may be introduced by using a single definition and interpretation
of an event.
Full details on the manually curated data used as Gold
Standard for the correlation analysis will be disclosed at the end of the
evaluation period. Please check the documentation in the folder
gold_standard of the Task GitHub repo.

================================================

2. Evaluation

The systems which participate in this shared task will be required to
detect news articles and Telegram posts which contain descriptions of
ongoing armed clashes. The time of each armed clash should be detected at
date level and its place as precise geographic coordinates (latitude and
longitude). The systems should ideally extract event times based on
multiple text reports.

In order to evaluate the ability of automatic event-coders to reproduce
the gold standard armed clash event dataset, we adapt two correlation
methods originally used in micro-level analysis of political violence by
Hammond and Weidmann (2014), based on aggregating event counts over uniform
geographic grid cells and 1-day time spans, and apply a number of standard
correlation coefficients and error measures.

For each of the input text corpora in 1.2, each participant may submit up
to 3 different system responses. Each system response will consist of a csv
file with the following naming pattern:

“submission.<team-name>.<corpus>.<response-number>.csv”

where <corpus> is either “social_media” or “news”.

For instance: “submission.MyTeam.news.3.csv” for the 3rd submission of
team “MyTeam” on the news corpus.

Each system response file will have one line per event, where each line
will have the following format:

<id>,<City>,<Region>,<Country>,<Date>

where <id> is a numerical event identifier and <City>, <Region>, <Country>
are canonical English names of the City, State/Region and Country,
respectively, of the detected event location.
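
As a rough illustration of the naming pattern and line format above, a
minimal Python sketch might look like the following. The function names
(format_submission_name, write_events) are hypothetical and not part of any
official task tooling. The sketch assumes that responses are numbered 1 to
3, that an unknown City or Region is written as an empty field, and that the
output of Python's csv module is acceptable to the organizers; the task
description confirms none of these. The mandatory Country field and the
YYYY-MM-DD date format are taken from the task's own format notes.

```python
import csv
import io
from datetime import date

ALLOWED_CORPORA = ("social_media", "news")  # corpus labels from the naming pattern


def format_submission_name(team, corpus, response_number):
    """Build a name following submission.<team-name>.<corpus>.<response-number>.csv."""
    if corpus not in ALLOWED_CORPORA:
        raise ValueError(f"corpus must be one of {ALLOWED_CORPORA}")
    # Assumption: the (up to 3) responses are numbered 1, 2, 3.
    if response_number not in (1, 2, 3):
        raise ValueError("response_number must be 1, 2 or 3")
    return f"submission.{team}.{corpus}.{response_number}.csv"


def write_events(events, handle):
    """Write one <id>,<City>,<Region>,<Country>,<Date> line per event."""
    writer = csv.writer(handle, lineterminator="\n")
    for event_id, city, region, country, day in events:
        if not country:
            raise ValueError("Country is the only mandatory location field")
        # Assumption: an unknown City or Region is written as an empty field.
        writer.writerow([event_id, city or "", region or "", country, day.isoformat()])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_events([(0, "Kharkiv", "Kharkiv Oblast", "Ukraine", date(2022, 5, 2))], buffer)
    print(format_submission_name("MyTeam", "news", 3))
    print(buffer.getvalue(), end="")
```

Using the csv module rather than plain string joining means a place name
that itself contains a comma would be quoted instead of silently shifting
the columns; whether the evaluation scripts accept quoted fields should be
checked against the sample file in the Task repo.
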
While only the <Country> attribute is
mandatory, systems are expected to assign a description of the event
location at the finest-grained level possible, as otherwise geographical
coordinate conversion may penalize the correlation score on geographical
cell aggregation. <Date> is the assigned date of the event in the format
YYYY-MM-DD.

A sample system response file line:

0,Kharkiv,Kharkiv Oblast,Ukraine,2022-05-02

A sample system output file can be downloaded from the Task repo at:

https://github.com/zavavan/case2023_task2/blob/main/submission.myteam.news.3.csv

3. Important Dates (AoE time)

================================================

It is optional to use Task 1 systems. Participants may also use their own
systems, which are developed independently of Task 1.

Task 1 Training data available: May 1, 2023

Task 1 Test data available: May 15, 2023

Task 1 Evaluation period ends: June 30, 2023

Task 2 Sample Text archive is available: May 22, 2023

Task 2 Text archive for evaluation is available: July 1, 2023

Task 2 Evaluation period starts: July 1, 2023

Task 2 Evaluation period ends: July 24, 2023

System Description Paper submissions due: July 31, 2023

Notification to authors after review: August 7, 2023

Camera ready: August 25, 2023

Workshop period @ RANLP: Sep 7-8, 2023

4. Organization

================================================

- Hristo Tanev (Joint Research Centre (JRC), European Commission, Italy)
- Onur Uca (Sociology, Mersin University, Turkey)
- Vanni Zavarella (University of Cagliari, Italy)
- Ali Hürriyetoğlu (KNAW Humanities Cluster DHLab, the Netherlands)

Please contact the organizers at [email protected] or
[email protected] for your questions.

5. References

Jesse Hammond and Nils B Weidmann. Using machine-coded event data for the
micro-level study of political violence.
Research & Politics,
1(2):2053168014539924, 2014.

Hürriyetoğlu, A., Mutlu, O., Duruşan, F., Uca, O., Gürel, A. S.,
Radford, B., Dai, Y., Hettiarachchi, H., Stoehr, N., Nomoto, T., Slavcheva,
M., Vargas, F., Javid, A., Beyhan, F., Yörük, E. (2022). Extended
Multilingual Protest News Detection - Shared Task 1, CASE 2021 and 2022.
arXiv preprint arXiv:2211.11360. URL: https://arxiv.org/abs/2211.11360

Hürriyetoğlu, A., Yörük, E., Yüret, D., Mutlu, O., Yoltar, Ç., Duruşan,
F., & Gürel, B. (2020b). Cross-context news corpus for protest events
related knowledge base construction. arXiv preprint arXiv:2008.00351. In
Automated Knowledge Base Construction (AKBC). URL:
https://www.akbc.ws/2020/papers/7NZkNhLCjp
