[Corpora-List] First CfP: Workshop on RESOURCEs and representations For Under-resourced Languages and domains (RESOURCEFUL-2023)

Simon Dobnik via Corpora Wed, 25 Jan 2023 07:48:33 -0800

[apologies for x-posting]

Call for Papers and Extended Abstracts


Workshop on RESOURCEs and representations For Under-resourced Languages and 
domains (RESOURCEFUL-2023)
co-located with the 24th Nordic Conference on Computational Linguistics 
(NoDaLiDa)
Norðurlandahúsið - The Nordic House in Tórshavn, Faroe Islands
22nd May 2023

https://resourceful-workshop.github.io/resourceful-2023/



Important dates:

- Submission deadline (both papers and abstracts): 28th March 2023
- Notification of acceptance: 25th April 2023
- Camera-ready version: 9th May 2023
- Workshop date: 22nd May 2023

All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth").



Workshop description

The second workshop on resources and representations for under-resourced 
languages and domains (RESOURCEFUL-2023) explores the role of the kind and 
quality of the resources available to us, as well as the challenges and 
directions for constructing new resources in light of the latest trends in 
natural language processing.

Data-driven machine-learning techniques in natural language processing have 
achieved remarkable performance (e.g., BERT, GPT, ChatGPT), but to do so they 
require large quantities of quality data, most of which is text. 
Interpretability studies of large language models in both text-only and 
multi-modal setups have revealed that even where large text datasets are 
available, the models still do not cover all the contexts of human social 
activity and are prone to capturing unwanted bias where the data is skewed 
towards only some contexts. A question has also been raised whether textual 
data is enough to capture the semantics of natural language, or whether other 
modalities such as visual representations or the situated context of a robot 
might be required. Annotator-based resources have been constructed over many 
years based on theoretical work in linguistics, psychology and related fields, 
and a large amount of work has been done both theoretically and practically.

The purpose of the workshop is to initiate a discussion between the two 
communities involved in building resources (data-driven vs. annotation-based) 
and to explore their synergies for the new challenges in natural language 
processing. We encourage contributions in the areas of resource creation, 
representation learning and interpretability, in data-driven and expert-driven 
machine learning setups and in both uni-modal and multi-modal scenarios.

In particular we would like to open a forum by bringing together students, 
researchers, and experts to address and discuss the following questions:

- What is the relevant linguistic knowledge that models should capture, and how 
can this knowledge be sampled and extracted in practice?
- What kind of linguistic knowledge do we want to capture, and what can we 
capture, in different contexts and tasks?
- To what degree are resources that have been traditionally aimed at rule-based 
natural language processing approaches relevant today both for machine learning 
techniques and hybrid approaches?
- How can they be adapted for data-driven approaches?
- To what degree can data-driven approaches be used to facilitate expert-driven 
annotation?
- What are current challenges for expert-based annotation?
- How can crowd-sourcing and citizen science be used in building resources?
- How can we evaluate and reduce unwanted biases?

Intended participants are researchers, PhD students and practitioners from 
diverse backgrounds (linguistics, psychology, computational linguistics, 
speech, computer science, machine learning, computer vision, etc.). We foresee an 
interactive workshop with plenty of time for discussion, complemented with 
invited talks and presentations of on-going or completed research.

This workshop is a continuation of the first workshop on resources and 
representations for under-resourced languages and domains, held together with 
SLTC 2020: https://gu-clasp.github.io/resourceful-2020/



Submission

We invite submissions of both long (8 pages) and short (4 pages) papers, with 
any number of pages for references. All submissions must follow the NoDaLiDa 
template, which is available in both LaTeX and MS Word at the official 
conference website, https://www.nodalida2023.fo/authorkit-nodalida23. 
Submissions must be anonymous and submitted in PDF format through OpenReview.

We also invite submissions of non-anonymous extended abstracts of at most 2 
pages, with any number of pages for references, describing work in progress, 
negative results and opinion pieces. Papers related to our theme that have 
already been presented at other venues or published elsewhere will also be 
considered for presentation. The abstracts, which should follow the same 
formatting templates as the archival track, will be reviewed by the workshop 
organisers, and the accepted ones will be posted on the workshop website.



Workshop organisers

Dana Dannélls, Språkbanken Text, University of Gothenburg
Simon Dobnik, CLASP, University of Gothenburg
Adam Ek, CLASP, University of Gothenburg
Stella Frank, University of Copenhagen
Nikolai Ilinykh, CLASP, University of Gothenburg
Beáta Megyesi, Uppsala University
Felix Morger, Språkbanken Text, University of Gothenburg
Joakim Nivre, RISE and Uppsala University
Magnus Sahlgren, AI Sweden
Sara Stymne, Uppsala University
Jörg Tiedemann, University of Helsinki
Lilja Øvrelid, University of Oslo

[email protected]
