Hi Kilian
Hope all has been well.
I'm surprised that people are still "wording around" nowadays. Some
suggestions:
1. Can't we rename "MWEs" to "fixed/idiomatic expressions" instead?
One could reformulate these as character sequences (strings) of
various lengths and vocabularies.
2. Also, one can interpret these without reference to any syntactic
category (nouns, verbs, etc.).
3. They just represent lexical info (some of it reflecting/encoding
historico-social habits, though one should also be aware of the
ethical aspects of reinforcing some "traditional values"). Perhaps a
more sophisticated view of language could help wean practitioners off
a mindset that relies on "linguistic structure(s)" as we've had them
thus far (i.e. based on "words" and "sentences")?
4. Re "their meaning often does not result from the direct
combination of the meanings of their parts": non-compositionality may
be a better description of a more realistic view of language; it
should probably be our default expectation (instead of the
cherry-picked compositional counterparts).
I think efforts towards mitigating a mental dependency on "words"
would be a good direction to pursue. What do you think?
Can we get SIGLEX to update in this regard?
Best
Ada
On Wed, Feb 8, 2023 at 11:12 AM Kilian Evang via Corpora
<[email protected]> wrote:
[Apologies for cross-postings]
********************************************************************************
Call for Papers: Deadline extended
19th Workshop on Multiword Expressions (MWE 2023)
Organized and sponsored by SIGLEX, the Special Interest Group
on the Lexicon of the ACL
Full-day workshop collocated with EACL 2023, Dubrovnik, Croatia, May 5
or 6, 2023
Hybrid (on-site & on-line)
NEW: Submission deadline: February 20, 2023
NEW: Invited speakers announced (see below)
NEW: Best paper award (see below)
MWE 2023 website: https://multiword.org/mwe2023/
********************************************************************************
Multiword expressions (MWEs) are word combinations that exhibit
lexical, syntactic, semantic, pragmatic, and/or statistical
idiosyncrasies (Baldwin & Kim 2010), such as "by and large", "hot dog",
"pay a visit" and "pull one's leg". The notion encompasses closely related
phenomena: idioms, compounds, light-verb constructions, phrasal verbs,
rhetorical figures, collocations, institutionalised phrases, etc.
Their behaviour is often unpredictable; for example, their meaning
often does not result from the direct combination of the meanings of
their parts. Given their irregular nature, MWEs often pose complex
problems in linguistic modelling (e.g. annotation), NLP tasks (e.g.
parsing), and end-user applications (e.g. natural language
understanding and MT), and hence still represent an open issue for
computational linguistics (Constant et al. 2017).
Since 2003, modelling and processing MWEs for NLP has been the topic
of the MWE workshop, organised by the MWE section of SIGLEX in
conjunction with major NLP conferences. Impressive progress has been
made in the field, but our understanding of MWEs still requires much
research, given their necessity and usefulness in NLP applications.
This is also relevant to domain-specific NLP pipelines that need to
tackle terminologies most often realised as MWEs. As in previous
years, for this 19th edition of the workshop we have identified the
following topics on which contributions are particularly encouraged:
MWE processing and identification in specialized languages and
domains: Multiword terminology extraction from domain-specific corpora
(Bonin et al. 2010) is of particular importance to various
applications, such as MT (Semmar & Laib 2017), and to the
identification and monitoring of neologisms and technical jargon
(Chatzitheodorou et al. 2021). We expect that approaches to MWE
processing and approaches to terminology processing in specialised
domains can benefit from each other.
MWE processing to enhance end-user applications: MWEs have gained
particular attention in end-user applications, including MT (Zaninello
& Birch 2020; Han et al. 2021, 2022), simplification (Kochmar et al.
2020), language learning and assessment (Paquot et al. 2019;
Christiansen & Arnon 2017), social media mining (Maisto et al. 2017),
and abusive language detection (Zampieri et al. 2020; Caselli et al.
2020). We believe that it is crucial to extend and deepen these first
attempts to integrate and evaluate MWE technology in these and further
end-user applications.
MWE identification and interpretation in pre-trained language models:
Most current MWE processing is limited to identification and
detection using pre-trained language models, but we still lack an
understanding of how MWEs are represented and dealt with in these
models (Nedumpozhimana & Kelleher 2021; Garcia et al. 2021; Fakharian
& Cook 2021) and of how to better model the semantic compositionality
of MWEs (Moreau et al. 2018). Now that NLP has shifted towards end-to-end
neural models like BERT, capable of solving complex tasks with little
or no intermediary linguistic symbols, questions arise about the
extent to which MWEs should be implicitly or explicitly modelled
(Shwartz & Dagan 2019).
MWE processing in low-resource languages: The PARSEME shared tasks
(Ramisch et al. 2020; 2018; Savary et al. 2017), among others, have
fostered significant progress in MWE identification, providing
datasets that include low-resource languages, evaluation measures, and
tools that now allow fully integrating MWE identification into
end-user applications. A few efforts have recently explored methods
for the automatic interpretation of MWEs (Bhatia et al. 2018; 2017),
and their processing in low-resource languages (Liu & Wang 2020; Kumar
et al. 2017). Resource creation and sharing should be pursued in
parallel with the development of methods able to capitalize on small
datasets (Han et al. 2020).
Through this workshop, we would like to bring together researchers in
various NLP subfields and encourage them to submit MWE-related
research, so that approaches to MWE processing, including processing
for low-resource languages and for various applications, can benefit
from each other. We also intend to consolidate the converging
effects of previous joint workshops LAW-MWE-CxG 2018, MWE-WN 2019 and
MWE-LEX 2020, the joint MWE-WOAH panel in 2021, and the MWE-SIGUL 2022
joint session, extending our scope to MWEs in e-lexicons and WordNets,
MWE annotation, as well as grammatical constructions. Correspondingly,
we call for papers on research related (but not limited) to MWEs and
constructions in:
- Computationally-applicable theoretical work in psycholinguistics and
  corpus linguistics;
- Annotation (expert, crowdsourcing, automatic) and representation in
  resources such as corpora, treebanks, e-lexicons, and WordNets (also
  for low-resource languages);
- Processing in syntactic and semantic frameworks (e.g. CCG, CxG, HPSG,
  LFG, TAG, UD, etc.);
- Discovery and identification methods, including for specialized
  languages and domains such as clinical or biomedical NLP;
- Interpretation of MWEs and understanding of text containing them;
- Language acquisition, language learning, and non-standard language
  (e.g. tweets, speech);
- Evaluation of annotation and processing techniques;
- Retrospective comparative analyses from the PARSEME shared tasks;
- Processing for end-user applications (e.g. MT, NLU, summarisation,
  language learning, etc.);
- Implicit and explicit representation in pre-trained language models
  and end-user applications;
- Evaluation and probing of pre-trained language models;
- Resources and tools (e.g. lexicons, identifiers) and their integration
  into end-user applications;
- Multiword terminology extraction;
- Adaptation and transfer of annotations and related resources to new
  languages and domains including low-resource ones.
Shared Task
We do not have a shared task this year, but a new release of the
PARSEME corpus of verbal MWEs is currently underway. We encourage
submission of research papers that include analyses of the new edition
of the PARSEME data, as well as improvements over the results of the
PARSEME 2020 shared task and of SemEval 2022 Task 2 on idiomaticity
prediction.
*** Special Track on MWEs in Clinical NLP ***
Pursuing the MWE Section’s tradition of synergies with other
communities, this year, we are organizing a joint session with the
Clinical NLP workshop for shared papers/poster presentations. Since
clinical texts contain a substantial number of multiword expressions
(e.g. medical terms or domain-specific collocations), a joint session
is deemed beneficial for both communities. The goal is to foster
future synergies that could address scientific challenges in the
creation of resources, models and applications to deal with multiword
expressions and related phenomena in the specialised domain of
clinical NLP. Submissions describing research on MWEs in clinical NLP,
especially those introducing new datasets, tools or resources, are
welcome. Papers accepted in this track will also have the option to
present their work at the Clinical NLP workshop at ACL 2023, after
being presented at MWE 2023.
Invited Speakers
We are looking forward to invited talks by two amazing speakers:
Leo Wanner, Universitat Pompeu Fabra
TBD
Best paper award
All full papers in the workshop will be considered by the program
committee for a best paper award. The decision will be announced in
the closing session.
Submission formats
The workshop invites two types of submissions:
- archival submissions that present substantially original research,
  in either long paper format (8 pages + references) or short paper
  format (4 pages + references);
- non-archival submissions of abstracts describing relevant research
  presented or published elsewhere, which will not be included in the
  MWE proceedings.
Paper submission and templates
Papers should be submitted via the workshop's START submission page
(https://softconf.com/eacl2023/mwe2023/). Please choose the
appropriate submission format (archival/non-archival). Submissions
must follow the ACL 2023 stylesheet.
Archival papers with existing reviews from ACL Rolling Review (ARR)
will also be considered. A paper may not be simultaneously under
review through ARR and MWE. A paper that has received or will receive
reviews through ARR may not be submitted for review to MWE.
Important Dates
Paper submission: February 20, 2023
ARR paper commitment: March 6, 2023
Notification of acceptance: March 13, 2023
Camera-ready papers due: March 27, 2023
Workshop: May 5 or 6, 2023
All deadlines are at 23:59 UTC-12 (Anywhere on Earth).
Organizing Committee
Program chairs: Marcos Garcia, Voula Giouli, Lifeng Han, Shiva
Taslimipoor
Publication chair: Archna Bhatia
Publicity chair: Kilian Evang
Anti-harassment policy
The workshop follows the ACL anti-harassment policy.
Contact
For any inquiries regarding the workshop, please send an email to the
Organizing Committee at [email protected].
_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]
_______________________________________________