Hi Kilian,

Hope all has been well.
I'm surprised that people are still "wording around" nowadays. Some suggestions:

1. Can't we rename "MWEs" to "fixed/idiomatic expressions" instead? One can reformulate these as sequences/strings/expressions of various lengths/vocabs in characters.
2. Also, one can interpret these without information on, or association with, any syntactic categories (nouns, verbs, etc.).
3. They do just represent lexical info (some reflecting/encoding historico-social habits, though one should also be aware of the ethical aspects of reinforcing some "traditional values"). Perhaps a more sophisticated view of language could help wean practitioners from a mindframe that relies on "linguistic structure(s)" as we've had it thus far (i.e. based on "words" and "sentences")?
4. Re "their meaning often does not result from the direct combination of the meanings of their parts": non-compositionality may be a better description under a more realistic view of language; it should probably be our default expectation (instead of the cherry-picked compositional counterparts).

I think efforts towards mitigating a mental dependency on "words" would be a good direction to pursue. What do you think? Can we get SIGLEX to update in this regard?
Best,
Ada

On Wed, Feb 8, 2023 at 11:12 AM Kilian Evang via Corpora <[email protected]> wrote:

> [Apologies for cross-postings]
>
>
> ********************************************************************************
>
> Call for Papers: Deadline extended
>
> 19th Workshop on Multiword Expressions (MWE 2023)
>
> Organized and sponsored by SIGLEX, the Special Interest Group
> on the Lexicon of the ACL
>
> Full-day workshop collocated with EACL 2023, Dubrovnik, Croatia, May 5
> or 6, 2023
>
> Hybrid (on-site & on-line)
>
> NEW: Submission deadline: February 20, 2023
>
> NEW: Invited speakers announced (see below)
>
> NEW: Best paper award (see below)
>
> MWE 2023 website: https://multiword.org/mwe2023/
>
>
> ********************************************************************************
>
> Multiword expressions (MWEs) are word combinations that exhibit
> lexical, syntactic, semantic, pragmatic, and/or statistical
> idiosyncrasies (Baldwin & Kim 2010), such as by and large, hot dog,
> pay a visit and pull one's leg. The notion encompasses closely related
> phenomena: idioms, compounds, light-verb constructions, phrasal verbs,
> rhetorical figures, collocations, institutionalised phrases, etc.
> Their behaviour is often unpredictable; for example, their meaning
> often does not result from the direct combination of the meanings of
> their parts. Given their irregular nature, MWEs often pose complex
> problems in linguistic modelling (e.g. annotation), NLP tasks (e.g.
> parsing), and end-user applications (e.g. natural language
> understanding and MT), hence still representing an open issue for
> computational linguistics (Constant et al. 2017).
>
> For almost two decades, modelling and processing MWEs for NLP has been
> the topic of the MWE workshop organised by the MWE section of SIGLEX
> in conjunction with major NLP conferences since 2003.
> Impressive progress has been made in the field, but our understanding of MWEs
> still requires much research considering their need and usefulness in
> NLP applications. This is also relevant to domain-specific NLP
> pipelines that need to tackle terminologies most often realised as
> MWEs. Following previous years, for this 19th edition of the workshop,
> we identified the following topics on which contributions are
> particularly encouraged:
>
> MWE processing and identification in specialized languages and
> domains: Multiword terminology extraction from domain-specific corpora
> (Bonin et al. 2010) is of particular importance to various
> applications, such as MT (Semmar & Laib, 2017), or for the
> identification and monitoring of neologisms and technical jargon
> (Chatzitheodorou et al, 2021). We expect approaches that deal with
> the processing of MWEs as well as the processing of terminology in
> specialised domains can benefit from each other.
>
> MWE processing to enhance end-user applications: MWEs have gained
> particular attention in end-user applications, including MT (Zaninello
> & Birch 2020; Han et al. 2021, 2022), simplification (Kochmar et al.
> 2020), language learning and assessment (Paquot et al. 2019;
> Christiansen & Arnon 2017), social media mining (Maisto et al. 2017),
> and abusive language detection (Zampieri et al. 2020; Caselli et al.
> 2020). We believe that it is crucial to extend and deepen these first
> attempts to integrate and evaluate MWE technology in these and further
> end-user applications.
>
> MWE identification and interpretation in pre-trained language models:
> Most current MWE processing is limited to their identification and
> detection using pre-trained language models, but we still lack
> understanding about how MWEs are represented and dealt with therein
> (Nedumpozhimana & Kelleher 2021; Garcia et al. 2021, Fakharian & Cook
> 2021), how to better model the compositionality of MWEs from semantics
> (Moreau et al. 2018). Now that NLP has shifted towards end-to-end
> neural models like BERT, capable of solving complex tasks with little
> or no intermediary linguistic symbols, questions arise about the
> extent to which MWEs should be implicitly or explicitly modelled
> (Shwartz & Dagan, 2019).
>
> MWE processing in low-resource languages: The PARSEME shared tasks
> (Ramisch et al. 2020; 2018; Savary et al. 2017), among others, have
> fostered significant progress in MWE identification, providing
> datasets that include low-resource languages, evaluation measures, and
> tools that now allow fully integrating MWE identification into
> end-user applications. A few efforts have recently explored methods
> for the automatic interpretation of MWEs (Bhatia, et al. 2018; 2017),
> and their processing in low-resource languages (Liu & Wang 2020; Kumar
> et al. 2017). Resource creation and sharing should be pursued in
> parallel with the development of methods able to capitalize on small
> datasets (Han et al. 2020).
>
> Through this workshop, we would like to bring together and encourage
> researchers in various NLP subfields to submit MWE-related research,
> so that approaches that deal with processing of MWEs including
> processing for low-resource languages and for various applications can
> benefit from each other. We also intend to consolidate the converging
> effects of previous joint workshops LAW-MWE-CxG 2018, MWE-WN 2019 and
> MWE-LEX 2020, the joint MWE-WOAH panel in 2021, and the MWE-SIGUL 2022
> joint session, extending our scope to MWEs in e-lexicons and WordNets,
> MWE annotation, as well as grammatical constructions.
> Correspondingly, we call for papers on research related (but not
> limited) to MWEs and constructions in:
>
> Computationally-applicable theoretical work in psycholinguistics and
> corpus linguistics;
>
> Annotation (expert, crowdsourcing, automatic) and representation in
> resources such as corpora, treebanks, e-lexicons, and WordNets (also
> for low-resource languages);
>
> Processing in syntactic and semantic frameworks (e.g. CCG, CxG, HPSG,
> LFG, TAG, UD, etc.);
>
> Discovery and identification methods, including for specialized
> languages and domains such as clinical or biomedical NLP;
>
> Interpretation of MWEs and understanding of text containing them;
>
> Language acquisition, language learning, and non-standard language
> (e.g. tweets, speech);
>
> Evaluation of annotation and processing techniques;
>
> Retrospective comparative analyses from the PARSEME shared tasks;
>
> Processing for end-user applications (e.g. MT, NLU, summarisation,
> language learning, etc.);
>
> Implicit and explicit representation in pre-trained language models
> and end-user applications;
>
> Evaluation and probing of pre-trained language models;
>
> Resources and tools (e.g. lexicons, identifiers) and their integration
> into end-user applications;
>
> Multiword terminology extraction;
>
> Adaptation and transfer of annotations and related resources to new
> languages and domains including low-resource ones.
>
>
> Shared Task
>
> We do not have a shared task this year, but a new release of the
> PARSEME corpus of verbal MWEs is currently underway. We encourage
> submission of research papers that include analyses of the new edition
> of the PARSEME data and improvements over the results for PARSEME 2020
> shared task as well as SemEval 2022 task 2 on idiomaticity prediction.
>
>
> *** Special Track on MWEs in Clinical NLP ***
>
> Pursuing the MWE Section’s tradition of synergies with other
> communities, this year, we are organizing a joint session with the
> Clinical NLP workshop for shared papers/poster presentations. Since
> clinical texts contain an important amount of multiword expressions
> (e.g. medical terms or domain-specific collocations), a joint session
> is deemed beneficial for both communities. The goal is to foster
> future synergies that could address scientific challenges in the
> creation of resources, models and applications to deal with multiword
> expressions and related phenomena in the specialised domain of
> ClinicalNLP. Submissions describing research on MWEs in the
> specialized domain of ClinicalNLP, especially introducing new datasets
> or new tools and resources, are welcome. Papers accepted in this track
> will have the option to present their work in the Clinical NLP
> workshop at ACL 2023 as well, after being presented at MWE 2023.
>
>
> Invited Speakers
>
> We are looking forward to invited talks by two amazing speakers:
>
> Leo Wanner, Universitat Pompeu Fabra
>
> TBD
>
>
> Best paper award
>
> All full papers in the workshop will be considered by the program
> committee for a best paper award. The decision will be announced in
> the closing session.
>
>
> Submission formats
>
> The workshop invites two types of submissions:
>
> archival submissions that present substantially original research in
> both long paper format (8 pages + references) and short paper format
> (4 pages + references).
>
> non-archival submissions of abstracts describing relevant research
> presented/published elsewhere which will not be included in the MWE
> proceedings.
>
>
> Paper submission and templates
>
> Papers should be submitted via the workshop's START submission page
> (https://softconf.com/eacl2023/mwe2023/). Please choose the
> appropriate submission format (archival/non-archival).
> Archival papers with existing reviews will also be accepted through
> the ACL Rolling Review. Submissions must follow the ACL 2023 stylesheet.
>
>
> Archival papers with existing reviews from ACL Rolling Review will
> also be considered. A paper may not be simultaneously under review
> through ARR and MWE. A paper that has or will receive reviews through
> ARR may not be submitted for review to MWE.
>
>
> Important Dates
>
> Paper submission: February 20, 2023
>
> ARR paper commitment: March 6, 2023
>
> Notification of acceptance: March 13, 2023
>
> Camera-ready papers due: March 27, 2023
>
> Workshop: May 5 or 6, 2023
>
>
> All deadlines are at 23:59 UTC-12 (Anywhere on Earth).
>
>
> Organizing Committee
>
> Program chairs: Marcos Garcia, Voula Giouli, Lifeng Han, Shiva Taslimipoor
>
> Publication chair: Archna Bhatia
>
> Publicity chair: Kilian Evang
>
>
> Anti-harassment policy
>
> The workshop follows the ACL anti-harassment policy.
>
>
> Contact
>
> For any inquiries regarding the workshop, please send an email to the
> Organizing Committee at [email protected].
> _______________________________________________
> Corpora mailing list -- [email protected]
> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
> To unsubscribe send an email to [email protected]
>
