What do people think of "complex lexical items", instead of "multiword
expressions"?

On Thu, Feb 9, 2023 at 08:54, Kilian Evang <
[email protected]> wrote:

> Here is a response from Archna Bhatia whose messages don't seem to go
> through for technical reasons:
>
> *From: *Archna Bhatia <[email protected]>
> *Subject: **Re: [Corpora-List] Deadline extension: 19th Workshop on
> Multiword Expressions (MWE 2023)*
> *Date: *February 8, 2023 at 2:59:35 PM EST
> *To: *Ada Wan <[email protected]>
> *Cc: *Ken Litkowski <[email protected]>, [email protected]
>
> Hi Ada,
>
> While appropriate space is found for this discussion, let me respond to
> just your first suggestion (for now): Why do you think they should be
> renamed “fixed/idiomatic expressions”? What would your definitions of
> “fixed” and of “idiomatic” be? How fixed would you say these expressions
> would be? Is morphological variation allowed? Is variation in any of the
> other linguistic aspects allowed? From my point of view, “fixed/idiomatic
> expressions” results in a much more restricted category than everything we
> consider could be treated as multiwords.
>
> Thanks,
> Archna
>
>
>
> On Wed, Feb 8, 2023 at 20:39, Ada Wan via Corpora <
> [email protected]> wrote:
>
>> Hi Ken
>>
>> Thanks for the message. Unfortunately, it looks like there have been no
>> prior discussions on any of the topics I suggested, and the earliest post I
>> can access dates back only to 22Nov2020. I can surely start a discussion,
>> but that might look to be the first/only discussion on the list? (I went
>> through all the conversations accessible thus far and only saw
>> announcements.)
>>
>> Perhaps more importantly:
>> as this seems to be an issue that could also affect other areas of
>> concern to the general audience of the Corpora-List (*not just for
>> MWEs/SIGLEX*), is there a way that we all can make some changes in the
>> "language space" across the board?
>>
>> Thanks and best
>> Ada
>>
>>
>> On Wed, Feb 8, 2023 at 5:57 PM Ken Litkowski <[email protected]> wrote:
>>
>>> Dear Ada,
>>>
>>> When I added the SIGLEX discussion code back in 2010, I did so with the
>>> idea that we would have discussions of topics just like yours. The
>>> current form of the discussion is located on the Google group, via
>>> https://groups.google.com/g/siglex-members. There, you will find a
>>> place "Search conversations ..." where you can add your topic so that it
>>> will be sent to all, rather than just the announcements that are the main
>>> topics.
>>>
>>>     Ken (webmaster retiree)
>>> On 2/8/2023 10:18 AM, Ada Wan via Corpora wrote:
>>>
>>> Hi Kilian
>>>
>>> Hope all has been well.
>>>
>>> I'm surprised that people are still "wording around" nowadays. Some
>>> suggestions:
>>>
>>> 1. Can't we rename "MWEs" to "fixed/idiomatic expressions" instead? One
>>> can reformulate these as sequences/strings/expressions of various
>>> lengths/vocabs in characters.
>>> 2. Also, one can interpret these without information/association with
>>> any syntactic categories, nouns or verbs etc.
>>> 3. They do just represent lexical info (some reflecting/encoding
>>> historico-social habits, though one also should be aware of the ethical
>>> aspects of reinforcing some "traditional values"). Perhaps a more
>>> sophisticated view of language could help wean practitioners from a
>>> mindframe that relies on "linguistic structure(s)" as we've had it thus far
>>> (i.e. based on "words" and "sentences")?
>>> 4. Re "their meaning often does not result from the direct combination
>>> of the meanings of their parts": non-compositionality may be a better
>>> description of a more realistic view of language; it should probably be
>>> our default expectation (instead of the cherry-picked compositional
>>> counterparts).
>>>
>>> I think efforts towards mitigating a mental dependency on "words" would
>>> be a good direction to pursue, what do you think?
>>> Can we get SIGLEX to update in this regard?
>>>
>>> Best
>>> Ada
>>>
>>>
>>> On Wed, Feb 8, 2023 at 11:12 AM Kilian Evang via Corpora <
>>> [email protected]> wrote:
>>>
>>>> [Apologies for cross-postings]
>>>>
>>>>
>>>> ********************************************************************************
>>>>
>>>> Call for Papers: Deadline extended
>>>>
>>>> 19th Workshop on Multiword Expressions (MWE 2023)
>>>>
>>>> Organized and sponsored by SIGLEX, the Special Interest Group
>>>> on the Lexicon of the ACL
>>>>
>>>> Full-day workshop collocated with EACL 2023, Dubrovnik, Croatia, May 5
>>>> or 6, 2023
>>>>
>>>> Hybrid (on-site & on-line)
>>>>
>>>> NEW: Submission deadline: February 20, 2023
>>>>
>>>> NEW: Invited speakers announced (see below)
>>>>
>>>> NEW: Best paper award (see below)
>>>>
>>>> MWE 2023 website: https://multiword.org/mwe2023/
>>>>
>>>>
>>>> ********************************************************************************
>>>>
>>>> Multiword expressions (MWEs) are word combinations that exhibit
>>>> lexical, syntactic, semantic, pragmatic, and/or statistical
>>>> idiosyncrasies (Baldwin & Kim 2010), such as by and large, hot dog,
>>>> pay a visit and pull one's leg. The notion encompasses closely related
>>>> phenomena: idioms, compounds, light-verb constructions, phrasal verbs,
>>>> rhetorical figures, collocations, institutionalised phrases, etc.
>>>> Their behaviour is often unpredictable; for example, their meaning
>>>> often does not result from the direct combination of the meanings of
>>>> their parts. Given their irregular nature, MWEs often pose complex
>>>> problems in linguistic modelling (e.g. annotation), NLP tasks (e.g.
>>>> parsing), and end-user applications (e.g. natural language
>>>> understanding and MT), hence still representing an open issue for
>>>> computational linguistics (Constant et al. 2017).
>>>>
>>>> For almost two decades, modelling and processing MWEs for NLP has been
>>>> the topic of the MWE workshop organised by the MWE section of SIGLEX
>>>> in conjunction with major NLP conferences since 2003. Impressive
>>>> progress has been made in the field, but our understanding of MWEs
>>>> still requires much research considering their need and usefulness in
>>>> NLP applications. This is also relevant to domain-specific NLP
>>>> pipelines that need to tackle terminologies most often realised as
>>>> MWEs. Following previous years, for this 19th edition of the workshop,
>>>> we identified the following topics on which contributions are
>>>> particularly encouraged:
>>>>
>>>> MWE processing and identification in specialized languages and
>>>> domains: Multiword terminology extraction from domain-specific corpora
>>>> (Bonin et al. 2010) is of particular importance to various
>>>> applications, such as MT (Semmar & Laib, 2017), or for the
>>>> identification and monitoring of neologisms and technical jargon
>>>> (Chatzitheodorou et al, 2021).  We expect approaches that deal with
>>>> the processing of MWEs as well as the processing of terminology in
>>>> specialised domains can benefit from each other.
>>>>
>>>> MWE processing to enhance end-user applications: MWEs have gained
>>>> particular attention in end-user applications, including MT (Zaninello
>>>> & Birch 2020; Han et al. 2021, 2022), simplification (Kochmar et al.
>>>> 2020), language learning and assessment (Paquot et al. 2019;
>>>> Christiansen & Arnon 2017), social media mining (Maisto et al. 2017),
>>>> and abusive language detection (Zampieri et al. 2020; Caselli et al.
>>>> 2020). We believe that it is crucial to extend and deepen these first
>>>> attempts to integrate and evaluate MWE technology in these and further
>>>> end-user applications.
>>>>
>>>> MWE identification and interpretation in pre-trained language models:
>>>> Most current MWE processing is limited to their identification and
>>>> detection using pre-trained language models, but we still lack
>>>> understanding about how MWEs are represented and dealt with therein
>>>> (Nedumpozhimana & Kelleher 2021; Garcia et al. 2021; Fakharian & Cook
>>>> 2021), and how to better model the compositionality of MWEs from
>>>> semantics (Moreau et al. 2018). Now that NLP has shifted towards end-to-end
>>>> neural models like BERT, capable of solving complex tasks with little
>>>> or no intermediary linguistic symbols, questions arise about the
>>>> extent to which MWEs should be implicitly or explicitly modelled
>>>> (Shwartz & Dagan, 2019).
>>>>
>>>> MWE processing in low-resource languages: The PARSEME shared tasks
>>>> (Ramisch et al. 2020; 2018; Savary et al. 2017), among others, have
>>>> fostered significant progress in MWE identification, providing
>>>> datasets that include low-resource languages, evaluation measures, and
>>>> tools that now allow fully integrating MWE identification into
>>>> end-user applications. A few efforts have recently explored methods
>>>> for the automatic interpretation of MWEs (Bhatia, et al. 2018; 2017),
>>>> and their processing in low-resource languages (Liu & Wang 2020; Kumar
>>>> et al. 2017). Resource creation and sharing should be pursued in
>>>> parallel with the development of methods able to capitalize on small
>>>> datasets (Han et al. 2020).
>>>>
>>>> Through this workshop, we would like to bring together and encourage
>>>> researchers in various NLP subfields to submit MWE-related research,
>>>> so that approaches that deal with processing of MWEs including
>>>> processing for low-resource languages and for various applications can
>>>> benefit from each other. We also intend to consolidate the converging
>>>> effects of previous joint workshops LAW-MWE-CxG 2018, MWE-WN 2019 and
>>>> MWE-LEX 2020, the joint MWE-WOAH panel in 2021, and the MWE-SIGUL 2022
>>>> joint session, extending our scope to MWEs in e-lexicons and WordNets,
>>>> MWE annotation, as well as grammatical constructions. Correspondingly,
>>>> we call for papers on research related (but not limited) to MWEs and
>>>> constructions in:
>>>>
>>>> Computationally-applicable theoretical work in psycholinguistics and
>>>> corpus linguistics;
>>>>
>>>> Annotation (expert, crowdsourcing, automatic) and representation in
>>>> resources such as corpora, treebanks, e-lexicons, and WordNets (also
>>>> for low-resource languages);
>>>>
>>>> Processing in syntactic and semantic frameworks (e.g. CCG, CxG, HPSG,
>>>> LFG, TAG, UD, etc.);
>>>>
>>>> Discovery and identification methods, including for specialized
>>>> languages and domains such as clinical or biomedical NLP;
>>>>
>>>> Interpretation of MWEs and understanding of text containing them;
>>>>
>>>> Language acquisition, language learning, and non-standard language
>>>> (e.g. tweets, speech);
>>>>
>>>> Evaluation of annotation and processing techniques;
>>>>
>>>> Retrospective comparative analyses from the PARSEME shared tasks;
>>>>
>>>> Processing for end-user applications (e.g. MT, NLU, summarisation,
>>>> language learning, etc.);
>>>>
>>>> Implicit and explicit representation in pre-trained language models
>>>> and end-user applications;
>>>>
>>>> Evaluation and probing of pre-trained language models;
>>>>
>>>> Resources and tools (e.g. lexicons, identifiers) and their integration
>>>> into end-user applications;
>>>>
>>>> Multiword terminology extraction;
>>>>
>>>> Adaptation and transfer of annotations and related resources to new
>>>> languages and domains including low-resource ones.
>>>>
>>>>
>>>> Shared Task
>>>>
>>>> We do not have a shared task this year, but a new release of the
>>>> PARSEME corpus of verbal MWEs is currently underway. We encourage
>>>> submission of research papers that include analyses of the new edition
>>>> of the PARSEME data and improvements over the results for PARSEME 2020
>>>> shared task as well as SemEval 2022 task 2 on idiomaticity prediction.
>>>>
>>>>
>>>> *** Special Track on MWEs in Clinical NLP ***
>>>>
>>>> Pursuing the MWE Section’s tradition of synergies with other
>>>> communities, this year, we are organizing a joint session with the
>>>> Clinical NLP workshop for shared papers/poster presentations. Since
>>>> clinical texts contain an important amount of multiword expressions
>>>> (e.g. medical terms or domain-specific collocations), a joint session
>>>> is deemed beneficial for both communities. The goal is to foster
>>>> future synergies that could address scientific challenges in the
>>>> creation of resources, models and applications to deal with multiword
>>>> expressions and related phenomena in the specialised domain of
>>>> ClinicalNLP. Submissions describing research on MWEs in the
>>>> specialized domain of ClinicalNLP, especially introducing new datasets
>>>> or new tools and resources, are welcome. Papers accepted in this track
>>>> will have the option to present their work in the Clinical NLP
>>>> workshop at ACL 2023 as well, after being presented at MWE 2023.
>>>>
>>>>
>>>> Invited Speakers
>>>>
>>>> We are looking forward to invited talks by two amazing speakers:
>>>>
>>>> Leo Wanner, Universitat Pompeu Fabra
>>>>
>>>> TBD
>>>>
>>>>
>>>> Best paper award
>>>>
>>>> All full papers in the workshop will be considered by the program
>>>> committee for a best paper award. The decision will be announced in
>>>> the closing session.
>>>>
>>>>
>>>> Submission formats
>>>>
>>>> The workshop invites two types of submissions:
>>>>
>>>> archival submissions that present substantially original research in
>>>> both long paper format (8 pages + references) and short paper format
>>>> (4 pages + references).
>>>>
>>>> non-archival submissions of abstracts describing relevant research
>>>> presented/published elsewhere which will not be included in the MWE
>>>> proceedings.
>>>>
>>>>
>>>> Paper submission and templates
>>>>
>>>> Papers should be submitted via the workshop's START submission page
>>>> (https://softconf.com/eacl2023/mwe2023/). Please choose the
>>>> appropriate submission format (archival/non-archival). Archival papers
>>>> with existing reviews will also be accepted through the ACL Rolling
>>>> Review. Submissions must follow the ACL 2023 stylesheet.
>>>>
>>>>
>>>> Archival papers with existing reviews from ACL Rolling Review will
>>>> also be considered. A paper may not be simultaneously under review
>>>> through ARR and MWE. A paper that has received or will receive reviews
>>>> through ARR may not be submitted for review to MWE.
>>>>
>>>>
>>>> Important Dates
>>>>
>>>> Paper submission: February 20, 2023
>>>>
>>>> ARR paper commitment: March 6, 2023
>>>>
>>>> Notification of acceptance: March 13, 2023
>>>>
>>>> Camera-ready papers due: March 27, 2023
>>>>
>>>> Workshop: May 5 or 6, 2023
>>>>
>>>>
>>>> All deadlines are at 23:59 UTC-12 (Anywhere on Earth).
>>>>
>>>>
>>>> Organizing Committee
>>>>
>>>> Program chairs: Marcos Garcia, Voula Giouli, Lifeng Han, Shiva
>>>> Taslimipoor
>>>>
>>>> Publication chair: Archna Bhatia
>>>>
>>>> Publicity chair: Kilian Evang
>>>>
>>>>
>>>> Anti-harassment policy
>>>>
>>>> The workshop follows the ACL anti-harassment policy.
>>>>
>>>>
>>>> Contact
>>>>
>>>> For any inquiries regarding the workshop, please send an email to the
>>>> Organizing Committee at [email protected].
>>>> _______________________________________________
>>>> Corpora mailing list -- [email protected]
>>>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>>>> To unsubscribe send an email to [email protected]
>>>>
>>>
>>> --
>>> Ken Litkowski                     TEL.: 301-482-0237
>>> CL Research                       EMAIL: [email protected]
>>> 9208 Gue Road                     Home Page: http://www.clres.com
>>> Damascus, MD 20872-1025 USA       Blog: http://www.clres.com/blog
>>>
>>
>