What do people think of "complex lexical items" instead of "multiword expressions"?

On Thu, Feb 9, 2023 at 08:54, Kilian Evang <[email protected]> wrote:

> Here is a response from Archna Bhatia whose messages don't seem to go
> through for technical reasons:
>
> From: Archna Bhatia <[email protected]>
> Subject: Re: [Corpora-List] Deadline extension: 19th Workshop on
> Multiword Expressions (MWE 2023)
> Date: February 8, 2023 at 2:59:35 PM EST
> To: Ada Wan <[email protected]>
> Cc: Ken Litkowski <[email protected]>, [email protected]
>
> Hi Ada,
>
> While appropriate space is found for this discussion, let me respond to
> just your first suggestion (for now): Why do you think they should be
> renamed “fixed/idiomatic expressions”? What would your definition of
> “fixed” and of “idiomatic” mean? How fixed would you say these expressions
> would be? Is morphological variation allowed? Is variation in any of the
> other linguistic aspects allowed? From my point of view, “fixed/idiomatic
> expressions” results in a much restricted category than what all we
> consider could be treated as multiwords.
>
> Thanks,
> Archna
>
>
>
> On Wed, Feb 8, 2023 at 20:39, Ada Wan via Corpora <[email protected]> wrote:
>
>> Hi Ken
>>
>> Thanks for the message. Unfortunately, it looks like there has been no
>> prior discussions on any of the topics I suggested, and the earliest post I
>> can access dates back only to 22Nov2020. I can surely start a discussion,
>> but that might look to be the first/only discussion on the list? (I went
>> through all the conversations accessible thus far and only saw
>> announcements.)
>>
>> Perhaps more importantly:
>> as this seems to be an issue that could also affect other areas of
>> concern to the general audience of the Corpora-List (*not just for
>> MWEs/SIGLEX*), is there a way that we all can make some changes in the
>> "language space" across the board?
>>
>> Thanks and best
>> Ada
>>
>>
>> On Wed, Feb 8, 2023 at 5:57 PM Ken Litkowski <[email protected]> wrote:
>>
>>> Dear Ada,
>>>
>>> When I added the SIGLEX discussion code back in 2010, I did so with the
>>> idea that we would have discussion of just like the topic of yours. The
>>> morph of the discussion now is located on the Google group, via
>>> https://groups.google.com/g/siglex-members. There, you will find a
>>> place "Search conversations ..." where you can add your topic so that all
>>> will be sent. Rather than just the announcements that are the mainly topics.
>>>
>>> Ken (webmaster retiree)
>>> On 2/8/2023 10:18 AM, Ada Wan via Corpora wrote:
>>>
>>> Hi Kilian
>>>
>>> Hope all has been well.
>>>
>>> I'm surprised that people are still "wording around" nowadays. Some
>>> suggestions:
>>>
>>> 1. Can't we rename "MWEs" to "fixed/idiomatic expressions" instead? One
>>> can reformulate these as sequences/strings/expressions of various
>>> lengths/vocabs in characters.
>>> 2. Also, one can interpret these without information/association with
>>> any syntactic categories, nouns or verbs etc..
>>> 3. They do just represent lexical info (some reflecting/encoding
>>> historico-social habits, though one also should be aware of the ethical
>>> aspects of reinforcing some "traditional values"). Perhaps a more
>>> sophisticated view of language could help wean practitioners from a
>>> mindframe that relies of "linguistic structure(s)" as we've had it thus far
>>> (i.e. based on "words" and "sentences")?
>>> 4. Re " their meaning often does not result from the direct combination
>>> of the meanings of their parts": non-compositionality may be a better
>>> description of a more realistic view of language, it should prob be our
>>> default expectation (instead of the cherry-picked compositional
>>> counterparts).
>>>
>>> I think efforts towards mitigating a mental dependency on "words" would
>>> be a good direction to pursue, what do you think?
>>> Can we get SIGLEX to update in this regard?
>>>
>>> Best
>>> Ada
>>>
>>>
>>> On Wed, Feb 8, 2023 at 11:12 AM Kilian Evang via Corpora <[email protected]> wrote:
>>>
>>>> [Apologies for cross-postings]
>>>>
>>>>
>>>> ********************************************************************************
>>>>
>>>> Call for Papers: Deadline extended
>>>>
>>>> 19th Workshop on Multiword Expressions (MWE 2023)
>>>>
>>>> Organized and sponsored by SIGLEX, the Special Interest Group
>>>> on the Lexicon of the ACL
>>>>
>>>> Full-day workshop collocated with EACL 2023, Dubrovnik, Croatia, May 5
>>>> or 6, 2023
>>>>
>>>> Hybrid (on-site & on-line)
>>>>
>>>> NEW: Submission deadline: February 20, 2023
>>>>
>>>> NEW: Invited speakers announced (see below)
>>>>
>>>> NEW: Best paper award (see below)
>>>>
>>>> MWE 2023 website: https://multiword.org/mwe2023/
>>>>
>>>>
>>>> ********************************************************************************
>>>>
>>>> Multiword expressions (MWEs) are word combinations that exhibit
>>>> lexical, syntactic, semantic, pragmatic, and/or statistical
>>>> idiosyncrasies (Baldwin & Kim 2010), such as by and large, hot dog,
>>>> pay a visit and pull one's leg. The notion encompasses closely related
>>>> phenomena: idioms, compounds, light-verb constructions, phrasal verbs,
>>>> rhetorical figures, collocations, institutionalised phrases, etc.
>>>> Their behaviour is often unpredictable; for example, their meaning
>>>> often does not result from the direct combination of the meanings of
>>>> their parts. Given their irregular nature, MWEs often pose complex
>>>> problems in linguistic modelling (e.g. annotation), NLP tasks (e.g.
>>>> parsing), and end-user applications (e.g. natural language
>>>> understanding and MT), hence still representing an open issue for
>>>> computational linguistics (Constant et al. 2017).
>>>>
>>>> For almost two decades, modelling and processing MWEs for NLP has been
>>>> the topic of the MWE workshop organised by the MWE section of SIGLEX
>>>> in conjunction with major NLP conferences since 2003. Impressive
>>>> progress has been made in the field, but our understanding of MWEs
>>>> still requires much research considering their need and usefulness in
>>>> NLP applications. This is also relevant to domain-specific NLP
>>>> pipelines that need to tackle terminologies most often realised as
>>>> MWEs. Following previous years, for this 19th edition of the workshop,
>>>> we identified the following topics on which contributions are
>>>> particularly encouraged:
>>>>
>>>> MWE processing and identification in specialized languages and
>>>> domains: Multiword terminology extraction from domain-specific corpora
>>>> (Bonin et al. 2010) is of particular importance to various
>>>> applications, such as MT (Semmar & Laib, 2017), or for the
>>>> identification and monitoring of neologisms and technical jargon
>>>> (Chatzitheodorou et al, 2021). We expect approaches that deal with
>>>> the processing of MWEs as well as the processing of terminology in
>>>> specialised domains can benefit from each other.
>>>>
>>>> MWE processing to enhance end-user applications: MWEs have gained
>>>> particular attention in end-user applications, including MT (Zaninello
>>>> & Birch 2020; Han et al. 2021, 2022), simplification (Kochmar et al.
>>>> 2020), language learning and assessment (Paquot et al. 2019;
>>>> Christiansen & Arnon 2017), social media mining (Maisto et al. 2017),
>>>> and abusive language detection (Zampieri et al. 2020; Caselli et al.
>>>> 2020). We believe that it is crucial to extend and deepen these first
>>>> attempts to integrate and evaluate MWE technology in these and further
>>>> end-user applications.
>>>>
>>>> MWE identification and interpretation in pre-trained language models:
>>>> Most current MWE processing is limited to their identification and
>>>> detection using pre-trained language models, but we still lack
>>>> understanding about how MWEs are represented and dealt with therein
>>>> (Nedumpozhimana & Kelleher 2021; Garcia et al. 2021, Fakharian & Cook
>>>> 2021), how to better model the compositionality of MWEs from semantics
>>>> (Moreau et al. 2018). Now that NLP has shifted towards end-to-end
>>>> neural models like BERT, capable of solving complex tasks with little
>>>> or no intermediary linguistic symbols, questions arise about the
>>>> extent to which MWEs should be implicitly or explicitly modelled
>>>> (Shwartz & Dagan, 2019).
>>>>
>>>> MWE processing in low-resource languages: The PARSEME shared tasks
>>>> (Ramisch et al. 2020; 2018; Savary et al. 2017), among others, have
>>>> fostered significant progress in MWE identification, providing
>>>> datasets that include low-resource languages, evaluation measures, and
>>>> tools that now allow fully integrating MWE identification into
>>>> end-user applications. A few efforts have recently explored methods
>>>> for the automatic interpretation of MWEs (Bhatia, et al. 2018; 2017),
>>>> and their processing in low-resource languages (Liu & Wang 2020; Kumar
>>>> et al. 2017). Resource creation and sharing should be pursued in
>>>> parallel with the development of methods able to capitalize on small
>>>> datasets (Han et al. 2020).
>>>>
>>>> Through this workshop, we would like to bring together and encourage
>>>> researchers in various NLP subfields to submit MWE-related research,
>>>> so that approaches that deal with processing of MWEs including
>>>> processing for low-resource languages and for various applications can
>>>> benefit from each other.
>>>> We also intend to consolidate the converging
>>>> effects of previous joint workshops LAW-MWE-CxG 2018, MWE-WN 2019 and
>>>> MWE-LEX 2020, the joint MWE-WOAH panel in 2021, and the MWE-SIGUL 2022
>>>> joint session, extending our scope to MWEs in e-lexicons and WordNets,
>>>> MWE annotation, as well as grammatical constructions. Correspondingly,
>>>> we call for papers on research related (but not limited) to MWEs and
>>>> constructions in:
>>>>
>>>> Computationally-applicable theoretical work in psycholinguistics and
>>>> corpus linguistics;
>>>>
>>>> Annotation (expert, crowdsourcing, automatic) and representation in
>>>> resources such as corpora, treebanks, e-lexicons, and WordNets (also
>>>> for low-resource languages);
>>>>
>>>> Processing in syntactic and semantic frameworks (e.g. CCG, CxG, HPSG,
>>>> LFG, TAG, UD, etc.);
>>>>
>>>> Discovery and identification methods, including for specialized
>>>> languages and domains such as clinical or biomedical NLP;
>>>>
>>>> Interpretation of MWEs and understanding of text containing them;
>>>>
>>>> Language acquisition, language learning, and non-standard language
>>>> (e.g. tweets, speech);
>>>>
>>>> Evaluation of annotation and processing techniques;
>>>>
>>>> Retrospective comparative analyses from the PARSEME shared tasks;
>>>>
>>>> Processing for end-user applications (e.g. MT, NLU, summarisation,
>>>> language learning, etc.);
>>>>
>>>> Implicit and explicit representation in pre-trained language models
>>>> and end-user applications;
>>>>
>>>> Evaluation and probing of pre-trained language models;
>>>>
>>>> Resources and tools (e.g. lexicons, identifiers) and their integration
>>>> into end-user applications;
>>>>
>>>> Multiword terminology extraction;
>>>>
>>>> Adaptation and transfer of annotations and related resources to new
>>>> languages and domains including low-resource ones.
>>>>
>>>>
>>>> Shared Task
>>>>
>>>> We do not have a shared task this year, but a new release of the
>>>> PARSEME corpus of verbal MWEs is currently underway. We encourage
>>>> submission of research papers that include analyses of the new edition
>>>> of the PARSEME data and improvements over the results for PARSEME 2020
>>>> shared task as well as SemEval 2022 task 2 on idiomaticity prediction.
>>>>
>>>>
>>>> *** Special Track on MWEs in Clinical NLP ***
>>>>
>>>> Pursuing the MWE Section’s tradition of synergies with other
>>>> communities, this year, we are organizing a joint session with the
>>>> Clinical NLP workshop for shared papers/poster presentations. Since
>>>> clinical texts contain an important amount of multiword expressions
>>>> (e.g. medical terms or domain-specific collocations), a joint session
>>>> is deemed beneficial for both communities. The goal is to foster
>>>> future synergies that could address scientific challenges in the
>>>> creation of resources, models and applications to deal with multiword
>>>> expressions and related phenomena in the specialised domain of
>>>> ClinicalNLP. Submissions describing research on MWEs in the
>>>> specialized domain of ClinicalNLP, especially introducing new datasets
>>>> or new tools and resources, are welcome. Papers accepted in this track
>>>> will have the option to present their work in the Clinical NLP
>>>> workshop at ACL 2023 as well, after being presented at MWE 2023.
>>>>
>>>>
>>>> Invited Speakers
>>>>
>>>> We are looking forward to invited talks by two amazing speakers:
>>>>
>>>> Leo Wanner, Universitat Pompeu Fabra
>>>>
>>>> TBD
>>>>
>>>>
>>>> Best paper award
>>>>
>>>> All full papers in the workshop will be considered by the program
>>>> committee for a best paper award. The decision will be announced in
>>>> the closing session.
>>>>
>>>>
>>>> Submission formats
>>>>
>>>> The workshop invites two types of submissions:
>>>>
>>>> archival submissions that present substantially original research in
>>>> both long paper format (8 pages + references) and short paper format
>>>> (4 pages + references).
>>>>
>>>> non-archival submissions of abstracts describing relevant research
>>>> presented/published elsewhere which will not be included in the MWE
>>>> proceedings.
>>>>
>>>>
>>>> Paper submission and templates
>>>>
>>>> Papers should be submitted via the workshop's START submission page
>>>> (https://softconf.com/eacl2023/mwe2023/). Please choose the
>>>> appropriate submission format (archival/non-archival). Archival papers
>>>> with existing reviews will also be accepted through the ACL Rolling
>>>> Review. Submissions must follow the ACL 2023 stylesheet.
>>>>
>>>>
>>>> Archival papers with existing reviews from ACL Rolling Review will
>>>> also be considered. A paper may not be simultaneously under review
>>>> through ARR and MWE. A paper that has or will receive reviews through
>>>> ARR may not be submitted for review to MWE.
>>>>
>>>>
>>>> Important Dates
>>>>
>>>> Paper submission: February 20, 2023
>>>>
>>>> ARR paper commitment: March 6, 2023
>>>>
>>>> Notification of acceptance: March 13, 2023
>>>>
>>>> Camera-ready papers due: March 27, 2023
>>>>
>>>> Workshop: May 5 or 6, 2023
>>>>
>>>>
>>>> All deadlines are at 23:59 UTC-12 (Anywhere on Earth).
>>>>
>>>>
>>>> Organizing Committee
>>>>
>>>> Program chairs: Marcos Garcia, Voula Giouli, Lifeng Han, Shiva
>>>> Taslimipoor
>>>>
>>>> Publication chair: Archna Bhatia
>>>>
>>>> Publicity chair: Kilian Evang
>>>>
>>>>
>>>> Anti-harassment policy
>>>>
>>>> The workshop follows the ACL anti-harassment policy.
>>>>
>>>>
>>>> Contact
>>>>
>>>> For any inquiries regarding the workshop, please send an email to the
>>>> Organizing Committee at [email protected].
>>>> _______________________________________________
>>>> Corpora mailing list -- [email protected]
>>>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>>>> To unsubscribe send an email to [email protected]
>>>>
>>>
>>> --
>>> Ken Litkowski                     TEL.: 301-482-0237
>>> CL Research                       EMAIL: [email protected]
>>> 9208 Gue Road                     Home Page: http://www.clres.com
>>> Damascus, MD 20872-1025 USA       Blog: http://www.clres.com/blog
