Dear all,

Thanks for your replies. I understand that lemmatization might have been a historical practice for many linguistic and literary studies. What can be done to translate and transition from such an analytical methodology/framework, as well as from the infrastructural settings built around it, to enable working with texts in a more general fashion (e.g. through simple character matching), while preserving all the rights to and availability of the data?
Thanks and best
Ada

On Tue, Oct 10, 2023 at 7:45 PM James Tauber <[email protected]> wrote:

> Related to "The Bridge" is my own Greek Learner Texts Project
> https://greek-learner-texts.org which relies heavily on lemmatization for
> building vocabulary lists.
>
> At the Perseus Digital Library https://scaife.perseus.org , we also make
> extensive use of lemmatization of texts to link to dictionaries, etc.
>
> James
>
> On Tue, Oct 10, 2023 at 1:11 PM Hugh Paterson III via Corpora <
> [email protected]> wrote:
>
>> Hi Ada good to hear from you,
>>
>> The project is called: "The Bridge". https://bridge.haverford.edu/
>> I am not the PI. The project has been in existence for about 12 years.
>> I was invited to become involved through my Drexel LEADING Fellowship.
>> Here is a paper we published this summer:
>> https://hughandbecky.us/Hugh-CV/publication/2023-bridging-corpora/4LR_pre_print.pdf
>>
>> The Bridge is a linked data application supporting curriculum
>> development. It was developed with Latin in mind, but has been extended to
>> Greek as well. It quickly helps instructors and students find new
>> vocabulary words in newly assigned texts, based on texts they have already
>> encountered in their curriculum.
>>
>> The current workflow takes a variety of texts from several sources and
>> then stores the lemmas for comparison across texts and broad stats
>> generation. I see value in modeling the whole text not just the lemmas as
>> this may allow future services. So, while NIF could model the whole text,
>> the current operational activities really only involve using lemmas. To
>> move forward in a linked data model we need to support current operations.
>> More broadly, I see the lemmas as an "annotation" or abstraction layer
>> whereas I would see the actual content of texts as the "source data". Using
>> linked data and lemmas allows the bridge to connect via lemmas to LiLa
>> data. https://lila-erc.eu/
>>
>> Kind regards,
>> Hugh
>>
>> On Tue, Oct 10, 2023 at 3:39 AM Ada Wan <[email protected]> wrote:
>>
>>> Dear Hugh
>>>
>>> What project are you working on that still requires lemmatization? Would
>>> it not be a better approach to use (sub-)character n-grams (esp. if you are
>>> doing textual analysis/interpretation, vs. processing which can be
>>> byte-based) to decipher what segments would occur most frequently first and
>>> (re-)analyze from there?
>>> I understand there has been a habit in the "language space" to call
>>> certain segments "lemmata". I am curious to know what one can do as a
>>> community, though, to transition to more general methods (and
>>> interpretations on "language").
>>>
>>> Thanks and best
>>> Ada
>>>
>>> On Tue, Oct 10, 2023 at 12:15 AM Hugh Paterson III via Corpora <
>>> [email protected]> wrote:
>>>
>>>> Greetings,
>>>>
>>>> I am working on a project which is using lemmatization. I'm wondering
>>>> how people have approached combining NIF and lemmatization. are there any
>>>> "blessed" extensions or ontologies?
>>>> I'm not seeing nif:lemma as an option within the nif ontology... though
>>>> I am likely missing something.
>>>>
>>>> Kind regards,
>>>> - Hugh
>>>> _______________________________________________
>>>> Corpora mailing list -- [email protected]
>>>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>>>> To unsubscribe send an email to [email protected]
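
For readers following the thread, the two approaches under discussion can be contrasted in a minimal Python sketch. It is only an illustration under stated assumptions: the function names `char_ngrams` and `new_vocabulary` are hypothetical, the Latin lemmas are hand-written toy data, and nothing here reflects how The Bridge, Scaife, or NIF are actually implemented. The first function counts overlapping character n-grams (the "simple character matching" side); the second finds lemmas in a new text that a reader has not yet encountered (the lemma-comparison side).

```python
from __future__ import annotations

from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count every overlapping character n-gram of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def new_vocabulary(known_lemmas: set[str], text_lemmas: list[str]) -> list[str]:
    """Return lemmas of a new text that are not already known.

    Duplicates are dropped and first-seen order is preserved.
    Lemmatization itself is assumed to have happened upstream.
    """
    seen: set[str] = set()
    unseen: list[str] = []
    for lemma in text_lemmas:
        if lemma not in known_lemmas and lemma not in seen:
            seen.add(lemma)
            unseen.append(lemma)
    return unseen


if __name__ == "__main__":
    # Character view: two inflected forms share most of their trigrams.
    counts = char_ngrams("puellae puellam", 3)
    print(counts["pue"], counts["lae"])

    # Lemma view: toy, hand-annotated lemma lists (hypothetical data).
    known = {"puella", "amo", "rosa"}
    assigned = ["puella", "video", "rosa", "video", "et"]
    print(new_vocabulary(known, assigned))
```

The character view needs no linguistic annotation but leaves it to later analysis to decide which segments matter; the lemma view depends on an annotation layer but maps directly onto dictionary entries, which is the use case described for the vocabulary-list projects above.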
