Dear all

Thanks for your replies.
I understand that lemmatization might have been a historical practice in
many linguistic and literary studies. What can be done to move away from such
an analytical methodology/framework, and the infrastructure built around it,
toward working with texts in a more general fashion (e.g., through simple
character matching), while preserving all rights to, and the availability of,
the data?
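Here is a minimal sketch of what "simple character matching" could mean in practice, assuming Python 3.9+. It searches the unmodified text for a literal character string and reports the character offsets of each hit. The function name `find_matches` and the sample string are hypothetical illustrations and are not part of any existing tool.

```python
import re


def find_matches(text: str, pattern: str) -> list[tuple[int, int, str]]:
    """Return (begin, end, matched_string) for each non-overlapping literal match."""
    # re.escape makes the pattern match as literal characters, not as a regex.
    return [(m.start(), m.end(), m.group()) for m in re.finditer(re.escape(pattern), text)]


print(find_matches("vir viri virum virtus", "vir"))
```

The results are offsets into the original string, so the text itself is never altered. Note that "virtus" also matches. A lemma layer would separate such cases, so whether this counts as a loss depends on the analysis.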

Thanks and best
Ada


On Tue, Oct 10, 2023 at 7:45 PM James Tauber <[email protected]> wrote:

> Related to "The Bridge" is my own Greek Learner Texts Project
> https://greek-learner-texts.org, which relies heavily on lemmatization for
> building vocabulary lists.
>
> At the Perseus Digital Library https://scaife.perseus.org , we also make
> extensive use of lemmatization of texts to link to dictionaries, etc.
>
> James
>
> On Tue, Oct 10, 2023 at 1:11 PM Hugh Paterson III via Corpora <
> [email protected]> wrote:
>
>> Hi Ada, good to hear from you,
>>
>> The project is called: "The Bridge". https://bridge.haverford.edu/
>> I am not the PI. The project has been in existence for about 12 years.
>> I was invited to become involved through my Drexel LEADING Fellowship.
>> Here is a paper we published this summer:
>> https://hughandbecky.us/Hugh-CV/publication/2023-bridging-corpora/4LR_pre_print.pdf
>>
>> The Bridge is a linked data application supporting curriculum
>> development. It was developed with Latin in mind, but has been extended to
>> Greek as well. It quickly helps instructors and students find new
>> vocabulary words in newly assigned texts, based on texts they have already
>> encountered in their curriculum.
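The lookup described above, which finds vocabulary in a newly assigned text that students have not met in texts they have already read, can be sketched as a set difference over lemmas. This is a minimal Python illustration under stated assumptions and not The Bridge's implementation. It assumes each text has already been lemmatized into a list of lemma strings. The function name and sample lemmas are hypothetical.

```python
from collections import Counter


def new_vocabulary(read_texts: list[list[str]], assigned: list[str]) -> list[tuple[str, int]]:
    """Lemmas in `assigned` that never occur in `read_texts`, most frequent first."""
    # Every lemma the reader has already encountered, across all prior texts.
    seen: set[str] = set()
    for lemmas in read_texts:
        seen.update(lemmas)
    # Count only the unseen lemmas in the newly assigned text.
    counts = Counter(lemma for lemma in assigned if lemma not in seen)
    # Descending frequency, then alphabetical, for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


read = [["sum", "et", "arma"], ["et", "vir"]]
assigned = ["arma", "vir", "cano", "litus", "cano", "primus"]
print(new_vocabulary(read, assigned))  # → [('cano', 2), ('litus', 1), ('primus', 1)]
```

Ranking the unseen lemmas by frequency is one plausible ordering for a vocabulary list. A real curriculum tool may weight words differently.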
>>
>> The current workflow takes a variety of texts from several sources and
>> stores their lemmas for cross-text comparison and for generating broad
>> statistics. I see value in modeling the whole text, not just the lemmas, as
>> this may enable future services. So, while NIF could model the whole text,
>> current operations really only involve lemmas, and to move forward with a
>> linked data model we need to support those operations. More broadly, I see
>> the lemmas as an "annotation" or abstraction layer, whereas I see the actual
>> content of the texts as the "source data". Using linked data and lemmas
>> allows The Bridge to connect, via lemmas, to LiLa data.
>> https://lila-erc.eu/
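To make the distinction between "source data" and "annotation layer" concrete, here is a minimal Python sketch that emits Turtle. The full text is stored once as a `nif:Context`, and each lemma is attached to an offset-based `nif:RFC5147String` that points back into that context. The text and offset terms are intended to follow the NIF 2.0 Core vocabulary, so check the term names against the ontology. The lemma predicate `ex:lemma` and the `http://example.org/` namespace are hypothetical placeholders. Swap them for whatever lemma property the project settles on, such as a NIF extension or a link into LiLa. The offsets and lemmas are supplied by hand here, standing in for an upstream lemmatizer.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
# Hypothetical namespace for the lemma predicate; replace with a real vocabulary.
EX = "http://example.org/"


def turtle_literal(s: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def xsd_int(n: int) -> str:
    """Typed literal for a character offset."""
    return f'"{n}"^^xsd:nonNegativeInteger'


def to_nif(doc_uri: str, text: str, tokens: list[tuple[int, int, str]]) -> str:
    """Serialize `text` as a nif:Context plus one annotated string per (begin, end, lemma)."""
    ctx = f"<{doc_uri}#char=0,{len(text)}>"
    out = [
        f"@prefix nif: <{NIF}> .",
        f"@prefix ex: <{EX}> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        # The whole text is stored once: this is the "source data".
        f"{ctx} a nif:Context, nif:RFC5147String ;",
        f"    nif:isString {turtle_literal(text)} ;",
        f"    nif:beginIndex {xsd_int(0)} ;",
        f"    nif:endIndex {xsd_int(len(text))} .",
    ]
    # Each lemma is an annotation that points back into the context by offsets.
    for begin, end, lemma in tokens:
        out += [
            "",
            f"<{doc_uri}#char={begin},{end}> a nif:Word, nif:RFC5147String ;",
            f"    nif:referenceContext {ctx} ;",
            f"    nif:anchorOf {turtle_literal(text[begin:end])} ;",
            f"    nif:beginIndex {xsd_int(begin)} ;",
            f"    nif:endIndex {xsd_int(end)} ;",
            f"    ex:lemma {turtle_literal(lemma)} .",
        ]
    return "\n".join(out)


sample = "arma virumque cano"
print(to_nif("http://example.org/aeneid", sample, [(0, 4, "arma"), (5, 13, "vir"), (14, 18, "cano")]))
```

The lemma triples only reference offsets. This means the lemma layer can be dropped or replaced, for example by character n-gram annotations, without touching the stored text.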
>>
>> Kind regards,
>> Hugh
>>
>>
>>
>> On Tue, Oct 10, 2023 at 3:39 AM Ada Wan <[email protected]> wrote:
>>
>>> Dear Hugh
>>>
>>> What project are you working on that still requires lemmatization? Would
>>> it not be better to use (sub-)character n-grams (especially if you are
>>> doing textual analysis/interpretation, as opposed to processing, which can
>>> be byte-based) to first determine which segments occur most frequently,
>>> and then (re-)analyze from there?
>>> I understand there has been a habit in the "language space" of calling
>>> certain segments "lemmata". I am curious, though, what we can do as a
>>> community to transition to more general methods (and interpretations of
>>> "language").
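The first step of the suggestion above is counting which segments occur most often before doing any further analysis. That step might look like the following minimal Python sketch. The function name and sample strings are hypothetical. The same function accepts `bytes`, and for non-ASCII text in UTF-8 this yields sub-character units. The choice of n, and counting across spaces, are assumptions left to the analyst.

```python
from collections import Counter
from typing import Union


def ngram_counts(seq: Union[str, bytes], n: int) -> Counter:
    """Count every overlapping n-gram in `seq`, including spaces and punctuation.

    Works on str (character n-grams) and on bytes (byte n-grams).
    """
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


counts = ngram_counts("vir viri virum virtus", 3)
print(counts.most_common(2))  # → [('vir', 4), (' vi', 3)]

# Byte-level trigrams of the UTF-8 encoding of a Greek word.
print(ngram_counts("λόγος".encode("utf-8"), 3).most_common(3))
```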
>>>
>>> Thanks and best
>>> Ada
>>>
>>>
>>> On Tue, Oct 10, 2023 at 12:15 AM Hugh Paterson III via Corpora <
>>> [email protected]> wrote:
>>>
>>>> Greetings,
>>>>
>>>> I am working on a project that uses lemmatization. I'm wondering how
>>>> people have approached combining NIF and lemmatization. Are there any
>>>> "blessed" extensions or ontologies?
>>>> I'm not seeing nif:lemma as an option within the NIF ontology, though
>>>> I am likely missing something.
>>>>
>>>> Kind regards,
>>>> - Hugh
>>>> _______________________________________________
>>>> Corpora mailing list -- [email protected]
>>>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>>>> To unsubscribe send an email to [email protected]
>>>>
>>
>