Hi Ada, good to hear from you,

The project is called "The Bridge": https://bridge.haverford.edu/
I am not the PI. The project has been in existence for about 12 years.
I was invited to become involved through my Drexel LEADING Fellowship.
Here is a paper we published this summer:
https://hughandbecky.us/Hugh-CV/publication/2023-bridging-corpora/4LR_pre_print.pdf

The Bridge is a linked data application supporting curriculum development.
It was developed with Latin in mind, but has been extended to Greek as
well. It helps instructors and students quickly find the vocabulary in a
newly assigned text that is new to them, based on the texts they have
already encountered in their curriculum.
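
As a rough illustration of that lookup (a minimal sketch only, not the
Bridge's actual code), the comparison can be thought of as a set difference
over lemmas. The function name and the toy, hand-lemmatized Latin data are
hypothetical and exist only for this example:

```python
def new_vocabulary(new_text_lemmas, known_texts_lemmas):
    """Return the lemmas of the new text that occur in none of the known texts."""
    known = set()
    for lemmas in known_texts_lemmas:
        known.update(lemmas)
    # Set difference: what the reader has not met yet, sorted for stable output.
    return sorted(set(new_text_lemmas) - known)


# Toy, hand-lemmatized data (illustrative assumption, not Bridge data).
already_read = [
    ["arma", "vir", "cano"],
    ["vir", "sum", "bonus"],
]
new_text = ["vir", "bonus", "puella", "amo", "puella"]

print(new_vocabulary(new_text, already_read))  # ['amo', 'puella']
```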

The current workflow takes a variety of texts from several sources and
stores their lemmas, both for comparison across texts and for generating
broad statistics. I see value in modeling the whole text, not just the
lemmas, as this may enable future services. However, while NIF could model
the whole text, current operations really only involve the lemmas, and to
move forward with a linked data model we need to support current
operations. More broadly, I see the lemmas as an "annotation" or
abstraction layer, whereas I see the actual content of the texts as the
"source data". Using linked data and lemmas allows the Bridge to connect,
via lemmas, to LiLa data.
https://lila-erc.eu/
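
To make the "annotation layer over source data" idea concrete, here is a
minimal sketch in plain Python. It assumes character-offset (standoff)
annotations over an unchanged source string. The class and function names
are hypothetical, the lemmas are hand-assigned, and nothing here is NIF or
LiLa vocabulary:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LemmaAnnotation:
    begin: int  # character offset where the token starts in the source text
    end: int    # character offset just past the end of the token
    lemma: str  # headword assigned to the token (hand-assigned here)


# The "source data": the text itself, left untouched.
source_text = "arma virumque cano"

# The "annotation layer": lemmas pointing back into the text by offset.
annotations = [
    LemmaAnnotation(0, 4, "arma"),
    LemmaAnnotation(5, 10, "vir"),
    LemmaAnnotation(10, 13, "que"),
    LemmaAnnotation(14, 18, "cano"),
]


def surface_form(text, annotation):
    """Recover the token as written from the source text."""
    return text[annotation.begin:annotation.end]


def lemma_set(layer):
    """Reduce the layer to the lemma set used for cross-text comparison."""
    return {a.lemma for a in layer}


for a in annotations:
    print(surface_form(source_text, a), "->", a.lemma)
```

Keeping the offsets means the lemma set needed for current operations can
still be derived at any time, while the full text remains available for
other uses.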

Kind regards,
Hugh



On Tue, Oct 10, 2023 at 3:39 AM Ada Wan <[email protected]> wrote:

> Dear Hugh
>
> What project are you working on that still requires lemmatization? Would
> it not be a better approach to use (sub-)character n-grams (esp. if you are
> doing textual analysis/interpretation, vs. processing which can be
> byte-based) to decipher what segments would occur most frequently first and
> (re-)analyze from there?
> I understand there has been a habit in the "language space" to call
> certain segments "lemmata". I am curious to know what one can do as a
> community, though, to transition to more general methods (and
> interpretations on "language").
>
> Thanks and best
> Ada
>
>
> On Tue, Oct 10, 2023 at 12:15 AM Hugh Paterson III via Corpora <
> [email protected]> wrote:
>
>> Greetings,
>>
>> I am working on a project which is using lemmatization. I'm wondering how
>> people have approached combining NIF and lemmatization. Are there any
>> "blessed" extensions or ontologies?
>> I'm not seeing nif:lemma as an option within the NIF ontology... though I
>> am likely missing something.
>>
>> Kind regards,
>> - Hugh
>> _______________________________________________
>> Corpora mailing list -- [email protected]
>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>> To unsubscribe send an email to [email protected]
>>
>
