On 10/7/24 18:01, Nicolas George wrote:
Emanuel Berg (12024-07-10):
Okay, this is gonna be a challenge to most guys who have been
processing text for a long time.
So, I would like a command, function or script, 'original',
that takes a string STR and a text file TXT and outputs
a score, from 0 to 100, of how _original_ STR is, compared to
what is already in TXT.
So if I do
$ original "This isn't just another party" comments.txt
this will score 0 if that exact phrase to the letter already
exists in comments.txt.
But it will score 100 if not a single one of those words exists
in the file! Because that would be 100% original.
Those endpoints are easy. But how to make it score - say - 62%
if some of the words are present, mostly spelled like that and
combined in ways that are not completely different?
Note: The above examples are examples, other definitions of
originality are okay. That is not the important part now - but
can be as interesting a part, later.
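One way to realize the endpoints described above is plain word overlap: 0 for a verbatim hit, 100 when no word is shared, and the fraction of unseen words in between. A minimal Python sketch, assuming this particular definition of originality (one of many, as noted):

```python
import re

def originality(candidate, text):
    """Score 0-100: 0 if candidate appears verbatim in text,
    100 if no word of candidate appears in text, otherwise the
    percentage of candidate words absent from text.
    This is just one possible definition of 'original'."""
    if candidate in text:
        return 0
    words = set(re.findall(r"[a-z']+", candidate.lower()))
    known = set(re.findall(r"[a-z']+", text.lower()))
    if not words:
        return 0
    absent = len(words - known)
    return round(100 * absent / len(words))
```

With this definition, a string whose words all appear in the file (but not as an exact phrase) still scores 0, which may or may not be what you want; that is exactly the kind of choice the later questions are probing.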
You can use that:
https://en.wikipedia.org/wiki/Levenshtein_distance
But you also need to define what you want with more precision:
How do you count the replacement of a word by a synonym?
How do you count a change in the order of the words?
How do you count a transparent spelling mistake?
How do you count a spelling mistake that turns a word into another
existing word?
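The Levenshtein distance can answer at least the spelling-mistake questions: treat a word as "already seen" if it is within a small edit distance of some word in the file. A sketch, assuming a tolerance of one edit (the `max_dist` threshold is a tuning choice, and synonyms and word order would still need separate treatment):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def word_known(word, vocabulary, max_dist=1):
    """Treat a word as already present if it is within max_dist edits
    of any word in the file -- one way to absorb a transparent typo."""
    return any(levenshtein(word, v) <= max_dist for v in vocabulary)
```

Note that this cannot distinguish a transparent typo from a mistake that lands on another existing word; with `max_dist=1`, "cat" would match a vocabulary containing "car".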
Since this is not related to Debian, please put “[OT]” in the subject.
Regards,
The modern way would be to use an LLM in API mode and set a context to
achieve your aims.
You can do this locally using an LLM hosted on your own computer, or you
can use a remote API such as ChatGPT's.
This is usually scripted in Python.
The interesting thing is that you can get a capable LLM such as GPT-4 to
help write a context to be run by a lesser LLM.
You should not expect perfection, and you may not get 100% repeatable
results, but it'll still be fairly good.
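The API-mode approach above might be scripted as follows. This is a sketch using only the standard library against the OpenAI chat completions endpoint; the model name is an assumption, and the request requires an `OPENAI_API_KEY` environment variable. Only the prompt construction is pinned down here; the actual call is shown for illustration:

```python
import json
import os
import urllib.request

def build_prompt(candidate, existing_text):
    """Assemble a context asking the model for an originality score."""
    return (
        "Rate, as a single integer from 0 to 100, how original the "
        "following string is compared to the reference text. 0 means it "
        "already appears verbatim; 100 means it shares nothing with it. "
        "Reply with the number only.\n\n"
        f"String: {candidate}\n\nReference text:\n{existing_text}"
    )

def score_via_api(candidate, existing_text, model="gpt-4o-mini"):
    """POST the prompt to the chat completions endpoint.
    The model name is an assumption; substitute whatever your
    provider (local or remote) offers."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user",
                          "content": build_prompt(candidate, existing_text)}],
        }).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return int(body["choices"][0]["message"]["content"].strip())
```

As noted, the scores will not be perfectly repeatable from run to run, since the model's output is not deterministic.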