Itamar Syn-Hershko wrote:
For what it's worth, I did something similar in my BidiAnalyzer so I can
index both Hebrew/Semitic texts and English/Latin words without switching
analyzers, giving each the proper treatment. I did it simply by testing the
first char and looking at its numeric value: if it falls between Hebrew
Aleph and Tav then it's Hebrew, else it's Latin. I wonder how you would spot
a French word in an English text, for instance (aren't there parallel words?)

Itamar.
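
A minimal sketch of the first-character test described above, in Python for brevity. This is not the actual BidiAnalyzer code; the function name script_of and the choice to treat an empty token as Latin are assumptions made for illustration. The range used is the Unicode Hebrew letter block, Alef (U+05D0) through Tav (U+05EA).

```python
# Hypothetical helper, not taken from BidiAnalyzer.
HEBREW_ALEF = "\u05d0"  # first Hebrew letter
HEBREW_TAV = "\u05ea"   # last Hebrew letter


def script_of(token: str) -> str:
    """Classify a token as 'hebrew' or 'latin' from its first character only."""
    if not token:
        # Assumption: an empty token falls through to the default branch.
        return "latin"
    first = token[0]
    # Compare the code point of the first character against the Hebrew letter range.
    return "hebrew" if HEBREW_ALEF <= first <= HEBREW_TAV else "latin"
```

Anything outside the Hebrew range is treated as Latin, as in the description, so digits, punctuation, and other scripts all take the Latin path.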
By comparing n-gram statistics.
Finding a foreign word in a sentence is very difficult: many words are very similar, and some are "faux amis" (false friends), spelled the same but with a different meaning in each language. Querying in mixed languages seems a bit tricky. Mixing alphabets is more common (and easier to handle).
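
To make the n-gram comparison concrete, here is a rough sketch in Python. It builds character-trigram counts and picks the language whose profile has the highest cosine similarity with the input. The names ngrams, cosine, PROFILES, and guess_language are made up for this example, and the two one-sentence training texts are toy data; a real profile would need a far larger corpus.

```python
import math
from collections import Counter


def ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams, padding with spaces to capture word edges."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Toy profiles (assumption): one sentence per language, for illustration only.
PROFILES = {
    "en": ngrams("the quick brown fox jumps over the lazy dog "
                 "and then the dog sleeps in the house"),
    "fr": ngrams("le renard brun saute par dessus le chien paresseux "
                 "et puis le chien dort dans la maison"),
}


def guess_language(text: str) -> str:
    """Return the key of the profile most similar to the text."""
    sample = ngrams(text)
    scores = {lang: cosine(sample, profile) for lang, profile in PROFILES.items()}
    return max(scores, key=scores.get)
```

This works tolerably on whole sentences, but a single word yields only a handful of trigrams, which is one reason spotting an isolated foreign word is so unreliable.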

M.
