Suppose I have a bunch of text documents in language X but I index them using 
an analyzer for language Y. Once the index is created, is it possible to 
perform some sort of simple "sanity" check to see if the original language 
selection was wrong? I presume I can try searching for some common word in 
language Y, but I am not sure how reliable this would be. In particular, if the 
languages are from the same group, say X and Y are English and Spanish, I 
should expect this sanity check to produce false matches. However, I 
would be happy if it worked reliably enough for languages using different 
scripts, e.g. Latin vs Cyrillic vs Arabic vs Chinese etc.
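
To make the script-level version of this check concrete, here is a minimal sketch in plain Java. It assumes a sample of indexed terms has already been pulled out of the index into a list (how to sample them is not shown), and it only uses the JDK's Character.UnicodeScript. The class and method names (ScriptSanityCheck, dominantScript, looksWrong) are made up for illustration and are not part of any indexing library.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: guesses the dominant Unicode script of a term sample.
public class ScriptSanityCheck {

    // Counts letters per script over all terms and returns the most frequent
    // script, or UNKNOWN if the sample contains no letters at all.
    static UnicodeScript dominantScript(List<String> terms) {
        Map<UnicodeScript, Integer> counts = new EnumMap<>(UnicodeScript.class);
        for (String term : terms) {
            term.codePoints()
                .filter(Character::isLetter)   // skip digits, punctuation, marks
                .forEach(cp -> counts.merge(UnicodeScript.of(cp), 1, Integer::sum));
        }
        UnicodeScript best = UnicodeScript.UNKNOWN;
        int bestCount = 0;
        for (Map.Entry<UnicodeScript, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // True when the sampled terms are mostly written in a script other than
    // the one the chosen analyzer's language is expected to use.
    static boolean looksWrong(List<String> terms, UnicodeScript expected) {
        return dominantScript(terms) != expected;
    }
}
```

For example, looksWrong(sample, UnicodeScript.LATIN) would flag an index that was built with an English analyzer but whose sampled terms are mostly Cyrillic, Arabic or Han. It cannot tell English from Spanish, since both are Latin script, so it only covers the different-scripts case.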


Thanks much



Ilya Zavorin
