Suppose I have a set of text documents in language X, but I index them using an analyzer for language Y. Once the index is created, is it possible to perform some sort of simple "sanity" check to see whether the original language selection was wrong? I presume I could try searching for some common words in language Y, but I am not sure how reliable this would be. If the languages are from the same group, say X and Y are English and Spanish, I would expect such a check to produce false matches. However, I would be happy if it worked reliably enough for languages that use different scripts, e.g. Latin vs. Cyrillic vs. Arabic vs. Chinese.
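To make the different-scripts case concrete, here is a rough Python sketch of the kind of check I have in mind. It assumes that a sample of indexed terms can be pulled out of the index somehow (the `terms` argument is hypothetical), and it approximates a character's script by the first word of its Unicode character name, which is a heuristic rather than a proper script-property lookup. The function names are only for illustration and do not come from any indexing library.

```python
import unicodedata
from collections import Counter


def dominant_script(terms):
    """Return (script, share) for the most frequent script among the letters in terms.

    The script is approximated by the first word of the Unicode character
    name, e.g. "LATIN", "CYRILLIC", "ARABIC", "CJK". This is an assumption
    made for the sketch, not an exact script classification.
    """
    counts = Counter()
    for term in terms:
        for ch in term:
            if not ch.isalpha():
                continue  # ignore digits, punctuation, whitespace
            name = unicodedata.name(ch, "")  # "" if the character has no name
            if name:
                counts[name.split()[0]] += 1
    if not counts:
        return None, 0.0
    script, n = counts.most_common(1)[0]
    return script, n / sum(counts.values())


def looks_mismatched(terms, expected_script):
    """True if the sampled terms are mostly written in a different script."""
    script, _share = dominant_script(terms)
    return script is not None and script != expected_script
```

For example, if the analyzer was chosen for a Latin-script language, `looks_mismatched(sample_terms, "LATIN")` would flag an index whose sampled terms are mostly Cyrillic. It cannot tell English from Spanish, since both come out as `"LATIN"`.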
Thanks much,
Ilya Zavorin