Hi all, I've been trying to index some non-English text [Indian languages] encoded as Unicode UTF-8. For these languages we don't have any stemmers, tokenizers, etc. To keep searching simple, I'd like to be able to do exact word searches/matches as a first step. I'd like to know which is the simplest analyzer that still works for both indexing and searching [the Lucene wiki says the same analyzer should be used for both, otherwise you might not get search results, right?].
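To make "exact word match" concrete, here is a minimal sketch of the behaviour I have in mind. It is plain Java, not Lucene: the class ExactWordSketch and its methods are names I made up for illustration, and splitting on whitespace only is my assumption about what "simple enough" could mean. The one point it tries to show is that the documents and the query go through the same tokenize step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index, not a Lucene API.
final class ExactWordSketch {

    // Stand-in for an "analyzer": split on whitespace (Java's \s class) and
    // leave every other character untouched, so words in any script survive as-is.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Build a map from each token to the ids (list positions) of the documents containing it.
    static Map<String, Set<Integer>> index(List<String> docs) {
        Map<String, Set<Integer>> idx = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            for (String tok : tokenize(docs.get(i))) {
                idx.computeIfAbsent(tok, k -> new TreeSet<>()).add(i);
            }
        }
        return idx;
    }

    // Tokenize the query with the SAME function used at index time and
    // return the documents that contain every query token.
    static Set<Integer> search(Map<String, Set<Integer>> idx, String query) {
        Set<Integer> result = null;
        for (String tok : tokenize(query)) {
            Set<Integer> hits = idx.getOrDefault(tok, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

With documents such as "भारत एक देश है" and "हिन्दी भाषा", I would expect a search for "देश" to return only the first document, because the query word is matched character for character against the indexed words.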
Many people must have indexed non-English text for which there are no standard analyzers, and I'd appreciate any ideas from them. I would also like to do hit highlighting irrespective of language; ideas on this would be equally helpful. Is SimpleAnalyzer good enough for both indexing and searching? Thanks, KK