Which analyzer to use for non-English Unicode text?

KK Fri, 22 May 2009 23:24:03 -0700

Hi All,
I've been trying to index some non-English text [Indian languages] encoded as
Unicode UTF-8. For these languages we don't have any stemmers, tokenizers, etc.
To keep searching simple, I'd like to be able to do exact word
searches/matches as a first step. I'd like to know which is the
simplest analyzer that works for both indexing and
searching [the Lucene wiki says both should use the same analyzer, otherwise
you might not get search results, right?]


Many people must have indexed non-English text for which there
are no standard analyzers. I'd appreciate any ideas from them on this. Along
with this, I would also like to do hit highlighting irrespective of language.
Ideas on that would be equally helpful.

Is SimpleAnalyzer good enough for both indexing and searching?

Thanks,
KK
