Thanks for all the responses. From the above, it sounds like there are two options:
1. Use ICUTokenizer (is it in Lucene 4.0 or 4.1?). If it is only in 4.1, we cannot use it at this time, since 4.1 has not been released yet.

2. Write a custom analyzer by extending StandardAnalyzer and adding filters for the additional languages.

The problem we are currently facing is described in detail at:
http://lucene.472066.n3.nabble.com/Lucene-support-for-multi-byte-characters-2-4-0-version-td4031654.html

To summarize: we are having trouble tokenizing certain Japanese characters in keywords (when documents are uploaded, users can attach keywords typed in any language), and as a result searches on those specific keywords do not work with StandardAnalyzer (version 2.4.0).

Can you suggest any filter for this that we could integrate with StandardAnalyzer? A sketch of what we currently have in mind is below the signature.

Thanks,
Sai.
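P.S. For concreteness, here is roughly what we have in mind for option 2. This is only a sketch against the 2.4.0 API: instead of a filter on top of StandardTokenizer, it swaps in the CJKTokenizer from the contrib/analyzers jar, which emits overlapping bigrams for CJK characters and ordinary words for Latin text. The class name is just a placeholder of ours.

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKTokenizer; // contrib/analyzers jar

// Sketch only: index CJK text as bigrams so Japanese keywords become
// searchable; Latin text still comes out as ordinary word tokens.
public class MultiLanguageAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new CJKTokenizer(reader);
        stream = new LowerCaseFilter(stream); // harmless even if the tokenizer already lowercases
        return stream;
    }
}

To keep StandardAnalyzer for all other fields, we would route only the keyword field through it, something like this ("keywords" is our field name):

import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
wrapper.addAnalyzer("keywords", new MultiLanguageAnalyzer());

Does that look like a reasonable direction?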