We are using Lucene (2.4.0 libraries) to implement search in our application, with StandardAnalyzer as the analyzer.
Our application has a document upload feature that lets you attach keywords to a document while uploading it. When we search using those keywords, the matching documents are retrieved.

The problem we are facing is that search works fine when the keywords are in English or Simplified Chinese, but it does not work for Japanese. I am not sure whether the problem lies with the analyzer we are using or whether Japanese characters are simply not supported in version 2.4.0.

I found the following through a Google search:

https://issues.apache.org/jira/browse/LUCENE-2847 (support all of Unicode)
http://lucene.472066.n3.nabble.com/which-unicode-version-is-supported-with-lucene-td2574222.html

We are not tokenizing the document itself; we only tokenize the keywords added while uploading it:

    document.add(new Field(field.getKeyword(), value, Field.Store.NO, Field.Index.ANALYZED));

Would upgrading to the latest version of Lucene solve the issue, or do we need to use a special analyzer for each language? Does StandardAnalyzer not support Unicode characters?

Any thoughts on this are much appreciated.

Thanks,
Sai
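
One thing that may be worth ruling out before looking at the analyzer is whether the Japanese keywords reach the indexing code intact, since a wrong request or file encoding could turn them into '?' characters before Lucene ever sees them. The following is only a minimal sketch of such a check. It uses the JDK alone (Java 7 or later, for Character.UnicodeScript) and no Lucene classes, and the names KeywordScriptCheck, scriptsOf and looksJapanese are made up for illustration; they are not part of the application or of Lucene.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical diagnostic helper; not part of Lucene or of the application.
class KeywordScriptCheck {

    // Collect the Unicode scripts of every code point in the keyword,
    // in order of first appearance.
    static Set<Character.UnicodeScript> scriptsOf(String keyword) {
        Set<Character.UnicodeScript> scripts = new LinkedHashSet<Character.UnicodeScript>();
        for (int i = 0; i < keyword.length(); ) {
            int codePoint = keyword.codePointAt(i);
            scripts.add(Character.UnicodeScript.of(codePoint));
            i += Character.charCount(codePoint); // step over surrogate pairs correctly
        }
        return scripts;
    }

    // Rough heuristic: kana only occurs in Japanese text. Kanji alone
    // (script HAN) cannot be told apart from Chinese this way.
    static boolean looksJapanese(String keyword) {
        Set<Character.UnicodeScript> scripts = scriptsOf(keyword);
        return scripts.contains(Character.UnicodeScript.HIRAGANA)
                || scripts.contains(Character.UnicodeScript.KATAKANA);
    }

    public static void main(String[] args) {
        // Two kanji followed by four katakana, written as escapes so the
        // source file encoding does not matter.
        String keyword = "\u691c\u7d22\u30a8\u30f3\u30b8\u30f3";
        System.out.println("scripts: " + scriptsOf(keyword));
        System.out.println("looks Japanese: " + looksJapanese(keyword));
    }
}
```

If a keyword that was typed in Japanese reports only COMMON or LATIN at the point where value is passed to the Field constructor, the text was probably damaged before indexing, and changing the analyzer or the Lucene version would not help on its own.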