To add some more context: I am able to index English and Western European languages.
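Since Western European text indexes fine and only the CJK content fails, one thing that may be worth ruling out is the files being decoded with the platform default charset somewhere before the text reaches the analyzer. The sketch below uses only the standard Java library, not any Lucene API. The class name `ExplicitCharsetReader` and the helper `readAll` are hypothetical, and UTF-8 is an assumption about the files; if they are actually in a legacy encoding such as Shift_JIS, EUC-KR or GB2312, that charset would have to be passed instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene: decodes a byte stream with an
// explicitly chosen charset instead of the platform default.
public class ExplicitCharsetReader {

    // Reads the whole stream into a String using the given charset.
    public static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes so the source file
        // encoding does not matter for this demo.
        String text = "\u65E5\u672C\u8A9E";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in restores the text.
        String good = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);

        // Decoding the same bytes as ISO-8859-1 yields one char per byte,
        // which is not the original text.
        String bad = readAll(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println(good.equals(text));
        System.out.println(bad.equals(text));
        System.out.println(bad.length());
    }
}
```

If the text comes out intact this way, the decoded string (or the `InputStreamReader` itself) is what I would then pass on to the HTML parsing and indexing step, rather than a raw stream.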

asitag wrote:
>
> Hi,
>
> We are trying to index HTML files which have Japanese / Korean / Chinese
> content using the CJK analyser. But while indexing we are getting "Lexical
> parse error. Encountered unknown character." We tried setting the string
> encoding to UTF-8, but it does not help.
>
> Can anyone please help? Any pointers will be highly appreciated.
>
> Thanks
>

--
View this message in context: http://www.nabble.com/Chinese-Japanese-Korean-Indexing-issue-Version-2.4-tp25388003p25388078.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.