Thanks for your quick response!

Mike

On Fri, Jul 31, 2009 at 10:25 AM, Simon Willnauer<simon.willna...@googlemail.com> wrote:
> If I understand you correctly you are asking if Lucene can deal with
> encodings that use more than 16 bits. Well yes and no but mainly no.
> The support for Unicode 4.0 was introduced in Java 1.5 and Lucene core
> still has back-compat requirements for Java 1.4. Lucene's analyzers
> make use of char[] all over the place, which is a sequence of UTF-16
> code units, not code points. As I said the support for code points was
> introduced in 1.5 and I can remember that there is an issue which aims
> to implement support for supplementary characters (those above U+FFFF).
> Such a character is represented as 2 chars and most of the
> analysis code will simply remove those characters.
> Have a look at this issue:
> https://issues.apache.org/jira/browse/LUCENE-1689 ( @ Robert are you
> working on this?)
>
> I'm sure there will be support for that in Lucene 3.1.
>
> Simon
>
> On Fri, Jul 31, 2009 at 4:08 PM, Michael Thomsen<mikerthom...@gmail.com>
> wrote:
>> Is Lucene capable of handling UCS4 data natively?
>>
>> Thanks,
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
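
For anyone reading this thread later, the difference between UTF-16 code units and code points described in the reply can be sketched in plain Java. This is only an illustration, assuming Java 1.5 or later: the class and method names are made up for the example, no Lucene classes are involved, and the per-char filter merely imitates the kind of char-by-char test the reply describes. It is not Lucene's actual analysis code.

```java
// Hypothetical demo class; not part of Lucene.
final class SupplementaryDemo {

    // Number of UTF-16 code units, which is what char[] and String.length() count.
    public static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (API available since Java 1.5).
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Naive filter that tests every char in isolation. A surrogate on its own
    // is not a letter, so both halves of a supplementary character are dropped.
    public static String keepLettersPerChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetter(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Code-point-aware filter: reads a whole code point, then advances by
    // one or two chars depending on how many code units it occupies.
    public static String keepLettersPerCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "a", then U+10400 (DESERET CAPITAL LETTER LONG I) written as a
        // surrogate pair, then "b".
        String s = "a\uD801\uDC00b";

        System.out.println(codeUnits(s));                       // 4
        System.out.println(codePoints(s));                      // 3
        System.out.println(keepLettersPerChar(s));              // ab
        System.out.println(keepLettersPerCodePoint(s).equals(s)); // true
    }
}
```

The per-char version silently loses the supplementary letter, which matches the behaviour described above for code that walks a char[] one element at a time. The code-point version needs the Java 1.5 methods (codePointAt, Character.charCount), which is why it conflicts with a Java 1.4 back-compat requirement.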