Hey Robert, good to see that you found the link :) On Fri, Jul 31, 2009 at 6:06 PM, Robert Muir<rcm...@gmail.com> wrote: > Michael, as Simon mentioned I created an issue describing where you > might run into trouble, at least in lucene core. > > The low-level lucene stuff, it treats these just fine (as surrogate pairs). > > But most analyzers run into some trouble. (things like > WhitespaceAnalyzer are ok) > > Also wildcard queries and some things like that might not work as you > expect, for example ? operator will not match a codepoint > FFFF, but > of course you could use ?? as a workaround. > > On Fri, Jul 31, 2009 at 10:54 AM, Michael Thomsen<mikerthom...@gmail.com> > wrote: >> Thanks for your quick response! >> >> Mike >> >> On Fri, Jul 31, 2009 at 10:25 AM, Simon >> Willnauer<simon.willna...@googlemail.com> wrote: >>> If I understand you correctly you are asking if lucene can deal with >>> encodings that use more than 16 bit. Well yes and no but mainly no. >>> The support for unicode 4.0 was introduced in Java 1.5 and lucene core >>> has still back-compat requirements for java 1.4. Lucene's analyzers >>> make use of char[] all over the place which is a sequence of UTF-16 >>> code unit not a code point. As I said the support for codepoints was >>> introduced in 1.5 and I can remember that there is an issue which aims >>> to implement support for upplementary characters (those above FFFF). >>> Such a character is represented as 2 chars and the most of the >>> analysis code will simply remove those characters. >>> Have a look at this issue: >>> https://issues.apache.org/jira/browse/LUCENE-1689 ( @ Robert are you >>> working on this?) >>> >>> I'm sure there will be support for that in lucene 3.1. >>> >>> Simon >>> On Fri, Jul 31, 2009 at 4:08 PM, Michael Thomsen<mikerthom...@gmail.com> >>> wrote: >>>> Is Lucene capable of handling UCS4 data natively? >>>> >>>> Thanks, >>>> >>>> Mike >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>>> >>>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>> >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> > > > > -- > Robert Muir > rcm...@gmail.com > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >
--------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org