Re: Quick question about Lucene and UCS4

Michael Thomsen Fri, 31 Jul 2009 07:54:52 -0700

Thanks for your quick response!

Mike


On Fri, Jul 31, 2009 at 10:25 AM, Simon Willnauer <simon.willna...@googlemail.com> wrote:
> If I understand you correctly, you are asking whether Lucene can deal
> with encodings that use more than 16 bits per character. Well, yes and
> no, but mainly no. Support for Unicode 4.0 was introduced in Java 1.5,
> and Lucene core still has back-compat requirements for Java 1.4.
> Lucene's analyzers use char[] all over the place, which is a sequence
> of UTF-16 code units, not code points. As I said, support for code
> points was introduced in 1.5, and I remember that there is an issue
> which aims to implement support for supplementary characters (those
> above U+FFFF). Such a character is represented as 2 chars, and most of
> the analysis code will simply remove those characters.
> Have a look at this issue:
> https://issues.apache.org/jira/browse/LUCENE-1689 (@Robert, are you
> working on this?)
>
> I'm sure there will be support for that in Lucene 3.1.
>
> Simon
> On Fri, Jul 31, 2009 at 4:08 PM, Michael Thomsen <mikerthom...@gmail.com> wrote:
>> Is Lucene capable of handling UCS4 data natively?
>>
>> Thanks,
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>
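
To illustrate the point about char[] holding UTF-16 code units rather
than code points, here is a minimal sketch in plain Java 1.5+. It is
not Lucene code: the class SupplementaryDemo and both filter methods
are hypothetical names made up for this example. It only assumes that
an analyzer-style filter tests one char at a time, which is roughly
how a supplementary character (here U+20000, a CJK ideograph stored as
the surrogate pair D840 DC00) ends up being dropped.

```java
public class SupplementaryDemo {

    // Hypothetical char-based filter: looks at one UTF-16 code unit at a time.
    // Each half of a surrogate pair is not a letter on its own, so a
    // supplementary letter is silently dropped.
    static String keepLettersByChar(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical code-point-aware filter using the Java 1.5 API:
    // reads a full code point, then advances by 1 or 2 chars.
    static String keepLettersByCodePoint(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 'a', U+20000 (surrogate pair D840 DC00), 'b'
        String text = "a\uD840\uDC00b";

        System.out.println(text.length());                         // 4 code units
        System.out.println(text.codePointCount(0, text.length())); // 3 code points

        // The char-based filter loses the supplementary character.
        System.out.println(keepLettersByChar(text).length());      // 2

        // The code-point-based filter keeps it.
        System.out.println(keepLettersByCodePoint(text).equals(text)); // true
    }
}
```

The second method depends on String.codePointAt, Character.charCount
and Character.isLetter(int), all of which were added in Java 1.5. That
is consistent with the back-compat constraint mentioned above: code
that must still compile against Java 1.4 cannot call them.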
