Hey Robert, good to see that you found the link :)

On Fri, Jul 31, 2009 at 6:06 PM, Robert Muir<rcm...@gmail.com> wrote:
> Michael, as Simon mentioned I created an issue describing where you
> might run into trouble, at least in lucene core.
>
> The low-level Lucene code treats these just fine (as surrogate pairs).
>
> But most analyzers run into some trouble (things like
> WhitespaceAnalyzer are OK).
>
> Also, wildcard queries and similar features might not work as you
> expect. For example, the ? operator will not match a code point above
> U+FFFF, but of course you could use ?? as a workaround.
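
To illustrate the ? versus ?? point, here is a toy matcher in plain Java. It is only a sketch under the assumption that ? consumes exactly one UTF-16 char. It is not Lucene's wildcard code, and the class and method names are made up for this example.

```java
class QuestionMarkDemo {
    // Hypothetical toy matcher, NOT Lucene code: supports only literals and '?',
    // and assumes '?' consumes exactly one UTF-16 code unit (one Java char).
    static boolean matches(String pattern, String term) {
        if (pattern.length() != term.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '?' && p != term.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // U+1D50A lies above U+FFFF, so it is stored as a surrogate pair (2 chars).
        String supplementary = new String(Character.toChars(0x1D50A));
        String term = "a" + supplementary + "b";

        System.out.println(matches("a?b", term));  // false: one '?' covers only half the pair
        System.out.println(matches("a??b", term)); // true: '??' covers both surrogates
    }
}
```

The same term with a BMP character in the middle (for example "axb") matches "a?b" as usual, so the surprise only shows up for supplementary characters.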
>
> On Fri, Jul 31, 2009 at 10:54 AM, Michael Thomsen<mikerthom...@gmail.com> 
> wrote:
>> Thanks for your quick response!
>>
>> Mike
>>
>> On Fri, Jul 31, 2009 at 10:25 AM, Simon
>> Willnauer<simon.willna...@googlemail.com> wrote:
>>> If I understand you correctly, you are asking if Lucene can deal with
>>> encodings that use more than 16 bits. Well, yes and no, but mainly no.
>>> Support for Unicode 4.0 was introduced in Java 1.5, and Lucene core
>>> still has back-compat requirements for Java 1.4. Lucene's analyzers
>>> make use of char[] all over the place, which is a sequence of UTF-16
>>> code units, not code points. As I said, support for code points was
>>> introduced in 1.5, and I remember that there is an issue which aims
>>> to implement support for supplementary characters (those above U+FFFF).
>>> Such a character is represented as 2 chars, and most of the
>>> analysis code will simply remove those characters.
>>> Have a look at this issue:
>>> https://issues.apache.org/jira/browse/LUCENE-1689 ( @ Robert are you
>>> working on this?)
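
To make the code unit versus code point distinction concrete, here is a minimal sketch in plain Java (1.5 or later). The two helper methods are hypothetical simplifications of a letter-based tokenizer and are not taken from Lucene. They only show why char-by-char code drops a supplementary letter while code-point-aware code keeps it.

```java
class SupplementaryDemo {
    // Hypothetical char-based filter: tests each UTF-16 code unit on its own,
    // the way char[]-oriented code written against Java 1.4 would.
    static String lettersByChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetter(c)) { // surrogates are not letters, so they are dropped
                out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical code-point-aware filter using the Java 1.5 API.
    static String lettersByCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);       // reads a full surrogate pair if present
            if (Character.isLetter(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);    // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1D50A (a mathematical Fraktur letter) lies above U+FFFF.
        String text = "a" + new String(Character.toChars(0x1D50A)) + "b";

        System.out.println(text.length());                         // 4 code units
        System.out.println(text.codePointCount(0, text.length())); // 3 code points
        System.out.println(lettersByChar(text));                   // ab
        System.out.println(lettersByCodePoint(text).equals(text)); // true
    }
}
```
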
>>>
>>> I'm sure there will be support for that in lucene 3.1.
>>>
>>> Simon
>>> On Fri, Jul 31, 2009 at 4:08 PM, Michael Thomsen<mikerthom...@gmail.com> 
>>> wrote:
>>>> Is Lucene capable of handling UCS4 data natively?
>>>>
>>>> Thanks,
>>>>
>>>> Mike
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>
>>>>
>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>
