@Michael: add yourself as a Watcher for the issue.
@Robert: I can start working on this within the next weeks - can you help too?

simon

On Fri, Jul 31, 2009 at 7:49 PM, Robert Muir<rcm...@gmail.com> wrote:
> Michael, makes sense. most of the issues probably have some
> workaround, so reply back if you need.
>
> Thanks for your feedback though, it is helpful to know that its important!
>
> On Fri, Jul 31, 2009 at 1:36 PM, Michael Thomsen<mikerthom...@gmail.com> 
> wrote:
>> Not really. At this point, I just needed to know where the UCS4
>> support stands. I'm reasonably familiar with the various analyzers and
>> what they can do. It's just the state of UCS4 support that might be an
>> issue for us.
>>
>> Thanks,
>>
>> Mike
>>
>> On Fri, Jul 31, 2009 at 12:25 PM, Robert Muir<rcm...@gmail.com> wrote:
>>> Michael just out of curiousity, did you have a particular Analyzer in
>>> mind you were planning on using, or rather certain features in Lucene
>>> you were concerned would work with these codepoints?
>>>
>>> On Fri, Jul 31, 2009 at 12:19 PM, Simon
>>> Willnauer<simon.willna...@googlemail.com> wrote:
>>>> Hey Robert, good to see that you found the link :)
>>>>
>>>> On Fri, Jul 31, 2009 at 6:06 PM, Robert Muir<rcm...@gmail.com> wrote:
>>>>> Michael, as Simon mentioned I created an issue describing where you
>>>>> might run into trouble, at least in lucene core.
>>>>>
>>>>> The low-level lucene stuff, it treats these just fine (as surrogate 
>>>>> pairs).
>>>>>
>>>>> But most analyzers run into some trouble. (things like
>>>>> WhitespaceAnalyzer are ok)
>>>>>
>>>>> Also wildcard queries and some things like that might not work as you
>>>>> expect, for example ? operator will not match a codepoint > FFFF, but
>>>>> of course you could use ?? as a workaround.
>>>>>
>>>>> On Fri, Jul 31, 2009 at 10:54 AM, Michael Thomsen<mikerthom...@gmail.com> 
>>>>> wrote:
>>>>>> Thanks for your quick response!
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> On Fri, Jul 31, 2009 at 10:25 AM, Simon
>>>>>> Willnauer<simon.willna...@googlemail.com> wrote:
>>>>>>> If I understand you correctly you are asking if lucene can deal with
>>>>>>> encodings that use more than 16 bit. Well yes and no but mainly no.
>>>>>>> The support for unicode 4.0 was introduced in Java 1.5 and lucene core
>>>>>>> has still back-compat requirements for java 1.4. Lucene's analyzers
>>>>>>> make use of char[] all over the place which is a sequence of UTF-16
>>>>>>> code unit not a code point. As I said the support for codepoints was
>>>>>>> introduced in 1.5 and I can remember that there is an issue which aims
>>>>>>> to implement support for upplementary characters (those above FFFF).
>>>>>>> Such a character is represented as 2 chars and the most of the
>>>>>>> analysis code will simply remove those characters.
>>>>>>> Have a look at this issue:
>>>>>>> https://issues.apache.org/jira/browse/LUCENE-1689 ( @ Robert are you
>>>>>>> working on this?)
>>>>>>>
>>>>>>> I'm sure there will be support for that in lucene 3.1.
>>>>>>>
>>>>>>> Simon
>>>>>>> On Fri, Jul 31, 2009 at 4:08 PM, Michael 
>>>>>>> Thomsen<mikerthom...@gmail.com> wrote:
>>>>>>>> Is Lucene capable of handling UCS4 data natively?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Mike
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Robert Muir
>>>>> rcm...@gmail.com
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>>
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Robert Muir
>>> rcm...@gmail.com
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to