Simon, no problem. I am looking at it now. I will just post my
approach and let people tear it apart / get things moving :)

On Fri, Jul 31, 2009 at 2:45 PM, Simon
Willnauer<simon.willna...@googlemail.com> wrote:
> @Michael: add yourself as a Watcher for the issue.
> @Robert: I can start working on this within the next few weeks - can you help too?
>
> simon
>
> On Fri, Jul 31, 2009 at 7:49 PM, Robert Muir<rcm...@gmail.com> wrote:
>> Michael, makes sense. Most of the issues probably have some
>> workaround, so reply back if you need one.
>>
>> Thanks for your feedback though, it is helpful to know that it's important!
>>
>> On Fri, Jul 31, 2009 at 1:36 PM, Michael Thomsen<mikerthom...@gmail.com> 
>> wrote:
>>> Not really. At this point, I just needed to know where the UCS4
>>> support stands. I'm reasonably familiar with the various analyzers and
>>> what they can do. It's just the state of UCS4 support that might be an
>>> issue for us.
>>>
>>> Thanks,
>>>
>>> Mike
>>>
>>> On Fri, Jul 31, 2009 at 12:25 PM, Robert Muir<rcm...@gmail.com> wrote:
>>>> Michael, just out of curiosity, did you have a particular Analyzer in
>>>> mind that you were planning on using, or were you rather concerned about
>>>> whether certain Lucene features would work with these code points?
>>>>
>>>> On Fri, Jul 31, 2009 at 12:19 PM, Simon
>>>> Willnauer<simon.willna...@googlemail.com> wrote:
>>>>> Hey Robert, good to see that you found the link :)
>>>>>
>>>>> On Fri, Jul 31, 2009 at 6:06 PM, Robert Muir<rcm...@gmail.com> wrote:
>>>>>> Michael, as Simon mentioned I created an issue describing where you
>>>>>> might run into trouble, at least in lucene core.
>>>>>>
>>>>>> The low-level Lucene code treats these just fine (as surrogate
>>>>>> pairs).
>>>>>>
>>>>>> But most analyzers run into some trouble (things like
>>>>>> WhitespaceAnalyzer are OK).
>>>>>>
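To illustrate the kind of trouble a char-based analysis step can run into, here is a minimal sketch of lowercasing done per UTF-16 code unit versus per code point. It is not Lucene code: the class and method names are hypothetical and only standard JDK calls are used. U+10400 (DESERET CAPITAL LETTER LONG I) stands in for any supplementary character that has a lowercase form.

```java
class LowerCaseSketch {
    // Hypothetical helper: lowercases one UTF-16 code unit at a time.
    // Surrogate halves have no case mapping, so a supplementary
    // character passes through unchanged.
    static String lowerByChar(String s) {
        char[] buf = s.toCharArray();
        for (int i = 0; i < buf.length; i++) {
            buf[i] = Character.toLowerCase(buf[i]);
        }
        return new String(buf);
    }

    // Hypothetical helper: lowercases one code point at a time,
    // using the code point API available since Java 1.5.
    static String lowerByCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "A" + new String(Character.toChars(0x10400));
        // Compare the last code point of each result.
        System.out.println(Integer.toHexString(
                lowerByChar(input).codePointAt(1)));      // 10400 (unchanged)
        System.out.println(Integer.toHexString(
                lowerByCodePoint(input).codePointAt(1))); // 10428 (lowercased)
    }
}
```

An analyzer that only splits on whitespace never inspects individual letters this way, which would be consistent with WhitespaceAnalyzer being unaffected.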
>>>>>> Also, wildcard queries and similar things might not work as you
>>>>>> expect. For example, the ? operator will not match a code point above
>>>>>> U+FFFF, but of course you could use ?? as a workaround.
>>>>>>
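The ?? workaround follows from a supplementary character occupying two chars. The sketch below is a deliberately simplified, hypothetical matcher (not Lucene's WildcardQuery, and without * support) in which ? consumes exactly one UTF-16 code unit. It assumes that the matching described above operates on chars.

```java
class WildcardSketch {
    // Hypothetical matcher: '?' matches exactly one UTF-16 code unit,
    // every other pattern char must match literally.
    static boolean matchesByChar(String pattern, String text) {
        if (pattern.length() != text.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '?' && p != text.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "a", then U+10400 (two chars in UTF-16), then "b"
        String term = "a" + new String(Character.toChars(0x10400)) + "b";
        System.out.println(matchesByChar("a?b", term));  // false: one '?' covers half the pair
        System.out.println(matchesByChar("a??b", term)); // true: '??' covers both surrogates
    }
}
```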
>>>>>> On Fri, Jul 31, 2009 at 10:54 AM, Michael 
>>>>>> Thomsen<mikerthom...@gmail.com> wrote:
>>>>>>> Thanks for your quick response!
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>> On Fri, Jul 31, 2009 at 10:25 AM, Simon
>>>>>>> Willnauer<simon.willna...@googlemail.com> wrote:
>>>>>>>> If I understand you correctly, you are asking whether Lucene can deal
>>>>>>>> with encodings that use more than 16 bits. Well, yes and no, but
>>>>>>>> mainly no. Support for Unicode 4.0 was introduced in Java 1.5, and
>>>>>>>> Lucene core still has back-compat requirements for Java 1.4. Lucene's
>>>>>>>> analyzers use char[] all over the place, which is a sequence of UTF-16
>>>>>>>> code units, not code points. As I said, support for code points was
>>>>>>>> introduced in 1.5, and I remember that there is an issue which aims
>>>>>>>> to implement support for supplementary characters (those above U+FFFF).
>>>>>>>> Such a character is represented as 2 chars, and most of the
>>>>>>>> analysis code will simply remove those characters.
>>>>>>>> Have a look at this issue:
>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-1689 (@Robert, are you
>>>>>>>> working on this?)
>>>>>>>>
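The difference between code units and code points can be seen with plain JDK calls. The sketch below is an illustration, not Lucene code, and its class and method names are hypothetical. It counts letters first char by char, the way char[]-based code would, and then by code point. A surrogate half is not a letter on its own, which is one way a letter-based tokenizer could end up dropping a supplementary character.

```java
class CodePointSketch {
    // Hypothetical helper: tests each UTF-16 code unit on its own.
    static int countLettersByChar(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    // Hypothetical helper: tests whole code points (Java 1.5+ API).
    static int countLettersByCodePoint(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                n++;
            }
            i += Character.charCount(cp);
        }
        return n;
    }

    public static void main(String[] args) {
        // "a" followed by U+10400, a supplementary letter
        String s = "a" + new String(Character.toChars(0x10400));
        System.out.println(s.length());                      // 3 code units
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
        System.out.println(countLettersByChar(s));           // 1
        System.out.println(countLettersByCodePoint(s));      // 2
    }
}
```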
>>>>>>>> I'm sure there will be support for that in lucene 3.1.
>>>>>>>>
>>>>>>>> Simon
>>>>>>>> On Fri, Jul 31, 2009 at 4:08 PM, Michael 
>>>>>>>> Thomsen<mikerthom...@gmail.com> wrote:
>>>>>>>>> Is Lucene capable of handling UCS4 data natively?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Mike
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org



-- 
Robert Muir
rcm...@gmail.com