Re: Chinese sorting

2014-12-19 Thread Nils Knappmeier
Hi Tomoko, thank you for the detailed explanation and many thanks for trying out the analyzer for me. I think "Very good compared to Unicode codepoint based sorting" is good enough for me. I will just try and use that Analyzer and see how it satisfies our customer. Regards, Nils On 18.

Re: Chinese sorting

2014-12-18 Thread Tomoko Uchida
Yes, sorting Kanji is not as easy as sorting Hiragana/Katakana. We simply expect collators to sort strings phonetically, regardless of the script they are written in (Hiragana, Katakana, or Kanji). However, a Kanji character has multiple (usually 2 or 3) readings. We humans naturally judge which reading is suitable de

Re: Chinese sorting

2014-12-18 Thread Nils Knappmeier
Hi Tomoko, does sorting with Locale.JAPANESE also work for Kanji? Since Hiragana and Katakana are based on phonetics, I guess it is easier to define a sorting order for them. But Kanji is more similar to Chinese. Thanks, Nils On 17.12.2014 17:01, Tomoko Uchida wrote: Hi, Nils, I don't kno

Re: Chinese sorting

2014-12-17 Thread Tomoko Uchida
Hi, Nils, I don't know Chinese at all... but collation is very important in Japanese too. Lucene has the org.apache.lucene.collation package, which uses ICU4J's collators (you can find "lucene-analyzers-icu-4.10.2.jar" in the analysis/icu directory). http://lucene.apache.org/core/4_10_2/analyzers-icu/index.h
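To illustrate what locale-aware collation changes compared with plain string ordering, here is a minimal sketch. It deliberately uses the JDK's own java.text.Collator rather than the ICU4J collators mentioned in the message, so it runs without extra jars. The class and method names (CollatorSortSketch, sortByCodeUnit, sortWithCollator) are made up for this example, and the ordering the JDK produces for Japanese or Chinese text may differ from what ICU4J would give.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for illustration only; not part of Lucene or ICU4J.
class CollatorSortSketch {

    // Sorts by String's natural order, i.e. UTF-16 code unit order
    // (identical to code point order for Basic Multilingual Plane characters).
    static List<String> sortByCodeUnit(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Sorts with the JDK collator for the given locale.
    // Assumption: the JDK collator stands in for an ICU4J collator here.
    static List<String> sortWithCollator(List<String> input, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> copy = new ArrayList<>(input);
        copy.sort(collator); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        // Sample words written with Unicode escapes to avoid source-encoding issues:
        // "inu" (Hiragana), "ame" (Katakana), "aki" (Hiragana).
        List<String> words = Arrays.asList("\u3044\u306C", "\u30A2\u30E1", "\u3042\u304D");

        System.out.println("Code unit order:   " + sortByCodeUnit(words));
        System.out.println("Japanese collator: " + sortWithCollator(words, Locale.JAPANESE));
    }
}
```

Passing Locale.CHINESE instead of Locale.JAPANESE selects the JDK's Chinese rules. Whether the resulting order is acceptable to native readers is exactly the question this thread raises, so the output should be checked against real data.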

Chinese sorting

2014-12-17 Thread Nils Knappmeier
Hi, is there any implementation of a Chinese collator in Lucene? I've seen that there is a Chinese analyzer that uses Hidden Markov Models. But sorting seems to be an issue of its own, and all my googling hasn't led to any results yet. I understand that this is not a trivial issue and I've