Hi Tomoko,
thank you for the detailed explanation and many thanks for trying out
the analyzer for me.
I think "Very good compared to Unicode codepoint based sorting" is good
enough for me.
I will just try and use that Analyzer and see how it satisfies our customer.
Regards,
Nils
On 18.
Yes, sorting Kanji is not as easy as sorting Hiragana/Katakana.
We simply expect collators to sort strings by phonetics, regardless
of which script they are written in (Hiragana, Katakana, Kanji).
However, a Kanji has multiple (usually 2 or 3) readings. We humans naturally
judge which reading is suitable de
Hi Tomoko,
does sorting with Locale.JAPANESE also work for Kanji? Since Hiragana
and Katakana are based on phonetics, I guess it is easier to define
a sorting order for them. But Kanji is more similar to Chinese.
Thanks,
Nils
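
One quick way to see how Locale.JAPANESE behaves on mixed Hiragana, Katakana and Kanji input is to sort a few strings with the JDK's own java.text.Collator and compare the result with plain String order. The following is only a minimal sketch: the class name, method names and sample words are made up for illustration, it assumes the JDK's Japanese collation data is installed, and it is not the Lucene/ICU4J collation discussed in this thread. A collator has no dictionary, so it should not be expected to pick a context-dependent reading for Kanji.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
final class JapaneseCollationSketch {

    // Returns a new list sorted with the JDK collator for Locale.JAPANESE.
    static List<String> sortJapanese(List<String> input) {
        Collator collator = Collator.getInstance(Locale.JAPANESE);
        List<String> sorted = new ArrayList<>(input);
        sorted.sort(collator); // Collator is a Comparator<Object>
        return sorted;
    }

    // Returns a new list sorted by natural String order (UTF-16 code units).
    static List<String> sortByCodeUnit(List<String> input) {
        List<String> sorted = new ArrayList<>(input);
        sorted.sort(null); // null comparator means natural ordering
        return sorted;
    }

    public static void main(String[] args) {
        // Sample words in Kanji, Hiragana and Katakana (arbitrary choices).
        List<String> words = Arrays.asList("東京", "さくら", "カメラ", "あさ");
        System.out.println("Collator (ja): " + sortJapanese(words));
        System.out.println("Code units:    " + sortByCodeUnit(words));
    }
}
```

Running main prints both orderings side by side, which makes it easy to judge whether the collator's result is acceptable for a given data set.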
On 17.12.2014 17:01, Tomoko Uchida wrote:
Hi, Nils,
I don't kno
Hi, Nils,
I don't know Chinese at all... but collation is very important in Japanese
too.
Lucene has the org.apache.lucene.collation package, which uses ICU4J's collators
(you can find "lucene-analyzers-icu-4.10.2.jar" in analysis/icu directory).
http://lucene.apache.org/core/4_10_2/analyzers-icu/index.h
Hi,
is there any implementation of a Chinese collator in Lucene? I've seen
that there is a Chinese analyzer which uses Hidden Markov Models. But
sorting seems to be an issue on its own and all my googling hasn't led
to any results yet.
I understand that this is not a trivial issue and I've