Hi All,
I'm using the default setup of Lucene (no custom analyzers configured) and came across the following issue: in Hindi, if a phrase contains a letter with a diacritic, Lucene will find that phrase even when the search string contains the letter without the diacritic. Is this expected behavior? Is it perhaps standard for all languages whose letters can carry diacritics?

From a pure byte standpoint I can see the logic: the letter with the diacritic takes 6 bytes in UTF-8 (E0 A4 95 E0 A5 87) and the letter alone takes 3 (E0 A4 95). So if I search for *some_letter*, where some_letter is encoded as (E0 A4 95), Lucene finds the "phrase" (E0 A4 95 E0 A5 87) that includes that letter.

Any comments much appreciated. Thanks.
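
P.S. In case it helps anyone reproduce the byte counts, here is a minimal plain-Java sketch (no Lucene involved; the class name DiacriticBytes and the helper utf8Hex are just names I made up for illustration). It only shows that the letter with the diacritic is two code points, U+0915 followed by U+0947, and that the bare letter is a prefix of it at the string level. It says nothing about how Lucene actually tokenizes or matches the text.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class DiacriticBytes {
    // Format the UTF-8 encoding of a string as space-separated hex bytes.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ka = "\u0915";        // DEVANAGARI LETTER KA
        String ke = "\u0915\u0947";  // KA followed by DEVANAGARI VOWEL SIGN E

        System.out.println(utf8Hex(ka)); // E0 A4 95
        System.out.println(utf8Hex(ke)); // E0 A4 95 E0 A5 87

        // The diacritic is a separate code point, not part of a single precomposed letter.
        System.out.println(ke.codePointCount(0, ke.length())); // 2

        // A plain substring check for the bare letter succeeds on the combined form.
        System.out.println(ke.contains(ka)); // true
    }
}
```

So at the plain string level the bare letter really is contained in the longer sequence, which matches what I see with the wildcard search.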