Thank you for all this information, Uwe and Walter!
Let me digest this information, educate myself on these matters, and figure
out a way forward.
Have a great one,
Mete
> On May 20, 2021, at 6:43 PM, Walter Underwood wrote:
>
> I recommend normalizing all characters with a compatibility
I recommend normalizing all characters with a compatibility transformation,
whether they are Arabic or not.
We use this charFilter as the first step in every query and indexing analysis
chain.
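
To illustrate what a compatibility normalization does to a string, here is a
minimal sketch. It uses Python's standard unicodedata module with the NFKC
form, not the Lucene/Solr charFilter mentioned above, and the helper name
normalize_for_search is only an illustration. The idea is that the same
normalization is applied to both the indexed text and the query, so that
variant codings of the same word compare equal.

```python
import unicodedata


def normalize_for_search(text: str) -> str:
    """Apply NFKC (compatibility composition) normalization to text."""
    return unicodedata.normalize("NFKC", text)


# U+FEFB is ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM, a presentation form.
# NFKC maps it to the two ordinary letters LAM (U+0644) and ALEF (U+0627).
indexed_term = normalize_for_search("\uFEFB")
query_term = normalize_for_search("\u0644\u0627")

# Both sides were normalized the same way, so they now compare equal.
print(indexed_term == query_term)
```

The same call also folds non-Arabic compatibility characters, such as the
Latin "fi" ligature (U+FB01), which is why it can be applied to all text
regardless of language.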
You’ll also need to include the ICU library, which should be included by
default. Actually
This is only for the Arabic language.
If you don't know the language and just want to assist people searching with
different scripts (searching with Latin letters for Arabic text), see my other
answer.
Uwe
On May 20, 2021 2:38:26 PM UTC, Mete Kural wrote:
>Hello Michael,
>
>Thank you very much fo
Hi,
In answer to your question about character substitutions: the ICU library
does this with ICU transforms. It can also convert all Cyrillic text to
Latin during indexing and search. This greatly helps people find what they
are looking for.
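
The principle can be sketched without ICU. The example below is only a toy:
it uses a small hand-written Cyrillic-to-Latin table (an assumption made for
illustration, covering just a few letters) in place of a real ICU transform,
and the name to_latin is hypothetical. What it shows is the same function
being applied at index time and at query time, so that a query typed in
Latin letters matches text stored in Cyrillic.

```python
# Toy transliteration table; a real ICU transform covers far more than this.
CYRILLIC_TO_LATIN = str.maketrans({
    "а": "a", "в": "v", "к": "k", "м": "m", "о": "o", "с": "s",
})


def to_latin(text: str) -> str:
    """Lowercase the text, then map the listed Cyrillic letters to Latin."""
    return text.lower().translate(CYRILLIC_TO_LATIN)


# Index side: the stored term is transliterated once.
indexed_term = to_latin("Москва")

# Query side: the user's input goes through the same function.
query_term = to_latin("Moskva")

print(indexed_term == query_term)
```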
A great example of a transformer is here as part of
Hello Michael,
Thank you very much for this information.
I will try at java-u...@lucene.apache.org also.
By the way, is the Arabic analyzer referenced here
(https://github.com/apache/lucene/tree/main/lucene/analysis/common/src/java/org/apache/lucene/analysis/ar)
just for the Arabic language
Hi Mete
You might also want to try the java-u...@lucene.apache.org mailing list
https://lucene.apache.org/core/discussion.html#java-user-list-java-userluceneapacheorg
Re languages other than English, you might find more information at
https://cwiki.apache.org/confluence/display/lucene/LuceneFAQ
Hello Lucene Community,
I hope this finds you all well. I would like to ask whether this is the right
medium to discuss some matters surrounding text search in relation to variant
Unicode encodings of words in Arabic and Arabic-script languages. This is not a
great example but the said matters a