benwtrent commented on issue #14429:
URL: https://github.com/apache/lucene/issues/14429#issuecomment-2786821619
@mikemccand OK, I gathered more info:
- Modern OpenJDK (22.0.1)
- Modern Linux
So other system stuff doesn't seem very exotic.
However, the data being ingested might contain various pieces of Turkish
Unicode. Digging around the analyzers, I didn't find any special handling, so
it's all using the StandardAnalyzer with no additional normalization.
I wonder if we are just hitting the dreaded Turkish "i" Unicode issue.
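
For anyone unfamiliar with the Turkish "i" problem, here is a minimal sketch of the underlying JDK behavior. It is not a reproduction of this issue, and whether it is the actual cause here is still unconfirmed. The class name `TurkishIDemo` and its helper methods are hypothetical, made up for illustration; only `String.toLowerCase(Locale)`, `Character.toLowerCase(int)`, and `Locale.forLanguageTag` are real JDK calls.

```java
import java.util.Locale;

public class TurkishIDemo {
    // Lowercase with the locale-independent root rules.
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Lowercase with Turkish rules (dotted and dotless i are distinct letters).
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    // Render each UTF-16 code unit as U+XXXX so invisible differences show up.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format(Locale.ROOT, "U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String capitalDottedI = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE

        // Root rules: one char becomes two (i + COMBINING DOT ABOVE).
        System.out.println(codeUnits(lowerRoot(capitalDottedI)));    // U+0069 U+0307
        // Turkish rules: one char stays one char.
        System.out.println(codeUnits(lowerTurkish(capitalDottedI))); // U+0069
        // Plain ASCII "I" lowercases to dotless i under Turkish rules.
        System.out.println(codeUnits(lowerTurkish("I")));            // U+0131
        System.out.println(codeUnits(lowerRoot("I")));               // U+0069
        // The per-code-point mapping returns a plain 'i' instead.
        System.out.println(codeUnits(
            new String(Character.toChars(Character.toLowerCase(0x0130))))); // U+0069
    }
}
```

The point is that the same input can lowercase to strings of different lengths and different code points depending on which API and locale are used, so terms with these characters may not round-trip the way ASCII text does.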