On 2020-06-16 16:37, Tom Lane wrote:
> After further reflection, I think these are indeed mistakes and we should
> change them all.  The argument for the Russian/English case, AIUI, is
> "if we come across an all-ASCII word, it is most certainly not Russian,
> and the most likely Latin-based language is English".  Given the world
> as it is, I think the same argument works for all non-Latin-alphabet
> languages.  Obviously specific applications might have a different idea
> of the best fallback language, but that's why we let users make their
> own text search configurations.  For general-purpose use, falling back
> to English seems reasonable.  And we can be dead certain that applying
> a Greek stemmer to an ASCII word will do nothing useful, so the
> configuration choice shown above is unhelpful.

Do we *have* to have an ASCII stemmer that corresponds to an actual language? Couldn't we use the simple dictionary, or no stemmer at all?

In my experience, ASCII text in, say, Russian or Greek will typically be acronyms or brand names or the like, and there doesn't seem to be a great need to stem that kind of thing. Just doing nothing seems at least as good.
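
For illustration, here is a minimal sketch of what "doing nothing" to ASCII words could look like in a user-defined configuration. The name russian_simple_ascii is hypothetical, and the choice of the built-in simple dictionary (which lowercases but does not stem) is an assumption about what "no stemmer" would mean in practice, not a proposal for the shipped defaults:

```sql
-- Hypothetical configuration name; start from a copy of the built-in
-- Russian configuration so non-ASCII words keep the Russian stemmer.
CREATE TEXT SEARCH CONFIGURATION public.russian_simple_ascii (
    COPY = pg_catalog.russian
);

-- Route the all-ASCII token types to the "simple" dictionary instead of
-- a language-specific stemmer. "simple" only case-folds the token.
ALTER TEXT SEARCH CONFIGURATION public.russian_simple_ascii
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH simple;

-- Inspect how an ASCII token is handled: the "dictionary" column should
-- show simple rather than a stemmer for the asciiword token.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('public.russian_simple_ascii', 'Servers');
```

With a mapping like this, acronyms and brand names would be indexed in lowercased form only, while words in the native alphabet would still go through the language's own stemmer.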

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

