Re: snowball ASCII stemmer configuration

Peter Eisentraut Fri, 19 Jun 2020 02:47:18 -0700

On 2020-06-16 16:37, Tom Lane wrote:

After further reflection, I think these are indeed mistakes and we should
change them all.  The argument for the Russian/English case, AIUI, is
"if we come across an all-ASCII word, it is most certainly not Russian,
and the most likely Latin-based language is English".  Given the world
as it is, I think the same argument works for all non-Latin-alphabet
languages.  Obviously specific applications might have a different idea
of the best fallback language, but that's why we let users make their
own text search configurations.  For general-purpose use, falling back
to English seems reasonable.  And we can be dead certain that applying
a Greek stemmer to an ASCII word will do nothing useful, so the
configuration choice shown above is unhelpful.

Do we *have* to have an ASCII stemmer that corresponds to an actual
language?  Couldn't we use the simple stemmer or no stemmer at all?

In my experience, ASCII text in, say, Russian or Greek will typically be
acronyms or brand names or the like, and there doesn't seem to be a
great need to stem that kind of thing.  Just doing nothing seems at
least as good.
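
As a rough sketch of what that could look like in a user-defined
configuration: copy the built-in greek configuration and map the ASCII
token types to the simple dictionary instead of a Snowball stemmer.  The
name greek_simple_ascii is made up for illustration, and this assumes a
server version that ships pg_catalog.greek and uses the default parser's
token types.

```sql
-- Hypothetical configuration name; starts as a copy of the built-in one.
CREATE TEXT SEARCH CONFIGURATION greek_simple_ascii (
    COPY = pg_catalog.greek
);

-- Route all-ASCII tokens to the "simple" dictionary, which only
-- lowercases them (and checks its stop word list, if one is set)
-- rather than applying any language-specific stemming.
ALTER TEXT SEARCH CONFIGURATION greek_simple_ascii
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH simple;

-- Try it on an ASCII brand name.
SELECT to_tsvector('greek_simple_ascii', 'PostgreSQL');
```

Note that dropping the mapping for those token types altogether would
not amount to "no stemmer": tokens with no mapped dictionary are not
indexed at all, so the simple dictionary is probably the closest thing
to doing nothing while still keeping the words searchable.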


--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
