Peter Eisentraut writes:
> Do we *have* to have an ASCII stemmer that corresponds to an actual
> language? Couldn't we use the simple stemmer or no stemmer at all?
> In my experience, ASCII text in, say, Russian or Greek will typically be
> acronyms or brand names or the like, and there doesn't
On 2020-06-16 16:37, Tom Lane wrote:
After further reflection, I think these are indeed mistakes and we should
change them all. The argument for the Russian/English case, AIUI, is
"if we come across an all-ASCII word, it is most certainly not Russian,
and the most likely Latin-based language is
Mark Dilger writes:
> I am a bit surprised to see that you are right about this, because non-Latin
> languages often have transliteration/romanization schemes for writing the
> language in the Latin alphabet, developed before computers had widespread
> adoption of non-ASCII character sets, and
> On Jun 16, 2020, at 7:37 AM, Tom Lane wrote:
>
> I wrote:
>> Peter Eisentraut writes:
>>> Moreover, AFAIK, the following other languages do not use Latin-based
>>> alphabets:
>
>>> arabic arabic \
>>> greek greek \
>>> nepali nepali \
>>> tamil tamil
I wrote:
> Peter Eisentraut writes:
>> Moreover, AFAIK, the following other languages do not use Latin-based
>> alphabets:
>> arabic arabic \
>> greek greek \
>> nepali nepali \
>> tamil tamil \
> Hmm. I think all of those entries are ones that got a
On Tue, Jun 16, 2020 at 4:53 PM Tom Lane wrote:
> Peter Eisentraut writes:
> > There are two cases where these two columns are not the same:
>
> > hindi english \
> > russian english \
>
> > The second one is old; the first one I added using the second one as
> > exam
Peter Eisentraut writes:
> There are two cases where these two columns are not the same:
> hindi english \
> russian english \
> The second one is old; the first one I added using the second one as
> example. But I wonder what the rationale for this is. Maybe for h
While I was updating the snowball code, I noticed something strange. In
src/backend/snowball/Makefile:
# first column is language name and also name of dictionary for not-all-ASCII
# words, second is name of dictionary for all-ASCII words
# Note order dependency: use of some other language as
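For readers unfamiliar with the two-column convention in that Makefile comment, here is a minimal sketch of the selection rule it describes: all-ASCII words go to the dictionary named in the second column, everything else goes to the language's own dictionary. This is an illustration in Python, not the PostgreSQL code; the names `ASCII_DICT` and `pick_stemmer` are hypothetical, and the table only contains pairs quoted in this thread.

```python
# Hypothetical sketch of the two-column rule; not PostgreSQL's implementation.
# Key: language name (also the dictionary used for not-all-ASCII words).
# Value: dictionary used for all-ASCII words, as quoted in the thread.
ASCII_DICT = {
    "hindi": "english",
    "russian": "english",
    "greek": "greek",
}


def pick_stemmer(language: str, word: str) -> str:
    """Return the name of the stemmer dictionary to apply to `word`."""
    if word.isascii():
        # All-ASCII word: use the second column, falling back to the
        # language's own dictionary if no entry is listed.
        return ASCII_DICT.get(language, language)
    # Word contains at least one non-ASCII character: use the first column.
    return language
```

Under this sketch, the proposal discussed in the thread would amount to changing the `hindi` and `russian` values so that both columns name the same dictionary.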