Hunspell stemmer generates multiple tokens

Luca Cavanna Fri, 07 Jun 2013 06:17:05 -0700

Hi,
I just noticed that the HunspellStemmer outputs more than one token: the
original word plus the stems, as far as I understand.


This is not quite what I would expect, and it becomes tricky especially at
query time. For instance, when elasticsearch is used to query a stemmed
field, a boolean query is generated containing multiple clauses (one for
each token generated by the stemmer) instead of a single clause with the
stem that we expect to find in the index (assuming we indexed with stemming,
of course).
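
To illustrate the query-time effect, here is a minimal sketch in plain Java.
It does not use the Lucene or elasticsearch APIs: the class, the method
names, the "body" field and the tiny stem table are all hypothetical, and the
clauses are joined with OR only for display (the real operator depends on the
query parser). It simply contrasts one clause per emitted token with a single
clause for the stem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MultiTokenQuerySketch {

    // Hypothetical stand-in for a stemmer that emits the original word
    // followed by each of its stems (the behaviour described above).
    static List<String> multiTokenStem(String word, Map<String, List<String>> stems) {
        List<String> tokens = new ArrayList<>();
        tokens.add(word);
        for (String stem : stems.getOrDefault(word, List.of())) {
            if (!tokens.contains(stem)) {
                tokens.add(stem);
            }
        }
        return tokens;
    }

    // Hypothetical stand-in for a stemmer that emits exactly one token:
    // the first known stem, or the word itself when no stem is known.
    static List<String> singleTokenStem(String word, Map<String, List<String>> stems) {
        List<String> found = stems.getOrDefault(word, List.of());
        return List.of(found.isEmpty() ? word : found.get(0));
    }

    // Builds a display string with one clause per token.
    // OR is an assumption made for readability only.
    static String toQuery(String field, List<String> tokens) {
        List<String> clauses = new ArrayList<>();
        for (String token : tokens) {
            clauses.add(field + ":" + token);
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        // Made-up stem table, for illustration only.
        Map<String, List<String>> stems = Map.of("houses", List.of("house"));

        // Multi-token output: one clause per token.
        System.out.println(toQuery("body", multiTokenStem("houses", stems)));
        // Single-token output: one clause with the stem.
        System.out.println(toQuery("body", singleTokenStem("houses", stems)));
    }
}
```

With the made-up stem table, the first line printed is
"body:houses OR body:house" and the second is "body:house", which is the
single clause I would have expected.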

I would like to know if you think this is the correct behaviour and if this
is something you are aware of. If I look at snowball for example, I see
that only one token is generated.


Thanks,
Luca
