Limitations of StempelStemmer

2019-09-10 Thread Maciej Gawinecki
it returns "ać". I would expect that for words it has not been trained on, it would return their original forms, as happens, for instance, when stemming words like "xyz". With kind regards, Maciej Gawinecki Here's a minimal example to reproduce the issue: package org.apache.l

Re: Limitations of StempelStemmer

2019-09-25 Thread Maciej Gawinecki
> > > You always pass "piwko" for stemming. > > I'm afraid that's not correct? You should *never* pass on piwko when stemming. :) Haha, right, one should not mix both. Anyway, thank you for your original suggestions. Training it with a bigger corpus of inflection forms seems like a great idea.

Re: Limitations of StempelStemmer

2019-09-25 Thread Maciej Gawinecki
> You always pass "piwko" for stemming. Right, I spotted my mistake as soon as I had posted my question, but I didn't want to spam the list with too many posts (there's no way to edit an already posted question on a mailing list :-)). Anyway, the issue still persists. Here's the corrected version to reproduce it: imp