On Wed, Aug 31, 2022 at 6:57 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> I wrote:
> > The upstream recommendation, which seems pretty sane to me, is to
> > simply reject any string exceeding some threshold length as not
> > possibly being a word. Apparently it's common to use thresholds
> > as small as 64 bytes, but in the attached I used 1000 bytes.
>
> On further thought: that coding treats anything longer than 1000
> bytes as a stopword, but maybe we should just accept it unmodified.
> The manual says "A Snowball dictionary recognizes everything, whether
> or not it is able to simplify the word". While "recognizes" formally
> includes the case of "recognizes as a stopword", people might find
> this behavior surprising. We could alternatively do it as attached,
> which accepts overlength words but does nothing to them except
> case-fold. This is closer to the pre-patch behavior, but gives up
> the opportunity to avoid useless downstream processing of long words.

This patch looks good to me. It keeps overly long words (> 1000 bytes)
from going through the stemmer, so the stack overflow issue in the
Turkish stemmer should no longer occur.

Thanks
Richard
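
For illustration, the behavior discussed in this thread (case-fold every word, but skip the stemmer for anything over 1000 bytes) could be sketched roughly as below. This is a standalone sketch and not the patch itself: normalize_word, toy_stem, and MAX_STEM_LEN are hypothetical names, toy_stem is only a stand-in for the real Snowball stemmer, and the case-folding here is ASCII-only.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Threshold mentioned in the thread; the name is hypothetical. */
#define MAX_STEM_LEN 1000

/*
 * Hypothetical stand-in for the real stemmer: strips one trailing 's'.
 * It exists only so the sketch has something to skip.
 */
static void
toy_stem(char *word)
{
    size_t len = strlen(word);

    if (len > 1 && word[len - 1] == 's')
        word[len - 1] = '\0';
}

/*
 * Return a newly allocated, case-folded copy of in[0..len-1].
 * Words of at most MAX_STEM_LEN bytes are also stemmed; longer ones
 * are accepted unmodified apart from the case-fold.
 * The caller must free() the result. Returns NULL on allocation failure.
 */
char *
normalize_word(const char *in, size_t len)
{
    char   *out = malloc(len + 1);
    size_t  i;

    if (out == NULL)
        return NULL;

    /* ASCII-only case-fold, an assumption made to keep the sketch short */
    for (i = 0; i < len; i++)
        out[i] = (char) tolower((unsigned char) in[i]);
    out[len] = '\0';

    /* Overlength input never reaches the stemmer */
    if (len <= MAX_STEM_LEN)
        toy_stem(out);

    return out;
}
```

With this shape, a short word such as "Cats" is lowercased and stemmed, while a 1501-byte string comes back lowercased at its full length, which matches the "accept but do nothing except case-fold" alternative described above.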