> Should we check for stop words before stemming or after ?

I think you should check before stemming.

> In the first case we have to collect all forms of stop-words
> which is doable
> but difficult to maintain, in latter - we'll have current problem.

Looking at the list of stopwords you sent me, Oleg, only about 1 of the
120 stopwords in it needs to have all word forms added.

I also don't think it'll be a maintenance problem. The reason I think this
is that stopwords in general don't have different word forms, e.g. her,
his, i, and, etc. In fact, the _only_ entry in the stopword list that needs
more than one form is 'yourself' / 'yourselves'. Actually, according to
dictionary.com, 'ourself' is also a word. 'themself' isn't, though.

Some others I don't know about are:

'veri' - I assume this is stemmed 'very', so why not just use 'very'?

So, why don't you change tsearch to check for stop words _before_ stemming?
I can give you a revised list of stopwords that haven't been stemmed, with
all forms of the words.

> It's time for beta1 and I'm not sure if we could work on this issue
> right now, but I feel a big pressure from tsearch users :-)
> If people want to help us why not to work on stop words list including
> all forms ? In any case, we are not native english, so don't expect we'll
> create more or less decent list. Programming changes are trivial, probably
> we'll end for the moment just using compile time option.
> As always, your patches are welcome !

I'm happy to work on the list of stopwords for you, Oleg. I agree this
might be a 7.4 thing though...

Chris
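
P.S. To make the two orderings concrete, here is a rough sketch in Python.
It is not tsearch code: `toy_stem` is a crude stand-in for a real stemmer,
and the function names and the short stop list are made up for
illustration. It only shows why an unstemmed stop list works when the check
happens before stemming, and why a word like 'very' can slip through when
the same list is checked after stemming.

```python
# Hypothetical, unstemmed stop list (a tiny subset, for illustration only).
STOPWORDS = {"very", "yourself", "yourselves", "his", "her", "i", "and"}


def toy_stem(word):
    """Crude stand-in for a real stemmer (assumption: NOT the tsearch one)."""
    if word.endswith("selves"):
        return word[:-6] + "self"      # yourselves -> yourself
    if word.endswith("y") and len(word) > 3:
        return word[:-1] + "i"         # very -> veri
    if word.endswith("s") and len(word) > 3:
        return word[:-1]               # cats -> cat
    return word


def filter_before_stemming(tokens, stopwords):
    """Drop stop words first, then stem whatever is left."""
    return [toy_stem(t) for t in tokens if t not in stopwords]


def filter_after_stemming(tokens, stopwords):
    """Stem first, then drop stems that appear in the stop list."""
    return [s for s in (toy_stem(t) for t in tokens) if s not in stopwords]


tokens = ["very", "happy", "yourselves", "and", "cats"]

# Unstemmed list, checked before stemming: all stop words are caught.
before = filter_before_stemming(tokens, STOPWORDS)

# Same unstemmed list, checked after stemming: 'very' became 'veri'
# and no longer matches, so it leaks into the output.
after_unstemmed = filter_after_stemming(tokens, STOPWORDS)

# Checking after stemming only works if the list itself is stemmed,
# which is how an entry like 'veri' ends up in a stop list.
after_stemmed = filter_after_stemming(tokens, {toy_stem(w) for w in STOPWORDS})
```

With the check done before stemming, the list can stay in plain English,
at the cost of listing every form of the few stop words that have more
than one.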