Re: [HACKERS] contrib/tsearch

2002-09-09 Thread Teodor Sigaev
> Should we check for stop words before stemming or after? Current implementation supports both variants. Look at the dictionary interface definition in morph.c: typedef struct { char localename[NAMEDATALEN]; /* init dictionary */ void *(*init) (void);

Re: [HACKERS] contrib/tsearch

2002-09-06 Thread Oleg Bartunov
On Fri, 6 Sep 2002, Christopher Kings-Lynne wrote: > > Should we check for stop words before stemming or after ? > > I think you should. > > > In the first case we have to collect all forms of stop-words > > which is doable > > but difficult to maintain, in latter - we'll have current problem. >

Re: [HACKERS] contrib/tsearch

2002-09-06 Thread Oleg Bartunov
On Fri, 6 Sep 2002, Christopher Kings-Lynne wrote: > > Looking at the list of stopwords you sent me, Oleg, there are only about 1 > > out of the list of 120 stopwords that need to have all word forms > > added. I > > also don't think it'll be a maintenance problem. The reason I > > think this i

Re: [HACKERS] contrib/tsearch

2002-09-06 Thread Oleg Bartunov
probably we could enhance our parser to handle such words too. Anyway, most problems are just a question of time we don't have :-( > > Chris > > > -----Original Message----- > > From: [EMAIL PROTECTED] > > [mailto:[EMAIL PROTECTED]]On Behalf Of Christopher > > Ki

Re: [HACKERS] contrib/tsearch

2002-09-05 Thread Christopher Kings-Lynne
. wasn't, isn't, it's, etc.? Chris > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED]]On Behalf Of Christopher > Kings-Lynne > Sent: Friday, 6 September 2002 12:20 PM > To: Christopher Kings-Lynne; Oleg Bartunov > Cc: Hackers; [EMAI

Re: [HACKERS] contrib/tsearch

2002-09-05 Thread Christopher Kings-Lynne
> Looking at the list of stopwords you sent me, Oleg, there are only about 1 > out of the list of 120 stopwords that need to have all word forms > added. I > also don't think it'll be a maintenance problem. The reason I > think this is > because stopwords in general don't have different word f

Re: [HACKERS] contrib/tsearch

2002-09-05 Thread Christopher Kings-Lynne
> Should we check for stop words before stemming or after ? I think you should. > In the first case we have to collect all forms of stop-words > which is doable > but difficult to maintain, in latter - we'll have current problem. Looking at the list of stopwords you sent me, Oleg, there are onl

Re: [HACKERS] contrib/tsearch

2002-09-05 Thread Oleg Bartunov
On Thu, 5 Sep 2002, Martin Porter wrote: > > Oleg, > > The Porter stemming stems herring and herrings to her, which is a bit > unfortunate. A quick fix is to put 'herring/herrings' in the exception list > in the english (porter2) stemmer, but I'll look at this case over the next > few days and se

Re: [HACKERS] contrib/tsearch

2002-09-05 Thread Martin Porter
Oleg, The Porter stemming stems herring and herrings to her, which is a bit unfortunate. A quick fix is to put 'herring/herrings' in the exception list in the english (porter2) stemmer, but I'll look at this case over the next few days and see if I can come up with something a bit better. Inter
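The quick fix Martin describes, an exception list consulted before the stemmer runs, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `stem_exception` table, the choice of "herring" as the stem for both forms, and `crude_stem`, which is only a stand-in that truncates to three letters so that "herring" comes out as "her". It is not the Porter algorithm and not the actual porter2 exception mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical exception table: word -> stem to use instead of the
 * stemmer's own output. */
struct stem_exception
{
    const char *word;
    const char *stem;
};

static const struct stem_exception exceptions[] = {
    {"herring", "herring"},
    {"herrings", "herring"},
};

/* Stand-in for a real stemmer (NOT Porter): keeps at most the first
 * three letters, which is enough to reproduce "herring" -> "her". */
static void crude_stem(const char *word, char *out, size_t outsz)
{
    size_t n = strlen(word);

    assert(out != NULL && outsz > 0);
    if (n > 3)
        n = 3;
    if (n >= outsz)
        n = outsz - 1;
    memcpy(out, word, n);
    out[n] = '\0';
}

/* Look the word up in the exception table first; fall back to the
 * stemmer only when no exception matches. */
static void stem_with_exceptions(const char *word, char *out, size_t outsz)
{
    size_t i;

    assert(out != NULL && outsz > 0);
    for (i = 0; i < sizeof(exceptions) / sizeof(exceptions[0]); i++)
    {
        if (strcmp(word, exceptions[i].word) == 0)
        {
            strncpy(out, exceptions[i].stem, outsz - 1);
            out[outsz - 1] = '\0';
            return;
        }
    }
    crude_stem(word, out, outsz);
}
```

With this in place, `stem_with_exceptions("herrings", buf, sizeof(buf))` leaves "herring" in `buf`, while words that are not in the table still go through the stemmer unchanged.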

Re: [HACKERS] contrib/tsearch

2002-09-05 Thread Oleg Bartunov
On Thu, 5 Sep 2002, Christopher Kings-Lynne wrote: > Hmmm...thinking about it, maybe 'herring' is being reduced to 'her' after > the stemming process and hence is thought to be a stopword? This is a bug, > but how should it be fixed? > It's a difficult question how to use stop words. We'll see wh

Re: [HACKERS] contrib/tsearch

2002-09-04 Thread Christopher Kings-Lynne
Hmmm...thinking about it, maybe 'herring' is being reduced to 'her' after the stemming process and hence is thought to be a stopword? This is a bug, but how should it be fixed? Although, tests don't support that: usa=# select food_id, brand,description,ftiidx from food_foods where ftiidx ## 'hi
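The question running through this thread, whether to test for stop words before or after stemming, can be made concrete with a small sketch. It assumes a three-word stop list and a toy suffix stripper (`toy_stem`) that happens to reduce "herring" to "her" the way the thread reports. Neither is taken from contrib/tsearch, and the toy stemmer is not the Porter algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stop-word list, for illustration only. */
static const char *stop_words[] = {"her", "the", "a"};

static int is_stop_word(const char *w)
{
    size_t i;

    for (i = 0; i < sizeof(stop_words) / sizeof(stop_words[0]); i++)
        if (strcmp(w, stop_words[i]) == 0)
            return 1;
    return 0;
}

/* Toy stemmer (NOT Porter): drops a plural "s", then "ing", then
 * collapses a doubled final letter, so "herrings" -> "her". */
static void toy_stem(const char *word, char *out, size_t outsz)
{
    size_t n;

    assert(out != NULL && outsz > 0);
    strncpy(out, word, outsz - 1);
    out[outsz - 1] = '\0';
    n = strlen(out);

    if (n > 3 && out[n - 1] == 's' && out[n - 2] != 's')
        out[--n] = '\0';
    if (n > 5 && strcmp(out + n - 3, "ing") == 0)
    {
        n -= 3;
        out[n] = '\0';
        if (n >= 2 && out[n - 1] == out[n - 2])
            out[--n] = '\0';
    }
}

/* Stop words checked BEFORE stemming: only the surface form is tested,
 * so every word form of a stop word must be listed explicitly. */
static int dropped_check_before(const char *word)
{
    return is_stop_word(word);
}

/* Stop words checked AFTER stemming: the stem is tested, so a word
 * whose stem collides with a stop word is dropped as well. */
static int dropped_check_after(const char *word)
{
    char stem[64];

    toy_stem(word, stem, sizeof(stem));
    return is_stop_word(stem);
}
```

Under these assumptions `dropped_check_after("herring")` returns 1, so the word is discarded, while `dropped_check_before("herring")` returns 0 and the word is kept. The trade-off is the one raised in the thread: checking first keeps such words searchable but requires listing every form of each stop word.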