> Should we check for stop words before stemming or after?

The current implementation supports both variants. See the dictionary interface definition in morph.c:

    typedef struct
    {
        char        localename[NAMEDATALEN];
        /* init dictionary */
        void       *(*init) (void);
        /* close dictionary */
        void        (*close) (void *);
        /* find in dictionary */
        char       *(*lemmatize) (void *, char *, int *);
        int         (*is_stoplemm) (void *, char *, int);
        int         (*is_stemstoplemm) (void *, char *, int);
    } DICT;

The 'is_stoplemm' method is called before 'lemmatize', and 'is_stemstoplemm' is called after it.

At the end of dict/porter_english.dct:

    TABLE_DICT_START
        "C",
        setup_english_stemmer,
        closedown_english_stemmer,
        engstemming,
        NULL,
        is_stopengword
    TABLE_DICT_END

dict/russian_stemming.dct:

    TABLE_DICT_START
        "ru_RU.KOI8-R",
        NULL,
        NULL,
        ru_RUKOI8R_stem,
        ru_RUKOI8R_is_stopword,
        NULL
    TABLE_DICT_END

So the English stemmer decides whether a lexeme is a stop word after stemming, while the Russian one decides before stemming.

-- 
Teodor Sigaev [EMAIL PROTECTED]
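
P.S. To make the call order concrete, here is a minimal sketch of how a caller might use the two hooks. It is not the actual morph.c code. The DICT layout is the one quoted above; everything else is an assumption made for illustration: the NAMEDATALEN value, the toy_lemmatize and toy_is_stop helpers, the process_lexeme function, and the convention that a nonzero return from a stop-word hook means "this is a stop word" and that a NULL hook means "skip this check".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 64  /* assumption: the real value comes from the PostgreSQL headers */

typedef struct
{
    char        localename[NAMEDATALEN];
    void       *(*init) (void);
    void        (*close) (void *);
    char       *(*lemmatize) (void *, char *, int *);
    int         (*is_stoplemm) (void *, char *, int);
    int         (*is_stemstoplemm) (void *, char *, int);
} DICT;

/* Hypothetical toy stemmer: strips one trailing 's' in place. */
static char *
toy_lemmatize(void *state, char *word, int *len)
{
    (void) state;
    if (*len > 1 && word[*len - 1] == 's')
    {
        (*len)--;
        word[*len] = '\0';
    }
    return word;
}

/* Hypothetical stop-word test: only "the" counts as a stop word. */
static int
toy_is_stop(void *state, char *word, int len)
{
    (void) state;
    return len == 3 && strncmp(word, "the", 3) == 0;
}

/*
 * Hypothetical driver. Returns NULL if the lexeme is rejected as a
 * stop word, otherwise the lemma. A NULL hook is simply skipped.
 */
static char *
process_lexeme(const DICT *dict, void *state, char *word, int *len)
{
    char       *lemma = word;

    assert(dict != NULL && word != NULL && len != NULL);

    /* check before stemming (the Russian dictionary fills this slot) */
    if (dict->is_stoplemm && dict->is_stoplemm(state, word, *len))
        return NULL;

    if (dict->lemmatize)
        lemma = dict->lemmatize(state, word, len);

    /* check after stemming (the English dictionary fills this slot) */
    if (dict->is_stemstoplemm && dict->is_stemstoplemm(state, lemma, *len))
        return NULL;

    return lemma;
}
```

With these toy helpers, the input "thes" shows the difference: a dictionary that sets only 'is_stoplemm' lets it through and returns the lemma "the", while a dictionary that sets only 'is_stemstoplemm' stems it to "the" first and then rejects it as a stop word.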