Re: [HACKERS] tsearch2: enable non ascii stop words with C locale

Teodor Sigaev Tue, 13 Feb 2007 00:15:10 -0800

> I know. My guess is the parser does not read the stop word file at
> least with default configuration.


The parser should not read the stop word file: that is the job of the
dictionaries.


> So if a character is not ASCII, it returns 0 even if p_isalpha returns
> 1. Is this what you expect?

No, p_islatin should return true only for latin characters, not for national 
ones.


> In our case, we added JAPANESE_STOP_WORD into english.stop then:
> select to_tsvector(JAPANESE_STOP_WORD)
> which returns words even though they are in JAPANESE_STOP_WORD.
> And with the patches the problem was solved.

Please show your configuration for lexemes/dictionaries. I suspect that you
have the en_stem dictionary enabled for the lword lexeme type. A better way is
to use the 'simple' dictionary (it supports stop words the same way en_stem
does) and set it for the nlword, word, part_hword, nlpart_hword, hword and
nlhword lexeme types. Note: leave en_stem unchanged for any Latin word.
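
For reference, a minimal sketch of that setup against the contrib/tsearch2
catalog tables. It assumes the active configuration is named 'default' and
uses a placeholder path for the stop word file; neither value comes from this
thread, so adjust both to your installation.

```sql
-- Sketch only. Assumes the configuration in use is 'default'
-- (check with: SELECT * FROM pg_ts_cfg;).

-- Show the current lexeme type -> dictionary mapping.
SELECT tok_alias, dict_name
  FROM pg_ts_cfgmap
 WHERE ts_name = 'default'
 ORDER BY tok_alias;

-- Point the 'simple' dictionary at a stop word file.
-- '/path/to/english.stop' is a placeholder path.
UPDATE pg_ts_dict
   SET dict_initoption = '/path/to/english.stop'
 WHERE dict_name = 'simple';

-- Use 'simple' for the non-Latin and mixed lexeme types.
-- The Latin types (lword and friends) are left on en_stem.
UPDATE pg_ts_cfgmap
   SET dict_name = '{simple}'
 WHERE ts_name = 'default'
   AND tok_alias IN ('nlword', 'word', 'part_hword',
                     'nlpart_hword', 'hword', 'nlhword');

-- Inspect which lexeme type and dictionary each token gets.
SELECT * FROM ts_debug('sample text');
```

Dictionaries are initialized once per session, so a new connection may be
needed before the changed stop word file takes effect.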


--
Teodor Sigaev                                   E-mail: [EMAIL PROTECTED]
                                                   WWW: http://www.sigaev.ru/

