Re: [GENERAL] tsearch2 and hyphenated terms

2008-04-11 Reece Hart
On Fri, 2008-04-11 at 22:07 +0400, Oleg Bartunov wrote:
> We have the same problem with names in astronomy, so we implemented
> dict_regex http://vo.astronet.ru/arxiv/dict_regex.html
> Check it out!

Oleg-

This gets me a lot closer. Thank you. I have two remaining problems. The first problem

Re: [GENERAL] tsearch2 and hyphenated terms

2008-04-11 Oleg Bartunov
We have the same problem with names in astronomy, so we implemented
dict_regex http://vo.astronet.ru/arxiv/dict_regex.html
Check it out!

Oleg

On Thu, 10 Apr 2008, Reece Hart wrote:
> I'd like to use tsearch2 to index protein and gene names. Unfortunately,
> such names are written inconsistently a

Re: [GENERAL] tsearch2 and hyphenated terms

2008-04-11 Tom Lane
Reece Hart <[EMAIL PROTECTED]> writes:
> For the purposes of indexing these names, I suspect I'd get the majority
> of cases by removing a hyphen when it's followed by 1 or 2 chars from
> [a-zA-Z0-9]. Does that require a custom parser?

Yeah, looks like it:

regression=# select * from ts_debug('MCL
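Before committing to a custom parser, the hyphen rule Reece describes can be prototyped outside the database. The following is a minimal sketch, assuming the rule means "drop a hyphen that follows an alphanumeric character and is followed by exactly 1 or 2 characters from [a-zA-Z0-9] at the end of a token". The `normalize_name` function and its regex are hypothetical illustrations, not part of tsearch2 or dict_regex; in practice such a step would run on text before it reaches to_tsvector.

```python
import re

# Hypothetical preprocessing rule (an assumption, not a tsearch2 feature):
# remove a hyphen that sits after an alphanumeric character and before a
# trailing run of 1 or 2 characters from [a-zA-Z0-9]. The negative
# lookahead ensures the suffix really ends there, so longer words stay intact.
_SHORT_SUFFIX_HYPHEN = re.compile(
    r"(?<=[A-Za-z0-9])-([A-Za-z0-9]{1,2})(?![A-Za-z0-9])"
)


def normalize_name(text: str) -> str:
    """Collapse spellings such as 'MCL-1' to 'MCL1' so both index alike."""
    return _SHORT_SUFFIX_HYPHEN.sub(r"\1", text)


if __name__ == "__main__":
    for sample in ["MCL-1", "MCL1", "BCL-XL", "IL-12 receptor", "well-known"]:
        print(f"{sample!r} -> {normalize_name(sample)!r}")
```

Applying the same function to query strings would keep indexed text and queries consistent. The lookahead is what keeps ordinary hyphenated words such as "well-known" from being collapsed, since "-known" has more than two trailing characters.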

[GENERAL] tsearch2 and hyphenated terms

2008-04-11 Reece Hart
I'd like to use tsearch2 to index protein and gene names. Unfortunately,
such names are written inconsistently and sometimes with hyphens. For
example, MCL-1 and MCL1 are semantically equivalent, but with the default
parser and to_tsvector, I see this:

[EMAIL PROTECTED]> select to_tsvector(