Re: [GENERAL] using Tsearch2 for chemical text

2007-07-25 Oleg Bartunov
Naz, I posted a link to the dict_regex dictionary for tsearch2 (http://lynx.sao.ru/~karpov/software/postgres_dict_regex.html). Feel free to test it and send us feedback. It's rather general; of course, it uses regular expressions (the PCRE library).
Oleg
On Thu, 26 Jul 2007, Naz Gassiep wrote: I think you might
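dict_regex itself uses the PCRE library; the sketch below only illustrates the general idea of a regex-driven dictionary, in which the first matching rule decides which lexemes a token produces. It is written in Python with the standard `re` module standing in for PCRE, and the rule table, the `RULES` name and the `lexemes()` helper are hypothetical, not dict_regex's actual configuration or API.

```python
import re

# Hypothetical rule table in the spirit of a regex dictionary: each rule is a
# compiled pattern plus a function that turns a match into zero or more lexemes.
# Illustrative only; this is NOT dict_regex's configuration format.
RULES = [
    # Optical-rotation prefix such as (-) or (+): index the full name and the
    # name without the prefix, so both spellings can be found.
    (re.compile(r"^\(([+-])\)(.+)$"),
     lambda m: [m.group(0).lower(), m.group(2).lower()]),
    # Any other token made of word characters, commas, parentheses, plus and
    # hyphen signs: keep it whole as a single lowercased lexeme.
    (re.compile(r"^[\w,()+-]+$"),
     lambda m: [m.group(0).lower()]),
]


def lexemes(token: str) -> list[str]:
    """Return the lexemes from the first rule whose pattern matches, else []."""
    for pattern, emit in RULES:
        m = pattern.match(token)
        if m:
            return emit(m)
    return []
```

With these assumed rules, `lexemes("(-)O-acetylcarnitine")` yields both `(-)o-acetylcarnitine` and `o-acetylcarnitine`, while `1,2-cis-dihydroxybenzoate` is kept as one lexeme. Ordering the rules from most to least specific is what lets the general catch-all rule sit last.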

Re: [GENERAL] using Tsearch2 for chemical text

2007-07-25 Oleg Bartunov
On Wed, 25 Jul 2007, Rajarshi Guha wrote: Hi, I have a table with about 9M entries. The table has 2 fields, id and name, which are of serial and text types respectively. I have an ordinary index on the text field, which allows me to do searches in reasonable time. Most of my searches are of the f

Re: [GENERAL] using Tsearch2 for chemical text

2007-07-25 Naz Gassiep
I think you might need to write a custom lexer to divide the strings into meaningful units. If there are subsections of these names that make sense to search for, then tsearch2 can certainly handle the mechanics of that, but I doubt that the standard rules will divide these names into lexemes u
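A minimal sketch of the kind of custom lexer suggested here, assuming three token classes (optical-rotation prefixes, locant lists and alphabetic fragments) chosen purely for illustration; tsearch2 defines none of them. The `TOKEN_RE` pattern and the `lex()` function are hypothetical, and a real tsearch2 parser would have to be implemented in C against its parser interface rather than in Python.

```python
import re

# Hypothetical token classes for IUPAC-style chemical names. Alternatives are
# tried left to right at each position; unmatched characters (hyphens, etc.)
# are skipped as separators.
TOKEN_RE = re.compile(
    r"""
    (?P<stereo>\([+-]\))          # optical-rotation prefix, e.g. (-) or (+)
  | (?P<locants>\d+(?:,\d+)*)     # locant lists, e.g. 1,2 or 2,4,6
  | (?P<word>[A-Za-z]+)           # alphabetic fragments, e.g. cis, dihydroxybenzoate
    """,
    re.VERBOSE,
)


def lex(name: str) -> list[tuple[str, str]]:
    """Split a chemical name into (token_class, text) pairs, dropping separators."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(name)]
```

For example, `lex("1,2-cis-dihydroxybenzoate")` gives the locant list `1,2` followed by the words `cis` and `dihydroxybenzoate`, each of which could then be indexed as a separate lexeme. A purely numeric name such as 1674-56-2 comes out as three unrelated locant tokens, which illustrates why the standard splitting rules may not produce useful lexemes for such names.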

Re: [GENERAL] using Tsearch2 for chemical text

2007-07-25 Tatsuo Ishii
> Rajarshi Guha <[EMAIL PROTECTED]> writes:
> > My problem is that the name column contains names of chemicals. Now
> > for many cases this may simply be a number (1674-56-2) and in other
> > cases it may be an alphanumeric string (such as (-)O-acetylcarnitine
> > or 1,2-cis-dihydroxybenzoate

Re: [GENERAL] using Tsearch2 for chemical text

2007-07-25 Dann Corbit
Tsearch2 is used for full-text indexing. It won't be any faster than a btree index like the one you have now (I assume it's unique; if it isn't, then I think it ought to be). If you cluster the table by your index, it will speed up your queries.
> -----Original Message-----
> From: [EMAIL PROTEC

Re: [GENERAL] using Tsearch2 for chemical text

2007-07-25 Tom Lane
Rajarshi Guha <[EMAIL PROTECTED]> writes:
> My problem is that the name column contains names of chemicals. Now
> for many cases this may simply be a number (1674-56-2) and in other
> cases it may be an alphanumeric string (such as (-)O-acetylcarnitine
> or 1,2-cis-dihydroxybenzoate). In some
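The numeric example 1674-56-2 has the shape of a CAS Registry Number (two to seven digits, two digits, one check digit). Assuming that is what these numeric names are, which the thread does not state, a lexer could recognize the whole string as one token instead of splitting it at the hyphens, and reject malformed values by their check digit. `CAS_RE` and `is_valid_cas()` are hypothetical names used only for this sketch.

```python
import re

# Assumption: purely numeric names such as 1674-56-2 are CAS Registry Numbers,
# i.e. 2-7 digits, a hyphen, 2 digits, a hyphen, and 1 check digit.
CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(name: str) -> bool:
    """Check the CAS number format and its check digit.

    The check digit equals the sum of all other digits, each multiplied by
    its position counted from the right (starting at 1), modulo 10.
    """
    m = CAS_RE.match(name)
    if not m:
        return False
    body = m.group(1) + m.group(2)
    total = sum(int(d) * pos for pos, d in enumerate(reversed(body), start=1))
    return total % 10 == int(m.group(3))
```

For 1674-56-2 the weighted sum is 6*1 + 5*2 + 4*3 + 7*4 + 6*5 + 1*6 = 92, and 92 mod 10 = 2, which matches the final digit, so `is_valid_cas("1674-56-2")` returns True.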