Hi All

I have a question regarding PostgreSQL's full-text search (FTS) capabilities 
and (presumably) the synonym dictionary.

I'm currently implementing FTS for a medical application that relies heavily 
on domain-specific jargon. One specific requirement is support for the jargon 
synonyms that are in heavy use.

Of course, I can simply create my own synonym dictionary and feed it a 
jargon-specific synonym file. However, most of the synonyms consist of more 
than a single word.

The term "heart attack" for example has the following "synonyms":

- Acute MI
- MI
- Myocardial infarction

As far as I understand it, the tokenizer in PostgreSQL's FTS engine splits 
text on spaces to generate tokens, which are then passed to each dictionary 
one at a time. I think it is therefore impossible to have "multi-word 
synonyms" in this sense, as multiple words cannot reach the dictionary 
together. The term "heart attack" would be presented as the two separate 
tokens "heart" and "attack".
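
To make the problem concrete, here is a small Python sketch of how I picture 
the lookup. It is only a toy model of my understanding, not PostgreSQL code: 
the SYNONYMS table and the tokenize/normalize helpers are made up for 
illustration, and the real parser does more than split on whitespace.

```python
# Toy model of a per-token synonym lookup; NOT how PostgreSQL is implemented.
# SYNONYMS is a hypothetical synonym file, keyed by the jargon term.
SYNONYMS = {
    "mi": "heart attack",
    "acute mi": "heart attack",
    "myocardial infarction": "heart attack",
}


def tokenize(text):
    """Lowercase and split on whitespace, as a stand-in for the FTS parser."""
    return text.lower().split()


def normalize(text):
    """Offer each token to the synonym table on its own, one at a time."""
    return [SYNONYMS.get(token, token) for token in tokenize(text)]


if __name__ == "__main__":
    print(tokenize("heart attack"))            # ['heart', 'attack']
    print(normalize("MI"))                     # ['heart attack']
    print(normalize("Myocardial infarction"))  # ['myocardial', 'infarction']
```

The single-word entry "mi" is found, but the two-word keys can never match, 
because no token handed to the lookup ever contains a space.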

From a technical standpoint, I understand that FTS is about looking at 
individual words and reducing them to lexemes. Yet from a natural-language 
lookup perspective, you still want to tie "heart attack" to "Acute MI", so 
that when a client searches for one, the other turns up as well.

Should I write my own tokenizer to catch all these terms and present each of 
them as a single token? Or is this completely outside the realm of FTS (or at 
least of FTS within PostgreSQL)?

Cheers,
Tim

