Re: [GENERAL] Similarity search for sentences

2013-12-06 Thread Kevin Grittner
Janek Sendrowski wrote:
> I didn't know that the pg_trgm Module provides KNN search
It does, although my own experience shows that it tends to be more appropriate for name searches or similar smaller columns than for big text columns. Using the war_and_peace table from another thread:
test=# C
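As a rough illustration of the trigram similarity and nearest-neighbour ordering mentioned above, here is a minimal Python sketch. It is not the pg_trgm implementation: the function names (`trigrams`, `similarity`, `nearest`) are hypothetical, the word padding follows the convention described in the pg_trgm documentation (two spaces before each word, one after), and the search is a brute-force scan rather than an index lookup.

```python
import re


def trigrams(text):
    """Return the set of trigrams of text, padding each word pg_trgm-style."""
    grams = set()
    for word in re.findall(r"\w+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def nearest(query, rows, k=3):
    """Return the k rows closest to query, ordered by distance (1 - similarity)."""
    return sorted(rows, key=lambda row: 1.0 - similarity(query, row))[:k]
```

For example, `nearest("Grittner", ["Gritner", "Sendrowski", "Cura"], k=1)` ranks the misspelled name first, which matches the short-column use case described above.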

Re: [GENERAL] Similarity search for sentences

2013-12-06 Thread Janek Sendrowski
Hi, thanks for your answers. @Rémi Cura You suggest a kind of full-text search. I already tried the tsearch2 extension. The problem is implementing the similarity search: I have to use many OR statements with a small set of arguments, which slows the FTS down significantly. @Kevin Grittner

Re: [GENERAL] Similarity search for sentences

2013-12-05 Thread Kevin Grittner
Janek Sendrowski wrote:
> I already had a try with gist/gin-index-based trigram search
> (pg_trgm extension), fulltextsearch (tsearch2 extension) and a
> pivot-based indexing (Fixed Query Array), but it's all too slow or
> not suitable.
When you tried tsearch2, did you use a trigger to store the

Re: [GENERAL] Similarity search for sentences

2013-12-05 Thread Rémi Cura
This may be a totally bad idea: explode each sentence into (sentence_number, one_word) rows, one row per word (this makes a big table, which you may want to partition). Then put a classic index on the sentence number and on the word (btree if you compare with =, something more subtle if you use "like 'word'"). Depending on perf, it cou
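The "explode into (sentence_number, one_word)" idea above can be sketched in Python as an in-memory word index. This is only an illustration under stated assumptions: the names `build_word_index` and `candidates` are hypothetical, words are split on whitespace, and a dictionary stands in for the indexed table that the message proposes.

```python
from collections import defaultdict


def build_word_index(sentences):
    """Explode sentences into (sentence_number, word) pairs, indexed by word."""
    index = defaultdict(set)
    for sentence_number, sentence in enumerate(sentences):
        for word in sentence.lower().split():
            index[word].add(sentence_number)
    return index


def candidates(index, query):
    """Return sentence numbers sharing words with query, most overlap first."""
    counts = defaultdict(int)
    for word in set(query.lower().split()):
        # Equality lookup per word, analogous to a btree lookup on the word column.
        for sentence_number in index.get(word, ()):
            counts[sentence_number] += 1
    return sorted(counts, key=lambda n: (-counts[n], n))
```

Sentences that share no word with the query are never visited, which is the point of indexing the word column; ranking by the number of shared words is one simple choice of similarity measure, not the only one.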

[GENERAL] Similarity search for sentences

2013-12-05 Thread Janek Sendrowski
Hi, I have tables with millions of sentences. Each row contains one sentence. The text is natural language and any language is possible, but all sentences in one table share the same language. I have to do a similarity search on them. It has to be very fast, because I have to search for a few hundred