This may be a totally bad idea:
explode your sentences into (sentence_number, one_word) rows, n times (this
makes a big table, so you may want to partition it),
then put classic indexes on sentence_number and on one_word (a btree works
if you make = comparisons; it is more subtle if you do LIKE 'word%').
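
A minimal sketch of that layout, assuming a source table sentences(id, body)
(all table and column names here are invented for illustration):

-- Hypothetical exploded layout: one row per (sentence, word) pair.
CREATE TABLE sentence_words (
    sentence_number integer NOT NULL,
    one_word        text    NOT NULL
);

-- Explode each sentence into words (naively on whitespace here;
-- real tokenization/stemming would need more care).
INSERT INTO sentence_words (sentence_number, one_word)
SELECT s.id, w
FROM sentences s,
     regexp_split_to_table(lower(s.body), '\s+') AS w;

-- Classic indexes: a plain btree covers = comparisons on the word.
CREATE INDEX ON sentence_words (sentence_number);
CREATE INDEX ON sentence_words (one_word);

-- For prefix searches (LIKE 'word%') a text_pattern_ops btree helps:
CREATE INDEX ON sentence_words (one_word text_pattern_ops);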

Depending on performance, it could be worth regrouping by word:
(sentence_number[], one_word)
Then you could try an array or hstore index on sentence_number[]?
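
If you go that route, a sketch of the regrouped form (again, names are mine;
the GIN opclass for integer arrays is built into PostgreSQL):

-- Hypothetical regrouped layout: one row per word, carrying the
-- array of sentence numbers that contain it.
CREATE TABLE word_sentences AS
SELECT one_word,
       array_agg(sentence_number) AS sentence_numbers
FROM sentence_words
GROUP BY one_word;

CREATE INDEX ON word_sentences (one_word);

-- A GIN index supports the array operators (&&, @>, <@) on the column:
CREATE INDEX ON word_sentences USING gin (sentence_numbers);

-- Example: rank sentences by how many probe words they share.
SELECT sn, count(*) AS shared_words
FROM word_sentences w,
     unnest(w.sentence_numbers) AS sn
WHERE w.one_word = ANY (ARRAY['some', 'probe', 'words'])
GROUP BY sn
ORDER BY shared_words DESC
LIMIT 10;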

Cheers,

Rémi-C


2013/12/5 Janek Sendrowski <jane...@web.de>

> Hi,
>
> I have tables with millions of sentences; each row contains one sentence.
> It is natural language and any language is possible, but all sentences in
> one table share the same language.
> I have to do a similarity search on them. It has to be very fast, because
> I have to search for a few hundred sentences many times.
> The search shouldn't be context-based; it should just return sentences
> with similar words (maybe stemmed).
>
> I already tried GiST/GIN-index-based trigram search (the pg_trgm
> extension), full-text search (the tsearch2 extension), and pivot-based
> indexing (Fixed Query Array), but it is all too slow or not suitable.
> Soundex and Metaphone aren't suitable either.
>
> I have been working on this project for a long time, but without any
> success.
> Do any of you have an idea?
>
> I would be very thankful for help.
>
> Janek Sendrowski
>
>
