Maybe a totally bad idea: explode your sentences into (sentence_number, one_word) rows, n per sentence (it makes a big table, so you may want to partition), then classic indexes on sentence_number and on one_word (btree if you make = comparisons, something more subtle if you do "LIKE 'word'"). Rough sketch below.
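Something like this, untested; sentences(sentence_number, sentence_text) and sentence_words are made-up names, and the whitespace split is naive (no stemming, punctuation stays glued to words):

-- assumed source table: sentences(sentence_number, sentence_text)
CREATE TABLE sentence_words (
    sentence_number integer NOT NULL,
    one_word        text    NOT NULL
);
-- n rows per sentence, so this gets big: a candidate for partitioning

-- explode: one row per (lower-cased) word
INSERT INTO sentence_words (sentence_number, one_word)
SELECT s.sentence_number, w.word
FROM sentences s,
     LATERAL regexp_split_to_table(lower(s.sentence_text), '\s+') AS w(word)
WHERE w.word <> '';

CREATE INDEX ON sentence_words (sentence_number);
CREATE INDEX ON sentence_words (one_word);  -- btree, good for one_word = 'cat'
-- for one_word LIKE 'cat%' you'd rather want:
-- CREATE INDEX ON sentence_words (one_word text_pattern_ops);

A query sentence can then be exploded the same way and candidates ranked by word overlap, e.g.:

SELECT sentence_number, count(*) AS shared_words
FROM sentence_words
WHERE one_word = ANY ('{the,cat,sat}'::text[])
GROUP BY sentence_number
ORDER BY shared_words DESC
LIMIT 10;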
Depending on perf, it could be worth it to regroup by word: (one_word, sentence_number[]). Then you could try an array or hstore index on sentence_number[]? (A sketch of the array variant is at the bottom, after the quoted message.)

Cheers,
Rémi-C

2013/12/5 Janek Sendrowski <jane...@web.de>

> Hi,
>
> I have tables with millions of sentences. Each row contains a sentence.
> It is natural language and every language is possible, but the sentences
> of one table have the same language.
> I have to do a similarity search on them. It has to be very fast,
> because I have to search for a few hundred sentences many times.
> The search shouldn't be context-based. It should just get sentences with
> similar words (maybe stemmed).
>
> I already tried gist/gin-index-based trigram search (the pg_trgm
> extension), full-text search (the tsearch2 extension) and pivot-based
> indexing (Fixed Query Array), but it is all too slow or not suitable.
> Soundex and Metaphone aren't suitable either.
>
> I have already been working on this project for a long time, but without
> any success.
> Do any of you have an idea?
>
> I would be very thankful for help.
>
> Janek Sendrowski
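PS: the regrouped/array variant, building on the sentence_words table from the first sketch (again untested; word_sentences is a made-up name):

-- regroup: one row per distinct word, with the array of sentences containing it
CREATE TABLE word_sentences AS
SELECT one_word,
       array_agg(DISTINCT sentence_number) AS sentence_numbers
FROM sentence_words
GROUP BY one_word;

CREATE UNIQUE INDEX ON word_sentences (one_word);

-- a GIN index supports the array operators (&& overlap, @> contains)
-- on sentence_numbers; hstore would be the other thing to benchmark
CREATE INDEX ON word_sentences USING gin (sentence_numbers);

-- e.g. all sentences containing 'cat':
-- SELECT unnest(sentence_numbers) FROM word_sentences WHERE one_word = 'cat';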