2016-11-09 11:19 GMT+01:00 Pierre Ducroquet <pierre.ducroq...@people-doc.com >:
> On Wednesday, November 9, 2016 10:40:10 AM CET Francisco Olarte wrote: > > Pierre: > > > > On Wed, Nov 9, 2016 at 10:22 AM, Pierre Ducroquet > > > > <pierre.ducroq...@people-doc.com> wrote: > > > The query does a few joins «after» running a FTS query on a main table. > > > The FTS query returns a few thousand rows, but the estimations are > wrong, > > > leading the optimizer to terrible plans compared to what should happen, > > > and > > > thus creates a far higher execution time. > > > > .... > > > > > but the issue remain the same. The table contains about 295,000 > documents, > > > and > > .... > > > > > Request | Estimated rows | Real rows > > > > > > ----------------------------------+----------------+----------- > > > 'word1' | 38050 | 37500 > > > 'word1 word2' | 4680 | 32000 > > > 'word1 word2 word3' | 270 | 12300 > > > 'word1 word2 word3 word4' | 10 | 9930 > > > 'word1 word2 word3 word4 word5' | 1 | 9930 > > > > > > You can see that with more words in query, the estimation falls far > behind > > > reality. > > > > I'm not really familiar with FTS but, doing a few division of > > estimations and rows it seems it estimates as uncorrelated words, and > > you real rows clearly indicate some of them are clearly correlated ( > > like w1/w2 and w4/s5, and partially w3/w45 ) and very common. > > > > > Is that a known limitation of the FTS indexing ? Am I missing something > > > obvious, or a poor configuration ? > > > > Someone more familiar with it needed for that, but what I've found > > several times is FTS does not mix too well with relational queries at > > the optimizer level ( as FTS terms can have very diverse degrees of > > correlation, which is very difficult to store in the statistics a > > relational optimizer normally uses ). > > Indeed the words in the query are correlated, but I do hope that the FTS > indexing is able to cope with that. Otherwise it makes it far less usable > than > what one would expect since real world queries will often contain > sentences or > related words. Also, PostgreSQL 9.6 introduced phrase search in FTS, and I > don't see how that would work without a working multi-words query. > The PostgreSQL statistics are not multidimensional - so bad estimation is expected :( Regards Pavel