The following bug has been logged on the website: Bug reference: 7510 Logged by: Daniel Farina Email address: dan...@heroku.com PostgreSQL version: 9.1.4 Operating system: Ubuntu 10.04 Description:
Summary: Planner chooses GiST even if GIN is much better. We have a table that we decided to use GiST-based full text search on, but received terrible performance. It's not a very big table, nor are the tsvectors very large -- we FTS a tiny bit of text and throw in a few identifiers at our choosing to enable a search in an application of ours. The root cause of that is that GiST is terrible when using prefix matching operators on tsvectors (the ":*" operator in the search language), which is what one nominally wants for incremental search. Upon doing "EXPLAIN (ANALYZE, BUFFERS, FORMAT YAML)", this is what the the plan looks like: ------------------------------------------------------------------------------------- - Plan: + Node Type: "Limit" + Startup Cost: 0.00 + Total Cost: 14.48 + Plan Rows: 6 + Plan Width: 240 + Actual Startup Time: 499.515 + Actual Total Time: 499.515 + Actual Rows: 0 + Actual Loops: 1 + Shared Hit Blocks: 269246 + Shared Read Blocks: 0 + Shared Written Blocks: 0 + Local Hit Blocks: 0 + Local Read Blocks: 0 + Local Written Blocks: 0 + Temp Read Blocks: 0 + Temp Written Blocks: 0 + Plans: + - Node Type: "Index Scan" + Parent Relationship: "Outer" + Scan Direction: "NoMovement" + Index Name: "resources_text_searchable_idx" + Relation Name: "resources" + Alias: "resources" + Startup Cost: 0.00 + Total Cost: 14.48 + Plan Rows: 6 + Plan Width: 240 + Actual Startup Time: 499.511 + Actual Total Time: 499.511 + Actual Rows: 0 + Actual Loops: 1 + Index Cond: "(search_document @@ '''daniel'':* & ''heroku.c'':*'::tsquery)"+ Shared Hit Blocks: 269246 + Shared Read Blocks: 0 + Shared Written Blocks: 0 + Local Hit Blocks: 0 + Local Read Blocks: 0 + Local Written Blocks: 0 + Temp Read Blocks: 0 + Temp Written Blocks: 0 + Triggers: + Total Runtime: 499.571 What's notable here is that the shared hit blocks is about 2GB worth of data. The index per \di+ is only about 350MB, and the table per \dt+ is only ~400MB, so I'm not sure precisely what the deal there is, but regardless, this GiST index might be no better for I/O than a sequential scan. It is probably worse. If one adds a GIN index and does a fresh analyze, the planner still produce a plan for the GiST index. Because there is no way to disable particular indexes in a session, it's impossible to quickly experiment with a new hypothetical situation with only the GIN index without possibly painting yourself into a corner where you've dropped a needed index. We forked the database and did our experiment there, removing the GiST index as to remove its access pattern. Here's the new EXPLAIN, which, to get to the punch, only scans 4 pages, or 32K, and is very fast as a result. QUERY PLAN ----------------------------------------------------------------------------------- - Plan: + Node Type: "Limit" + Startup Cost: 13.98 + Total Cost: 105.60 + Plan Rows: 50 + Plan Width: 240 + Actual Startup Time: 0.092 + Actual Total Time: 0.303 + Actual Rows: 50 + Actual Loops: 1 + Shared Hit Blocks: 51 + Shared Read Blocks: 0 + Shared Written Blocks: 0 + Local Hit Blocks: 0 + Local Read Blocks: 0 + Local Written Blocks: 0 + Temp Read Blocks: 0 + Temp Written Blocks: 0 + Plans: + - Node Type: "Bitmap Heap Scan" + Parent Relationship: "Outer" + Relation Name: "resources" + Alias: "resources" + Startup Cost: 13.98 + Total Cost: 2352.21 + Plan Rows: 1276 + Plan Width: 240 + Actual Startup Time: 0.089 + Actual Total Time: 0.238 + Actual Rows: 50 + Actual Loops: 1 + Recheck Cond: "(search_document @@ '''dan...@heroku.com'':*'::tsquery)" + Shared Hit Blocks: 51 + Shared Read Blocks: 0 + Shared Written Blocks: 0 + Local Hit Blocks: 0 + Local Read Blocks: 0 + Local Written Blocks: 0 + Temp Read Blocks: 0 + Temp Written Blocks: 0 + Plans: + - Node Type: "Bitmap Index Scan" + Parent Relationship: "Outer" + Index Name: "experiment_ginnish" + Startup Cost: 0.00 + Total Cost: 13.91 + Plan Rows: 1276 + Plan Width: 0 + Actual Startup Time: 0.071 + Actual Total Time: 0.071 + Actual Rows: 75 + Actual Loops: 1 + Index Cond: "(search_document @@ '''dan...@heroku.com'':*'::tsquery)"+ Shared Hit Blocks: 4 + Shared Read Blocks: 0 + Shared Written Blocks: 0 + Local Hit Blocks: 0 + Local Read Blocks: 0 + Local Written Blocks: 0 + Temp Read Blocks: 0 + Temp Written Blocks: 0 + Triggers: + Total Runtime: 0.395 (1 row) So this is an interesting result. Ideally the planner could ask the FTS functions to cost the query against the index before choosing it, but more pressingly it shows that anyone who wants incremental search is in for *much* worse performance than the "roughly 3x" performance difference advertised as a rule of thumb in the docs, and without knowing how to dissect a fairly advanced form of EXPLAIN they'd be very hard pressed to find out, because the index appears to be in use (and in fact, it is, it's just terrible). -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs