Re: [PERFORM] Planner selects slow "Bitmap Heap Scan" when "Index Scan" is faster
On Thu, Apr 5, 2012 at 17:34, Kevin Grittner wrote: > Kim Hansen wrote: > >> I have a query where the planner makes a wrong cost estimate, it >> looks like it underestimates the cost of a "Bitmap Heap Scan" >> compared to an "Index Scan". > >> What can I do to fix the cost estimate? > > Could you try running the query with cpu_tuple_cost = 0.05 and let > us know how that goes? > It looks like it just increased the estimated cost of both queries by about 1000. Regards, Kim === yield=> explain analyze select "filtered_demands"."pol" as "c0" from "demands"."filtered_demands" as "filtered_demands" where ("filtered_demands"."pod" = 'VELAG') group by "filtered_demands"."pol" order by "filtered_demands"."pol" ASC NULLS LAST; QUERY PLAN --- Sort (cost=39540.92..39540.92 rows=2 width=6) (actual time=186.833..186.858 rows=221 loops=1) Sort Key: pol Sort Method: quicksort Memory: 35kB -> HashAggregate (cost=39540.81..39540.91 rows=2 width=6) (actual time=186.643..186.678 rows=221 loops=1) -> Bitmap Heap Scan on filtered_demands (cost=566.23..39479.81 rows=24401 width=6) (actual time=6.154..180.654 rows=18588 loops=1) Recheck Cond: (pod = 'VELAG'::text) -> Bitmap Index Scan on filtered_demands_pod_pol_idx (cost=0.00..560.12 rows=24401 width=0) (actual time=4.699..4.699 rows=18588 loops=1) Index Cond: (pod = 'VELAG'::text) Total runtime: 186.912 ms (9 rows) yield=> set enable_bitmapscan = false; SET yield=> explain analyze select "filtered_demands"."pol" as "c0" from "demands"."filtered_demands" as "filtered_demands" where ("filtered_demands"."pod" = 'VELAG') group by "filtered_demands"."pol" order by "filtered_demands"."pol" ASC NULLS LAST; QUERY PLAN -- Group (cost=0.00..77510.37 rows=2 width=6) (actual time=0.029..20.361 rows=221 loops=1) -> Index Scan using filtered_demands_pod_pol_idx on filtered_demands (cost=0.00..77449.37 rows=24401 width=6) (actual time=0.027..16.859 rows=18588 loops=1) Index Cond: (pod = 'VELAG'::text) Total runtime: 20.410 ms (4 rows) yield=> -- Kim Rydhof Thor Hansen Vadgårdsvej 3, 2. tv. 2860 Søborg Phone: +45 3091 2437 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] bad plan
On 04/05/2012 21:47, Ants Aasma wrote: On Thu, Apr 5, 2012 at 2:47 PM, Julien Cigar wrote: - http://www.pastie.org/3731956 : with default config - http://www.pastie.org/3731960 : this is with enable_seq_scan = off It looks like the join selectivity of (context_to_context_links, ancestors) is being overestimated by almost two orders of magnitude. The optimizer thinks that there are 564 rows in the context_to_context_links table for each taxon_id, while in fact for this query the number is 9. To confirm that this, you can force the selectivity estimate to be 200x lower by adding a geo_id = geod_id where clause to the subquery. adding a geo_id = geo_id to the subquery helped a little bit with a cpu_tuple_cost of 0.1: http://www.pastie.org/3738224 : without: Index Scan using ltlc_taxon_id_idxoncontext_to_context_links (cost=0.00..146.93 rows=341 width=8) (actual time=0.004..0.019 rows=9 loops=736) with geo_id = geo_id: Index Scan using ltlc_taxon_id_idxoncontext_to_context_links (cost=0.00..148.11 rows=2 width=8) (actual time=0.004..0.020 rows=9 loops=736) If it does help, then the next question would be why is the estimate so much off. It could be either because the stats for context_to_context_links.taxon_id are wrong or because ancestors.taxon_id(subphylum_id = 18830) is a special case. To help figuring this is out, you could run the following to queries and post the results: SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT COUNT(*) AS num FROM context_to_context_links GROUP BY taxon_id) AS dist GROUP BY 1 ORDER BY 1; SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT COUNT(*) AS num FROM context_to_context_links WHERE NOT geo_id IS NULL and taxon_id= ANY ( select taxon_id from rab.ancestors where ancestors.subphylum_id = 18830) GROUP BY taxon_id) AS dist GROUP BY 1 ORDER BY 1; I'm sorry but I get an "ERROR: division by zero" for both of your queries.. If the second distribution has a significantly different shape then cross column statistics are necessary to get good plans. As it happens I'm working on adding this functionality to PostgreSQL and would love to hear more details about your use-case to understand if it would be solved by this work. Thank you for your help, Julien Regards, Ants Aasma -- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced. <> -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planner selects slow "Bitmap Heap Scan" when "Index Scan" is faster
On Wed, Apr 4, 2012 at 6:47 AM, Kim Hansen wrote: > Hi All > > I have a query where the planner makes a wrong cost estimate, it looks > like it underestimates the cost of a "Bitmap Heap Scan" compared to an > "Index Scan". > > This it the two plans, I have also pasted them below: > Slow (189ms): http://explain.depesz.com/s/2Wq > Fast (21ms): http://explain.depesz.com/s/ThQ Could you do explain (analyze, buffers)? Did you run these queries multiple times in both orders? If you just ran them once each, in the order indicated, then the bitmap scan may have done the hard work of reading all the needed buffers into cache, and the index scan then got to enjoy that cache. Cheers, Jeff -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance