On Sun, Aug 18, 2013 at 10:33 PM, Kevin Grittner <kgri...@ymail.com> wrote:
> Tyler Reese <juke...@gmail.com> wrote: > > Kevin Grittner <kgri...@ymail.com> wrote: > >> Tyler Reese <juke...@gmail.com> wrote: > > >>> mydb=> explain analyze SELECT * FROM "cdr" WHERE > lower("CallingPartyNumber") = '9725551212' order by "key" limit 100; > >>> > >>> Limit (cost=0.00..72882.05 rows=100 width=757) (actual > time=20481.083..30464.960 rows=11 loops=1) > >>> -> Index Scan using cdr_pkey on cdr (cost=0.00..30036152.32 > rows=41212 width=757) (actual time=20481.049..30464.686 rows=11 loops=1) > >>> Filter: (lower("CallingPartyNumber") = '9725551212'::text) > >>> Total runtime: 30465.246 ms > >> > >> It thinks that it will only need to read 1/412th of the table to > >> find 100 matching rows, and using that index it will be able to > >> skip the sort. Since there aren't 100 matching rows, it has to > >> read the whole table through the index. > > > So, since it thinks it needs to read 1/412th of the table is the > > reason why the query planner chooses to use the primary key index > > instead of the callingpartynumber index, like it does in the > > first 3 cases? > > The optimizer compares the estimated cost of reading all matching > rows (but only matching rows) out of order and then sorting them to > the estimated cost of reading them in order and filtering out the > rows that don't match (and stopping when the limit is reached). > Since it though a lot of rows would match, that made the sort look > more expensive and also like it would not reed to read a very large > percentage of the table. > > > I'm curious as to why it says "rows=41212". Is that the estimate > > of the number of rows that meet the filter condition? Where does > > that come from? > > That's based on the distribution observed in the last random sample > when ANALYZE was run (as a command or by autovacuum). When there > is an error in the estimate that bad, either autovacuum is not > configured to be aggressive enough in analyzing or the stample size > was not large enough. > > >> Raising the statistics target and running ANALYZE might allow > >> it to use a more accurate estimate, and thereby make a better > >> choice. > > > > I haven't heard of raising the statistics target, so I'll read up > > on that. > > > http://www.postgresql.org/docs/9.2/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-OTHER > > http://www.postgresql.org/docs/9.2/interactive/sql-altertable.html > > > A few days ago, all 4 cases were responding equally fast. I had > > been messing around with the postgres settings, and I went and > > dropped all of the indexes and recreated them just to see what > > would happen. I wouldn't think that recreating the indexes would > > cause case 4 to go slow, but that's the symptom I am seeing now. > > Should I be running analyze on a table after it has been > > reindexed? > > Only if you have indexes on expressions rather than simple column > names. If you have an index on lower("CallingPartyNumber") that > would qualify as an expression. > > The normal reason to need to get fresh statistics is because of > changes in the distribution of values in a column, particularly > after a bulk load. Also, columns with a large number of distinct > values tend to benefit from a higher statistics target. The down > sides of higher statistics targets are a longer time to ANALYZE and > increased planning time; so it's generally best to use the default > except where a particular problem has been observed, like in this > case. > I performed ANALYZE on the table and now case 4 is fast again: mydb=> explain analyze SELECT * FROM "cdr" WHERE lower("CallingPartyNumber") = '9725551212' order by "key" limit 100; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=958.72..958.97 rows=100 width=758) (actual time=0.251..0.310 rows=11 loops=1) -> Sort (cost=958.72..959.32 rows=240 width=758) (actual time=0.244..0.266 rows=11 loops=1) Sort Key: key Sort Method: quicksort Memory: 30kB -> Bitmap Heap Scan on cdr (cost=7.30..949.55 rows=240 width=758) (actual time=0.105..0.162 rows=11 loops=1) Recheck Cond: (lower("CallingPartyNumber") = '9725551212'::text) -> Bitmap Index Scan on callingpartynumber_lower (cost=0.00..7.24 rows=240 width=0) (actual time=0.085..0.085 rows=11 loops=1) Index Cond: (lower("CallingPartyNumber") = '9725551212'::text) Total runtime: 0.517 ms (9 rows) Thanks for the help, Kevin.