Just to follow up on this: I tried increasing the stats targets last week and re-analyzing, but the query was just as slow. I ended up increasing the JDBC driver's prepareThreshold to prevent server-side prepares for now (and with them the later switch to generic plans). This 'fixed' the issue and had no noticeable negative effect for our workloads.
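For anyone who finds this thread later, the workaround is a single pgjdbc connection parameter. A minimal sketch, with invented host/database names and an arbitrarily large threshold:

    jdbc:postgresql://dbhost:5432/ourdb?prepareThreshold=100000

prepareThreshold is the number of executions of a given PreparedStatement before the driver switches to a named server-side prepared statement (the default is 5). Setting it higher than the statement will ever be executed on one connection keeps the driver on unnamed statements, which the server plans with the concrete parameter values every time, so it never reaches the generic-plan stage.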
I still don't understand why the plan being off makes the query so much slower in this case. The two plans I shared in the last email don't look too different, and I don't see how the filter alone can add two minutes of execution time to an index-only scan (the generic plan does show "Rows Removed by Filter: 2670023" and far more buffer reads, but I don't understand why moving the ANY condition from the Index Cond into a Filter makes it scan that much more of the index). If anyone does have thoughts on what could be happening I would be very interested to hear them, but the main performance problem is effectively solved. Thanks all for the valuable help getting to the bottom of what was happening.

On Tue, 5 May 2020 at 22:42, Tom Lane <t...@sss.pgh.pa.us> wrote:
> James Thompson <ja...@jthompson.dev> writes:
> > The slowness occurs when the prepared statement changes to a generic plan.
>
> > Initial plan:
> >   ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2  (cost=0.56..2549.70 rows=70 width=36) (actual time=1.901..45.256 rows=65000 loops=1)
> >         Output: table1alias2.uniqueid
> >         Index Cond: ((table1alias2.col20 = '12345'::bigint) AND (table1alias2.col8 = ANY ('{c5986b02-3a02-4639-8147-f286972413ba,... 98ed24b1-76f5-4b0e-bb94-86cf13a4809c}'::text[])))
> >         Heap Fetches: 10
> >         Buffers: shared hit=5048
>
> > after 5 executions of the statement:
> >   ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2  (cost=0.56..17.23 rows=1 width=36) (actual time=125.344..126877.822 rows=65000 loops=1)
> >         Output: table1alias2.uniqueid
> >         Index Cond: (table1alias2.col20 = $1001)
> >         Filter: ((table1alias2.col8)::text = ANY ((ARRAY[$1, ..., $1000])::text[]))
> >         Rows Removed by Filter: 2670023
> >         Heap Fetches: 428
> >         Buffers: shared hit=45933 read=42060 dirtied=4
>
> Yeah, this is a dynamic we've seen before.  The rowcount estimate, and
> hence the cost estimate, for the plan with explicit parameter values is
> way off; but the estimate for the generic plan is even more way off,
> causing the system to falsely decide that the latter is cheaper.
>
> I've speculated about refusing to believe generic cost estimates if they
> are more than epsilon less than the concrete cost estimate, but it's not
> quite clear how that should work or whether it'd have its own failure
> modes.
>
> The one thing that is totally clear is that these rowcount estimates are
> crappy.  Can you improve them by increasing the stats target for that
> table?  Maybe with less-garbage-y inputs, the system would make the right
> plan choice here.
>
> 			regards, tom lane
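PS for the archives: the custom-to-generic switch is easy to reproduce in plain psql, with no driver involved. A minimal sketch (table and column names invented):

    -- The first five executions of a prepared statement are planned with
    -- the actual parameter values (custom plans); from the sixth onward
    -- the server may switch to a cached generic plan if its estimated
    -- cost compares favourably with the average custom-plan cost.
    PREPARE q(bigint) AS
        SELECT uniqueid FROM table1 WHERE col20 = $1;

    EXPLAIN EXECUTE q(12345);   -- Index Cond shows: col20 = '12345'::bigint
    EXECUTE q(12345);
    EXECUTE q(12345);
    EXECUTE q(12345);
    EXECUTE q(12345);
    EXPLAIN EXECUTE q(12345);   -- 6th execution: may now show col20 = $1

On v12 and later the switch can also be disabled server-side with SET plan_cache_mode = force_custom_plan, which would have been an alternative to the driver change.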