David Rowley <dgrowle...@gmail.com> wrote on Fri, 5 Jul 2024 at 10:15:
> On Fri, 5 Jul 2024 at 12:50, James Pang <jamespang...@gmail.com> wrote:
> > we have a daily vacuumdb and analyze job; generally speaking it's
> > done in seconds, but sometimes it suddenly runs for tens of minutes
> > with the same bind variable values, and huge temp space gets used.
> > At that time, EXPLAIN shows "Hash Anti Join, Hash Right Join" with
> > seq scans of two tables.
>
> There was talk about adding costing for run-time partition pruning
> factors, but nothing was ever agreed, so nothing was done. It's just
> not that obvious to me how we'd do that. If the Append had 10
> partitions as subnodes, with an equality join condition, you could
> assume we'll only match to 1 of those 10, but we've no idea at plan
> time which one that'll be, and the partitions might drastically vary
> in size. The best I think we could do is take the total cost of those
> 10 and divide by 10 to get the average cost. It's much harder for
> range conditions, as those could match anything from 0 to all
> partitions. The best suggestion I saw for that was to multiply the
> costs by DEFAULT_INEQ_SEL.
>
> I think for now, you might want to lower random_page_cost or
> increase effective_cache_size to encourage the nested loop -> index
> scan plan. A good range for effective_cache_size is anywhere between
> 50 and 75% of your server's RAM. However, that might not be ideal if
> your server is under memory pressure from other running processes. It
> also depends on how large shared_buffers is as a percentage of total
> RAM.
>
> David

We already have random_page_cost=1.1 and effective_cache_size=75% of
physical memory on this database server. For this SQL, the plan is:

   ->  Nested Loop Anti Join  (cost=40.32..132168227.57 rows=224338 width=78)
         Join Filter: (lower((p.ctinfo)::text) = lower((w.ctinfo)::text))
         ->  Nested Loop Left Join  (cost=39.63..398917.29 rows=299118 width=78)
               ->  Append  (cost=0.56..22.36 rows=8 width=54)
                     ->  Index Scan using wmdata_p0_llid_hhid_stime_idx on wmdata_p0 m_1  (cost=0.56..2.79 rows=1 width=54)
                     ...
               ->  Append  (cost=39.07..49312.09 rows=54978 width=78)
                     ->  Bitmap Heap Scan on wmvtee_p0 w_1  (cost=39.07..1491.06 rows=1669 width=78)
                           Recheck Cond: ((m.partitionkeyid)::text = (partitionkeyid)::text)
                           ->  Bitmap Index Scan on wmvtee_p0_partitionkeyid_intid_idx  (cost=0.00..38.65 rows=1669 width=0)
                                 Index Cond: ((partitionkeyid)::text = (m.partitionkeyid)::text)
                     ...
         ->  Append  (cost=0.69..516.96 rows=4010 width=78)
               ->  Index Only Scan using wmpct_p0_partitionkeyid_ctinfo_idx on wmpct_p0 p_1  (cost=0.69..15.78 rows=124 width=78)
               ...

For the nested-loop path: since the first Append is estimated at only
8 rows, and the tables join on partitionkeyid and are all
hash-partitioned, would it be better to cost it as 8 (loop count) *
1600 = 12800 (each loop probes only 1 hash partition's bitmap scan,
so roughly the average cost of one partition)? That is much less than
the 398917.29 estimated for scanning all partitions. Likewise, the
Nested Loop Anti Join above it could be costed as 299118 rows * 15.78
(the average index scan cost of one partition) = 4,720,082, which is
still much less than 132168227.57. For the Hash Right Join plan, is
it possible to estimate it as 8 partition seq scans instead of all 32
hash partitions, given that the first Append is estimated at only 8
rows? Extended statistics may help estimate count(partitionkeyid)
from the other columns' bind variable values, but that did not seem
to help the table join case.

Thanks,

James
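
PS: for completeness, here is a session-level sketch of the knobs
involved. The effective_cache_size value assumes a 128GB box;
enable_hashjoin=off is a diagnostic-only switch to surface the
planner's nested-loop costing for comparison, not a production
setting:

    -- the settings David mentions, tried per session:
    SET random_page_cost = 1.1;
    SET effective_cache_size = '96GB';  -- assumed: ~75% of RAM on a 128GB box
    -- diagnostic only: rule out hash joins so the planner reports its
    -- costing for the nested loop -> index scan plan:
    SET enable_hashjoin = off;
    -- then: EXPLAIN (ANALYZE, BUFFERS) <the problem query>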
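
The arithmetic I am proposing above, spelled out: the per-partition
cost is read off the first child of each Append and multiplied by the
outer row count, instead of charging the whole-Append total on every
loop:

    -- rough per-partition costing, using the numbers from the plan above:
    SELECT 8      * 1600.0 AS inner_loop_guess,  -- = 12800, vs. 398917.29
           299118 * 15.78  AS anti_join_guess;   -- = 4720082.04, vs. 132168227.57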
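
And a sketch of the kind of extended statistics I had in mind; the
llid/hhid column names are guesses taken from the index name
wmdata_p0_llid_hhid_stime_idx and may not match the real schema:

    -- hypothetical: ndistinct/dependencies statistics tying partitionkeyid
    -- to the columns the bind variables filter on:
    CREATE STATISTICS wmdata_pk_stats (ndistinct, dependencies)
        ON llid, hhid, partitionkeyid FROM wmdata;
    ANALYZE wmdata;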