Hi, I ran the TPC-DS benchmark on Postgres and found several problems with join size estimation. For example, ndistinct is key to estimating join selectivity, but this value does not take the restrictions on the rel into account. I hit some cases in the function eqjoinsel where nd is much larger than vardata.rel->rows.
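
As a rough illustration of why this matters (a sketch only, not the attached patch): when no MCV lists are available, the equijoin selectivity is roughly 1 / max(nd1, nd2), so an nd that exceeds the filtered row count makes the join look more selective than it can be. The helper names clamp_ndistinct and simple_eqjoin_selectivity below are hypothetical, null fractions are ignored, and capping nd at the row count is just one simple adjustment.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a PostgreSQL function): cap a distinct-value
 * estimate at the number of rows the relation is expected to produce
 * after its restrictions are applied.
 */
static double
clamp_ndistinct(double nd, double rows)
{
    if (rows > 0 && nd > rows)
        return rows;
    return nd;
}

/*
 * Simplified equijoin selectivity for the no-MCV case:
 * 1 / max(nd1, nd2). Null fractions are ignored in this sketch.
 */
static double
simple_eqjoin_selectivity(double nd1, double nd2)
{
    double nd = (nd1 > nd2) ? nd1 : nd2;

    return (nd > 0) ? 1.0 / nd : 0.0;
}
```

For instance, with nd1 = 1000 but only 10 rows surviving the restriction, and nd2 = 4, simple_eqjoin_selectivity(1000, 4) gives 1/1000, whereas simple_eqjoin_selectivity(clamp_ndistinct(1000, 10), 4) gives 1/10.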
An accurate estimate needs a good mathematical model that considers the dependency between the join var and the vars in the restrictions. But at the very least, ndistinct should not be greater than the number of rows. See the attached patch, which adjusts nd in eqjoinsel.

Best,
Zhenghua Lyu
0001-Adjust-ndistinct-with-nrows-in-the-rel-when-estimati.patch