Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
> So I'm not sure I understand what would be the risk with this ... Tom,
> can you elaborate why you dislike the patch?
I've got a couple issues with the patch as presented.

* As you said, it creates discontinuous behavior for stanullfrac = 1.0
versus stanullfrac = 1.0 - epsilon.  That doesn't seem good.

* It's not apparent why, if ANALYZE's sample is all nulls, we wouldn't
conclude stadistinct = 0 and thus arrive at the desired answer that way.
(Since we have a complaint, I'm guessing that ANALYZE might disbelieve
its own result and stick in some larger stadistinct.  But then maybe
that's where to fix this, not here.)

* We generally disbelieve edge-case estimates to begin with.  The most
obvious example is that we don't accept rowcount estimates that are
zero.  There are also some clamps that disbelieve selectivities
approaching 0.0 or 1.0 when estimating from a histogram, and I think we
have a couple other similar rules.  The reason for this is mainly that
taking such estimates at face value creates too much risk of severe
relative error due to imprecise or out-of-date statistics.  So a
special case for stanullfrac = 1.0 seems to go directly against that
mindset.

I agree that there might be some gold to be mined in this area, as we
haven't thought particularly hard about high-stanullfrac situations.
One idea is to figure what stanullfrac says about the number of
non-null rows, and clamp the get_variable_numdistinct result to be not
more than that.  But I still would not want to trust an exact zero
result.

			regards, tom lane
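[Editor's note: to make the clamp idea in the last paragraph concrete, here is
a minimal standalone sketch of the arithmetic being proposed.  It is not the
actual selfuncs.c code; the function name, parameters, and call site are
hypothetical, and where exactly such a clamp would fit inside
get_variable_numdistinct() is left open.]

    #include <stdio.h>

    /*
     * Illustrative sketch only: cap an ndistinct estimate at the number of
     * non-null rows implied by stanullfrac, while still refusing to trust
     * an exact-zero result (per the usual skepticism about edge-case
     * estimates).
     */
    static double
    clamp_ndistinct(double ndistinct, double ntuples, double stanullfrac)
    {
        double      nonnull_rows = ntuples * (1.0 - stanullfrac);

        if (nonnull_rows < 1.0)
            nonnull_rows = 1.0;     /* disbelieve "no non-null rows at all" */
        if (ndistinct > nonnull_rows)
            ndistinct = nonnull_rows;
        if (ndistinct < 1.0)
            ndistinct = 1.0;        /* never return an exact zero */
        return ndistinct;
    }

    int
    main(void)
    {
        /* 1,000,000 rows, 99.99% nulls: ndistinct capped at ~100 */
        printf("%g\n", clamp_ndistinct(200000.0, 1000000.0, 0.9999));
        /* all-null statistics: report 1 distinct value, not 0 */
        printf("%g\n", clamp_ndistinct(0.0, 1000000.0, 1.0));
        return 0;
    }

The second call shows the behavior the sketch assumes for the case the patch
targets: a column that ANALYZE saw as entirely null still yields a floor of
one distinct value rather than zero.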