Robert Haas <robertmh...@gmail.com> writes:
> As far as I can tell, the focus on trying to estimate the number of
> tuples per bucket is entirely misguided.  Supposing the relation is
> mostly unique so that the values don't cluster too much, the right
> answer is (of course) NTUP_PER_BUCKET.

But the entire point of that code is to arrive at a sane estimate when
the inner relation *isn't* mostly unique and *does* cluster.  So I think
you're being much too hasty to conclude that it's wrong.

> Because the extra tuples that get thrown into the bucket
> generally don't have the same hash value (or if they did, they would
> have been in the bucket either way...) and get rejected with a simple
> integer comparison, which is much cheaper than
> hash_qual_cost.per_tuple.

Yeah, we are charging more than we ought to for bucket entries that can
be rejected on the basis of hashcode comparisons.  The difficulty is to
arrive at a reasonable guess of what fraction of the bucket entries
will be so rejected, versus those that will incur a comparison-function
call.  I'm leery of assuming there are no hash collisions, which is
what you seem to be proposing.

			regards, tom lane
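(For concreteness, here is a toy standalone sketch of the cost shape
under discussion.  All names and numbers in it are invented for
illustration; it is not the actual costsize.c code.  It just shows how
the per-probe estimate moves depending on what fraction of a bucket's
entries is assumed to survive the hashcode comparison and reach the
hash qual:)

/*
 * Toy illustration only: invented names and constants, not the real
 * costsize.c logic.  Each outer probe visits a bucket holding (on
 * average) bucket_size inner entries.  Entries whose stored hashcode
 * differs are rejected with a cheap integer comparison; only the rest
 * pay the hash qual (equality operator) cost.
 */
#include <stdio.h>

static double
probe_cost(double bucket_size,       /* avg inner entries per bucket */
           double frac_hash_match,   /* fraction whose hashcode matches */
           double hashcode_cmp_cost, /* cost of one integer hashcode compare */
           double qual_cost)         /* cost of one hash-qual evaluation */
{
    /* every entry in the bucket gets at least the hashcode comparison */
    return bucket_size * hashcode_cmp_cost
        + bucket_size * frac_hash_match * qual_cost;
}

int
main(void)
{
    double  bucket_size = 10.0;  /* a clustered, non-unique inner side */
    double  cmp = 0.0025;        /* made-up cost of an integer compare */
    double  qual = 0.0050;       /* made-up hash_qual_cost.per_tuple */

    /* assume hash collisions are negligible: ~1 entry survives */
    printf("no-collision assumption: %.4f\n",
           probe_cost(bucket_size, 1.0 / bucket_size, cmp, qual));

    /* charge the qual for every bucket entry (the overcharge described
     * above, when the bucket is full of duplicates of one value) */
    printf("qual charged for all:    %.4f\n",
           probe_cost(bucket_size, 1.0, cmp, qual));

    return 0;
}

The disagreement above is essentially about which value of
frac_hash_match is safe to plug in: roughly 1/bucket_size if you assume
collisions are negligible, versus something approaching 1.0 when the
inner side is heavily clustered on a few values.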