Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq

Joel Jacobson Wed, 04 Mar 2026 22:18:23 -0800

On Wed, Mar 4, 2026, at 21:50, Tom Lane wrote:
> "Joel Jacobson" <[email protected]> writes:
>> On Tue, Mar 3, 2026, at 16:31, Tom Lane wrote:
>>> This reminds me of the unfinished business at [1].  We really ought
>>> to make it true that nulls never get into the hash table before
>>> we assume that's so in costing.
>
>> Hmm, OK, so there are cases when we don't discard NULLs when we should
>> be able to? I was reading these lines in nodeHash.c and thought we would
>> always be discarding them when possible:
>
>>              if (!isnull)
>>              {
>> ...
>>              }
>>              else if (node->keep_null_tuples)
>>              {
>>                      /* null join key, but we must save tuple to be emitted later */
>> ...
>>              }
>>              /* else we can discard the tuple immediately */
>
> I'm confused ... that keep_null_tuples bit appears nowhere in HEAD,
> but it does appear in the patch at [1].


Oh, sorry, I was looking at nodeHash.c with [1] applied.
I recalled seeing some `if (!isnull)` code in HEAD; it must have been this:

                                if (!isnull)
                                        ExecParallelHashTableInsert(hashtable, slot, hashvalue);

> Anyway, the short answer is that we discard NULLs if possible, but
> it's not possible when doing an outer join that requires returning
> null-extended rows from the hashed side.

Thanks for explaining.
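
To check my understanding of that rule, here is a toy sketch. It is not
PostgreSQL code: the types and the function toy_build are hypothetical, and
only the flag name keep_null_tuples is borrowed from the patch at [1]. It
just counts what would happen to each build-side tuple, assuming the flag is
set exactly when the join must return null-extended rows from the hashed side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; these are not PostgreSQL's real structures. */
typedef struct ToyTuple
{
    int  key;       /* join key value, meaningless when isnull is true */
    bool isnull;    /* true if the join key is NULL */
} ToyTuple;

typedef struct ToyBuildCounts
{
    size_t inserted;    /* non-null keys: these go into the hash table */
    size_t kept_null;   /* null keys that must be kept for later emission */
    size_t discarded;   /* null keys that can be dropped immediately */
} ToyBuildCounts;

/*
 * Classify each build-side tuple.  keep_null_tuples is assumed to be true
 * only for outer joins that must return null-extended rows from the hashed
 * side; otherwise a null key can never match and the tuple is discarded.
 */
static ToyBuildCounts
toy_build(const ToyTuple *tuples, size_t ntuples, bool keep_null_tuples)
{
    ToyBuildCounts counts = {0, 0, 0};

    assert(tuples != NULL || ntuples == 0);

    for (size_t i = 0; i < ntuples; i++)
    {
        if (!tuples[i].isnull)
            counts.inserted++;
        else if (keep_null_tuples)
            counts.kept_null++;
        else
            counts.discarded++;
    }
    return counts;
}
```

With two null keys out of four tuples, calling toy_build with
keep_null_tuples = false discards both nulls, while keep_null_tuples = true
keeps both; in either case only the two non-null keys are counted as inserted.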

> I've now pushed the patch we were discussing before, and all that's
> left to worry about (AFAIK) in estimate_hash_bucket_stats is its
> handling of null join keys.

Nice!

> I'd prefer to get the other patch
> in before worrying more about that.

Makes sense.

>
>                       regards, tom lane
>
> [1] https://www.postgresql.org/message-id/flat/3061845.1746486714%40sss.pgh.pa.us

/Joel
