> On Aug 13, 2025, at 17:16, Chao Li <li.evan.c...@gmail.com> wrote:
> 
> I downloaded the patch and tested all join types: inner, left, right, full, 
> semi and anti. Basically my tests all passed. However, I didn't test any case 
> of parallel query.
> 
> I have two nit comments:
> 
> 1. In hashjoin.h, line 76-78, the added comment says "(In the unlikely but 
> supported case of a non-strict join operator, we treat null keys as normal 
> data.)". But I don't see where non-strict join is handled. So, how this patch 
> impact non-strict joins?
> 

I take back this comment, and I get a new comment related.

@@ -1015,11 +1144,19 @@ ExecHashJoinOuterGetTuple(PlanState *outerNode,

                        if (!isnull)
                        {
+                               /* normal case with a non-null join key */
                                /* remember outer relation is not empty for 
possible rescan */
                                hjstate->hj_OuterNotEmpty = true;

                                return slot;
                        }
+                       else if (hjstate->hj_KeepNullTuples)
+                       {
+                               /* null join key, but we must save tuple to be 
emitted later */
+                               if (hjstate->hj_NullOuterTupleStore == NULL)
+                                       hjstate->hj_NullOuterTupleStore = 
ExecHashBuildNullTupleStore(hashtable);
+                               
tuplestore_puttupleslot(hjstate->hj_NullOuterTupleStore, slot);
+                       }

When an outer tuple contains null join key, without this patch, “isnull” flag 
is false, so the tuple will still be returned, and for an outer join, the tuple 
slot will be returned to parent node immediately.

With this patch, “isnull” now becomes true because of the change of strict op. 
Then the outer null join key tuple must be stored in a tuplestore. When an 
outer table contains a lot of null join key tuples, then the tuplestore could 
bump to very large, in that case, it would be hard to say this patch really 
benefits.

I am think that, can we only do the first half of this patch? Only putting 
inner table’s null join key tuple into a tuplestore. So that inner hash table’s 
performance gets improved, and outer table’s logic keeps the same, then overall 
this patch makes a pure improvement without the potential memory burden from 
outerNullTupleStore.

—————————

I also got an idea for improving the hash logic.

                                /*
                                 * If the outer relation is completely empty, 
and it's not
                                 * right/right-anti/full join, we can quit 
without building
                                 * the hash table.  However, for an inner join 
it is only a
                                 * win to check this when the outer relation's 
startup cost is
                                 * less than the projected cost of building the 
hash table.
                                 * Otherwise it's best to build the hash table 
first and see
                                 * if the inner relation is empty.  (When it's 
a left join, we
                                 * should always make this check, since we 
aren't going to be
                                 * able to skip the join on the strength of an 
empty inner
                                 * relation anyway.)
                                 */
                                if (HJ_FILL_INNER(node))
                                {
                                        /* no chance to not build the hash 
table */
                                        node->hj_FirstOuterTupleSlot = NULL;
                                }

Based on this patch, if we are doing a left join, and outer table is empty, 
then all tuples from the inner table should be returned. In that case, we can 
skip building a hash table, instead, we can put all inner table tuples into 
hashtable.innerNullTupleStore. Building a tuplestore should be cheaper than 
building a hash table, so this way makes a little bit more performance 
improvement.

Regards,

Chao Li (Evan)
--------------------
HighGo Software Co., Ltd.
https://www.highgo.com/



Reply via email to