Re: [HACKERS] Getting different number of results when using hashjoin on/off

Mario Weilguni Mon, 28 Nov 2005 09:41:43 -0800

Hello Tom,

Thanks for the quick response, I've tried the patch, but it did not work
as expected. When I set enable_hashjoin to off, everything works as
expected, but with hashjoin on I do not even get results anymore, CPU is
going up to 100% and after 3 minutes I cancelled the query (it normale
would take ~100-500 milliseconds).


I will check the patch on a different machine again and inform you of
the results.

Best regards,
        Mario Weilguni


-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 28, 2005 6:09 PM
To: Mario Weilguni
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Getting different number of results when using
hashjoin on/off 

"Mario Weilguni" <[EMAIL PROTECTED]> writes:
> No, I'm using 8.1.0, and tried it on different machines, always the
same results.

I see it, I think: the recent changes to avoid work when one or the
other side of the hash join is empty would exit the hash join leaving
a state that confused ExecReScanHashJoin() into thinking it didn't
have to do anything.  Try the attached patch.

                        regards, tom lane


Index: src/backend/executor/nodeHashjoin.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/nodeHashjoin.c,v
retrieving revision 1.75.2.1
diff -c -r1.75.2.1 nodeHashjoin.c
*** src/backend/executor/nodeHashjoin.c 22 Nov 2005 18:23:09 -0000
1.75.2.1
--- src/backend/executor/nodeHashjoin.c 28 Nov 2005 17:04:43 -0000
***************
*** 152,163 ****
                 * outer join, we can quit without scanning the outer
relation.
                 */
                if (hashtable->totalTuples == 0 && node->js.jointype !=
JOIN_LEFT)
-               {
-                       ExecHashTableDestroy(hashtable);
-                       node->hj_HashTable = NULL;
-                       node->hj_FirstOuterTupleSlot = NULL;
                        return NULL;
-               }
  
                /*
                 * need to remember whether nbatch has increased since
we began
--- 152,158 ----
***************
*** 487,493 ****
        {
                ExecHashTableDestroy(node->hj_HashTable);
                node->hj_HashTable = NULL;
-               node->hj_FirstOuterTupleSlot = NULL;
        }
  
        /*
--- 482,487 ----
***************
*** 805,841 ****
  ExecReScanHashJoin(HashJoinState *node, ExprContext *exprCtxt)
  {
        /*
-        * If we haven't yet built the hash table then we can just
return; nothing
-        * done yet, so nothing to undo.
-        */
-       if (node->hj_HashTable == NULL)
-               return;
- 
-       /*
         * In a multi-batch join, we currently have to do rescans the
hard way,
         * primarily because batch temp files may have already been
released. But
         * if it's a single-batch join, and there is no parameter change
for the
         * inner subnode, then we can just re-use the existing hash
table without
         * rebuilding it.
         */
!       if (node->hj_HashTable->nbatch == 1 &&
!               ((PlanState *) node)->righttree->chgParam == NULL)
!       {
!               /* okay to reuse the hash table; needn't rescan inner,
either */
!       }
!       else
        {
!               /* must destroy and rebuild hash table */
!               ExecHashTableDestroy(node->hj_HashTable);
!               node->hj_HashTable = NULL;
!               node->hj_FirstOuterTupleSlot = NULL;
  
!               /*
!                * if chgParam of subnode is not null then plan will be
re-scanned by
!                * first ExecProcNode.
!                */
!               if (((PlanState *) node)->righttree->chgParam == NULL)
!                       ExecReScan(((PlanState *) node)->righttree,
exprCtxt);
        }
  
        /* Always reset intra-tuple state */
--- 799,830 ----
  ExecReScanHashJoin(HashJoinState *node, ExprContext *exprCtxt)
  {
        /*
         * In a multi-batch join, we currently have to do rescans the
hard way,
         * primarily because batch temp files may have already been
released. But
         * if it's a single-batch join, and there is no parameter change
for the
         * inner subnode, then we can just re-use the existing hash
table without
         * rebuilding it.
         */
!       if (node->hj_HashTable != NULL)
        {
!               if (node->hj_HashTable->nbatch == 1 &&
!                       ((PlanState *) node)->righttree->chgParam ==
NULL)
!               {
!                       /* okay to reuse the hash table; needn't rescan
inner, either */
!               }
!               else
!               {
!                       /* must destroy and rebuild hash table */
!                       ExecHashTableDestroy(node->hj_HashTable);
!                       node->hj_HashTable = NULL;
  
!                       /*
!                        * if chgParam of subnode is not null then plan
will be re-scanned
!                        * by first ExecProcNode.
!                        */
!                       if (((PlanState *) node)->righttree->chgParam ==
NULL)
!                               ExecReScan(((PlanState *)
node)->righttree, exprCtxt);
!               }
        }
  
        /* Always reset intra-tuple state */
***************
*** 847,852 ****
--- 836,842 ----
        node->js.ps.ps_TupFromTlist = false;
        node->hj_NeedNewOuter = true;
        node->hj_MatchedOuter = false;
+       node->hj_FirstOuterTupleSlot = NULL;
  
        /*
         * if chgParam of subnode is not null then plan will be
re-scanned by

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Getting different number of results when using hashjoin on/off

Reply via email to