Re: Hashjoin startup strategy (was Re: [HACKERS] Getting different number of results when using hashjoin on/off)

Mario Weilguni Tue, 29 Nov 2005 01:05:51 -0800

Hello Tom,

I tried both patches on a different machine (but had to take the patches from 
cvs diff, cut'n paste from the mail-program did not work). Up until now, they 
work like a charm, correct results and fast. I will try on the other machine 
that failed yesterday in the afternoon, maybe it was just a problem with 
patching (or with the machine setup itself, it's a hardened gentoo, but I doubt 
that).


BTW: the difference of the second patch is really noticable, from 990ms down to 
226ms.

Thanks for your quick response! I wish commercial vendors would act that fast 
:-)

Best regards,
        Mario Weilguni



icomedias - Digitale Kommunikation
------------------------------------------------------------------------
Mario Weilguni, Forschung und Entwicklung
[EMAIL PROTECTED], http://www.icomedias.com/

icomedias Österreich Systemhaus GmbH:
  8020 Graz, Entenplatz 1 
  Tel: +43 (316) 721.671-272, Fax: -103  

icomedias Deutschland Systemhaus GmbH:
  10969 Berlin, Alexandrinenstraße 2-3
  Tel: +49 (30) 695.399-272, Fax: -103

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 29, 2005 12:50 AM
To: Mario Weilguni; [email protected]
Subject: Re: Hashjoin startup strategy (was Re: [HACKERS] Getting different 
number of results when using hashjoin on/off) 

I wrote:
> For a query like this, where the hash join is being done repeatedly,
> it might be useful for the executor itself to track how often each
> subplan has been seen to be empty.

I implemented a simple form of this, and it made 8.1 faster than 8.0
on the test case I was using.  Give it a try ...

                        regards, tom lane


Index: src/backend/executor/nodeHashjoin.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/nodeHashjoin.c,v
retrieving revision 1.75.2.2
diff -c -r1.75.2.2 nodeHashjoin.c
*** src/backend/executor/nodeHashjoin.c 28 Nov 2005 17:14:47 -0000      1.75.2.2
--- src/backend/executor/nodeHashjoin.c 28 Nov 2005 23:41:28 -0000
***************
*** 120,135 ****
                 * since we aren't going to be able to skip the join on the 
strength
                 * of an empty inner relation anyway.)
                 *
                 * The only way to make the check is to try to fetch a tuple 
from the
                 * outer plan node.  If we succeed, we have to stash it away 
for later
                 * consumption by ExecHashJoinOuterGetTuple.
                 */
!               if (outerNode->plan->startup_cost < 
hashNode->ps.plan->total_cost ||
!                       node->js.jointype == JOIN_LEFT)
                {
                        node->hj_FirstOuterTupleSlot = ExecProcNode(outerNode);
                        if (TupIsNull(node->hj_FirstOuterTupleSlot))
                                return NULL;
                }
                else
                        node->hj_FirstOuterTupleSlot = NULL;
--- 120,147 ----
                 * since we aren't going to be able to skip the join on the 
strength
                 * of an empty inner relation anyway.)
                 *
+                * If we are rescanning the join, we make use of information 
gained
+                * on the previous scan: don't bother to try the prefetch if the
+                * previous scan found the outer relation nonempty.  This is not
+                * 100% reliable since with new parameters the outer relation 
might
+                * yield different results, but it's a good heuristic.
+                *
                 * The only way to make the check is to try to fetch a tuple 
from the
                 * outer plan node.  If we succeed, we have to stash it away 
for later
                 * consumption by ExecHashJoinOuterGetTuple.
                 */
!               if (node->js.jointype == JOIN_LEFT ||
!                       (outerNode->plan->startup_cost < 
hashNode->ps.plan->total_cost &&
!                        !node->hj_OuterNotEmpty))
                {
                        node->hj_FirstOuterTupleSlot = ExecProcNode(outerNode);
                        if (TupIsNull(node->hj_FirstOuterTupleSlot))
+                       {
+                               node->hj_OuterNotEmpty = false;
                                return NULL;
+                       }
+                       else
+                               node->hj_OuterNotEmpty = true;
                }
                else
                        node->hj_FirstOuterTupleSlot = NULL;
***************
*** 159,164 ****
--- 171,183 ----
                 * scanning the outer relation
                 */
                hashtable->nbatch_outstart = hashtable->nbatch;
+ 
+               /*
+                * Reset OuterNotEmpty for scan.  (It's OK if we fetched a tuple
+                * above, because ExecHashJoinOuterGetTuple will immediately
+                * set it again.)
+                */
+               node->hj_OuterNotEmpty = false;
        }
  
        /*
***************
*** 454,459 ****
--- 473,479 ----
        hjstate->js.ps.ps_TupFromTlist = false;
        hjstate->hj_NeedNewOuter = true;
        hjstate->hj_MatchedOuter = false;
+       hjstate->hj_OuterNotEmpty = false;
  
        return hjstate;
  }
***************
*** 546,551 ****
--- 566,574 ----
                        *hashvalue = ExecHashGetHashValue(hashtable, econtext,
                                                                                
          hjstate->hj_OuterHashKeys);
  
+                       /* remember outer relation is not empty for possible 
rescan */
+                       hjstate->hj_OuterNotEmpty = true;
+ 
                        return slot;
                }
  
***************
*** 810,816 ****
                if (node->hj_HashTable->nbatch == 1 &&
                        ((PlanState *) node)->righttree->chgParam == NULL)
                {
!                       /* okay to reuse the hash table; needn't rescan inner, 
either */
                }
                else
                {
--- 833,851 ----
                if (node->hj_HashTable->nbatch == 1 &&
                        ((PlanState *) node)->righttree->chgParam == NULL)
                {
!                       /*
!                        * okay to reuse the hash table; needn't rescan inner, 
either.
!                        *
!                        * What we do need to do is reset our state about the 
emptiness
!                        * of the outer relation, so that the new scan of the 
outer will
!                        * update it correctly if it turns out to be empty this 
time.
!                        * (There's no harm in clearing it now because 
ExecHashJoin won't
!                        * need the info.  In the other cases, where the hash 
table
!                        * doesn't exist or we are destroying it, we leave this 
state
!                        * alone because ExecHashJoin will need it the first 
time
!                        * through.)
!                        */
!                       node->hj_OuterNotEmpty = false;
                }
                else
                {
Index: src/include/nodes/execnodes.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/nodes/execnodes.h,v
retrieving revision 1.139.2.2
diff -c -r1.139.2.2 execnodes.h
*** src/include/nodes/execnodes.h       22 Nov 2005 18:23:28 -0000      
1.139.2.2
--- src/include/nodes/execnodes.h       28 Nov 2005 23:41:28 -0000
***************
*** 1101,1106 ****
--- 1101,1107 ----
   *            hj_FirstOuterTupleSlot  first tuple retrieved from outer plan
   *            hj_NeedNewOuter                 true if need new outer tuple on 
next call
   *            hj_MatchedOuter                 true if found a join match for 
current outer
+  *            hj_OuterNotEmpty                true if outer relation known 
not empty
   * ----------------
   */
  
***************
*** 1125,1130 ****
--- 1126,1132 ----
        TupleTableSlot *hj_FirstOuterTupleSlot;
        bool            hj_NeedNewOuter;
        bool            hj_MatchedOuter;
+       bool            hj_OuterNotEmpty;
  } HashJoinState;
  
  

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

Re: Hashjoin startup strategy (was Re: [HACKERS] Getting different number of results when using hashjoin on/off)

Reply via email to