Re: Wrong results from Parallel Hash Full Join

Melanie Plageman Wed, 12 Apr 2023 17:31:59 -0700

On Wed, Apr 12, 2023 at 6:50 PM Thomas Munro <[email protected]> wrote:
>
> On Thu, Apr 13, 2023 at 9:48 AM Melanie Plageman
> <[email protected]> wrote:
> > Attached patch includes the fix for ExecParallelHashTableInsert() as
> > well as a test. I toyed with adapting one of the existing parallel full
> > hash join tests to cover this case, however, I think Richard's repro is
> > much more clear. Maybe it is worth throwing in a few updates to the
> > tables in the existing queries to provide coverage for the other
> > HeapTupleHeaderClearMatch() calls in the code, though.
>
> Oof.  Analysis and code LGTM.
>
> I thought about the way non-parallel HJ also clears the match bits
> when re-using the hash table for rescans.  PHJ doesn't keep hash
> tables across rescans.  (There's no fundamental reason why it
> couldn't, but there was some complication and it seemed absurd to have
> NestLoop over Gather over PHJ, forking a new set of workers for every
> tuple, so I didn't implement that in the original PHJ.)  But... there
> is something a little odd about the code in
> ExecHashTableResetMatchFlags(), or the fact that we appear to be
> calling it: it's using the unshared union member unconditionally,
> which wouldn't actually work for PHJ (there should be a variant of
> that function with Parallel in its name if we ever want that to work).
> That's not a bug AFAICT, as in fact we don't actually call it--it
> should be unreachable because the hash table should be gone when we
> rescan--but it's confusing.  I'm wondering if we should put in
> something explicit about that, maybe a comment and an assertion in
> ExecReScanHashJoin().


An assert about it not being a parallel hash join? I support this.

> +-- Ensure that hash join tuple match bits have been cleared before putting 
> them
> +-- into the hashtable.
>
> Could you mention that the match flags steals a bit from the HOT flag,
> ie *why* we're testing a join after an update?

v2 attached has some wordsmithing along these lines.

> And if we're going to
> exercise/test that case, should we do the non-parallel version too?

I've added this. I thought if we were adding the serial case, we might
as well add the multi-batch case as well. However, that proved a bit
more challenging. We can get a HOT tuple in one of the existing tables
with no issues. Doing this and then deleting the reset match bit code
doesn't cause any of the tests to fail, however, because we use this
expression as the join condition when we want to emit NULL-extended
unmatched tuples.

select  count(*) from simple r full outer join simple s on (r.id = 0 - s.id);

I don't think we want to add yet another time-consuming test to this
test file. So, I was trying to decide if it was worth changing these
existing tests so that they would fail when the match bit wasn't reset.
I'm not sure.

> For the commit message, I think it's a good idea to use something like
> "Fix ..." for the headline of bug fix commits to make that clearer,
> and to add something like "oversight in commit XYZ" in the body, just
> to help people connect the dots.  (Yeah, I know I failed to reference
> the delinquent commit in the recent assertion-removal commit, my bad.)

I've made these edits and tried to improve the commit message clarity in
general.

>  I think "Discussion:" footers are supposed to use
> https://postgr.es/m/XXX shortened URLs.

Hmm. Is the problem with mine that I included "flat"? Because I did use
postgr.es/m format. The message id is unfortunately long, but I believe
that is on google and not me.

- Melanie

From ee24229a1eeca67dfd91e162502efe2abfead870 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 12 Apr 2023 17:16:28 -0400
Subject: [PATCH v2] Fix PHJ tuple match bit management

Hash join tuples reuse the HOT status bit to indicate match status
during hash join execution. Correct reuse requires clearing the bit in
all tuples. Serial hash join and parallel multi-batch hash join do so
upon inserting the tuple into the hashtable. Single batch parallel hash
join and batch 0 of unexpected multi-batch hash joins forgot to do this.

It hadn't come up before because hashtable tuple match bits are only
used for right and full outer joins and parallel ROJ and FOJ were
unsupported. 11c2d6fdf5 introduced support for parallel ROJ/FOJ but
neglected to ensure the match bits were reset.

Reported-by: Richard Guo

Discussion: https://postgr.es/m/flat/CAMbWs48Nde1Mv%3DBJv6_vXmRKHMuHZm2Q_g4F6Z3_pn%2B3EV6BGQ%40mail.gmail.com
---
 src/backend/executor/nodeHash.c         |  1 +
 src/test/regress/expected/join_hash.out | 37 +++++++++++++++++++++++++
 src/test/regress/sql/join_hash.sql      | 28 ++++++++++++++++++-
 3 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index a45bd3a315..54c06c5eb3 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -1724,6 +1724,7 @@ retry:
 		/* Store the hash value in the HashJoinTuple header. */
 		hashTuple->hashvalue = hashvalue;
 		memcpy(HJTUPLE_MINTUPLE(hashTuple), tuple, tuple->t_len);
+		HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(hashTuple));
 
 		/* Push it onto the front of the bucket's list */
 		ExecParallelHashPushTuple(&hashtable->buckets.shared[bucketno],
diff --git a/src/test/regress/expected/join_hash.out b/src/test/regress/expected/join_hash.out
index 09376514bb..e892e7cccb 100644
--- a/src/test/regress/expected/join_hash.out
+++ b/src/test/regress/expected/join_hash.out
@@ -955,6 +955,43 @@ $$);
 (1 row)
 
 rollback to settings;
+-- Hash join reuses the HOT status bit to indicate match status. This can only
+-- be guaranteed to produce correct results if all the hash join tuple match
+-- bits are reset before reuse. This is done upon loading them into the
+-- hashtable.
+SAVEPOINT settings;
+SET enable_parallel_hash = on;
+SET min_parallel_table_scan_size = 0;
+SET parallel_setup_cost = 0;
+SET parallel_tuple_cost = 0;
+CREATE TABLE hjtest_matchbits_t1(id int);
+CREATE TABLE hjtest_matchbits_t2(id int);
+INSERT INTO hjtest_matchbits_t1 VALUES (1);
+INSERT INTO hjtest_matchbits_t2 VALUES (2);
+-- Update should create a HOT tuple. If this status bit isn't cleared, we won't
+-- correctly emit the NULL-extended unmatching tuple in full hash join.
+UPDATE hjtest_matchbits_t2 set id = 2;
+SELECT * FROM hjtest_matchbits_t1 t1 FULL JOIN hjtest_matchbits_t2 t2 ON t1.id = t2.id;
+ id | id 
+----+----
+  1 |   
+    |  2
+(2 rows)
+
+-- Test serial full hash join.
+-- Resetting parallel_setup_cost should force a serial plan.
+-- Just to be safe, however, set enable_parallel_hash to off, as parallel full
+-- hash joins are only supported with shared hashtables.
+RESET parallel_setup_cost;
+SET enable_parallel_hash = off;
+SELECT * FROM hjtest_matchbits_t1 t1 FULL JOIN hjtest_matchbits_t2 t2 ON t1.id = t2.id;
+ id | id 
+----+----
+  1 |   
+    |  2
+(2 rows)
+
+ROLLBACK TO settings;
 rollback;
 -- Verify that hash key expressions reference the correct
 -- nodes. Hashjoin's hashkeys need to reference its outer plan, Hash's
diff --git a/src/test/regress/sql/join_hash.sql b/src/test/regress/sql/join_hash.sql
index 179e94941c..06bab7a4c7 100644
--- a/src/test/regress/sql/join_hash.sql
+++ b/src/test/regress/sql/join_hash.sql
@@ -506,8 +506,34 @@ $$
 $$);
 rollback to settings;
 
-rollback;
 
+-- Hash join reuses the HOT status bit to indicate match status. This can only
+-- be guaranteed to produce correct results if all the hash join tuple match
+-- bits are reset before reuse. This is done upon loading them into the
+-- hashtable.
+SAVEPOINT settings;
+SET enable_parallel_hash = on;
+SET min_parallel_table_scan_size = 0;
+SET parallel_setup_cost = 0;
+SET parallel_tuple_cost = 0;
+CREATE TABLE hjtest_matchbits_t1(id int);
+CREATE TABLE hjtest_matchbits_t2(id int);
+INSERT INTO hjtest_matchbits_t1 VALUES (1);
+INSERT INTO hjtest_matchbits_t2 VALUES (2);
+-- Update should create a HOT tuple. If this status bit isn't cleared, we won't
+-- correctly emit the NULL-extended unmatching tuple in full hash join.
+UPDATE hjtest_matchbits_t2 set id = 2;
+SELECT * FROM hjtest_matchbits_t1 t1 FULL JOIN hjtest_matchbits_t2 t2 ON t1.id = t2.id;
+-- Test serial full hash join.
+-- Resetting parallel_setup_cost should force a serial plan.
+-- Just to be safe, however, set enable_parallel_hash to off, as parallel full
+-- hash joins are only supported with shared hashtables.
+RESET parallel_setup_cost;
+SET enable_parallel_hash = off;
+SELECT * FROM hjtest_matchbits_t1 t1 FULL JOIN hjtest_matchbits_t2 t2 ON t1.id = t2.id;
+ROLLBACK TO settings;
+
+rollback;
 
 -- Verify that hash key expressions reference the correct
 -- nodes. Hashjoin's hashkeys need to reference its outer plan, Hash's
-- 
2.37.2

Re: Wrong results from Parallel Hash Full Join

Reply via email to