Re: Improve hash join's handling of tuples with null join keys

2025-08-24 Thread Tom Lane
Marc-Olaf Jaschke writes: > I can confirm that the patch fixes the issue (Bug #19030). The memory usage > remains within the expected range of work_mem. > This also applies to parallel hash joins. > The query also runs significantly faster. > I also tested cases with multiple left joins. > I have

Re: Improve hash join's handling of tuples with null join keys

2025-08-24 Thread Marc-Olaf Jaschke
Tom Lane wrote: > Bug #19030 [1] seems to be a fresh report of the problem this patch > aims to solve. > [1] > https://www.postgresql.org/message-id/flat/19030-944dd78d7ef94c0f%40postgresql.org > I can confirm that the patch fixes the issue (Bug #19030). The memory usage remains within the

Re: Improve hash join's handling of tuples with null join keys

2025-08-23 Thread Tom Lane
Bug #19030 [1] seems to be a fresh report of the problem this patch aims to solve. While answering that, I realized that the v2 patch causes null-keyed inner rows to not be included in EXPLAIN ANALYZE's report of the number of rows output by the Hash node. Now on the one hand, what it's reporting

Re: Improve hash join's handling of tuples with null join keys

2025-08-18 Thread Chao Li
> On Aug 19, 2025, at 05:37, Tom Lane wrote: >> > > Yeah, we could make multi-batch PHJ do this differently from the other > cases, but I don't want to go there: too much complication and risk of > bugs for what is a purely hypothetical performance issue. Besides > which, if the join is large

Re: Improve hash join's handling of tuples with null join keys

2025-08-18 Thread Tom Lane
Chao Li writes: > My comment was trying to say that if there are a lot of null join key tuples > in outer table, then hj_NullOuterTupleStore might use a lot of memory or swap > data to disk, which might lead to performance burden. So, I was thinking we > could keep the original logic for outer

Re: Improve hash join's handling of tuples with null join keys

2025-08-17 Thread Chao Li
> On Aug 16, 2025, at 00:52, Tom Lane wrote: > > Chao Li writes: >> With this patch, “isnull” now becomes true because of the change of strict >> op. Then the outer null join key tuple must be stored in a tuplestore. When >> an outer table contains a lot of null join key tuples, then the tup

Re: Improve hash join's handling of tuples with null join keys

2025-08-15 Thread Tom Lane
Chao Li writes: > With this patch, “isnull” now becomes true because of the change of strict > op. Then the outer null join key tuple must be stored in a tuplestore. When > an outer table contains a lot of null join key tuples, then the tuplestore > could bump to very large, in that case, it wo

Re: Improve hash join's handling of tuples with null join keys

2025-08-13 Thread Chao Li
> On Aug 13, 2025, at 17:16, Chao Li wrote: > > I downloaded the patch and tested all join types: inner, left, right, full, > semi and anti. Basically my tests all passed. However, I didn't test any case > of parallel query. > > I have two nit comments: > > 1. In hashjoin.h, line 76-78, the

Re: Improve hash join's handling of tuples with null join keys

2025-08-13 Thread Chao Li
I downloaded the patch and tested all join types: inner, left, right, full, semi and anti. Basically my tests all passed. However, I didn't test any case of parallel query. I have two nit comments: 1. In hashjoin.h, line 76-78, the added comment says "(In the unlikely but supported case of a n

Re: Improve hash join's handling of tuples with null join keys

2025-06-02 Thread Tom Lane
is way easier to reproduce under FreeBSD than Linux --- scheduler quirk I guess.) v2 attached fixes that, and improves some comments. regards, tom lane From 84db9c67b9fc1b33d12d8249407eeaa578eb33c0 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Mon, 2 Jun 2025 12:30:15

Re: Improve hash join's handling of tuples with null join keys

2025-05-11 Thread Thomas Munro
On Tue, May 6, 2025 at 12:12 PM Tomas Vondra wrote: > On 5/6/25 01:11, Tom Lane wrote: > > The attached patch is a response to the discussion at [1], where > > it emerged that lots of rows with null join keys can send a hash > > join into too-many-batches hell, if they are on the outer side > > of

Re: Improve hash join's handling of tuples with null join keys

2025-05-05 Thread Tom Lane
Tomas Vondra writes: > My personal experience is that the growEnabled heuristics is overly > sensitive, and probably does not trigger very often. Yeah, it would be good to make it not quite all-or-nothing. > But more importantly, wasn't the issue discussed in [1] about parallel > hash joins? I'

Re: Improve hash join's handling of tuples with null join keys

2025-05-05 Thread Tomas Vondra
On 5/6/25 01:11, Tom Lane wrote: > The attached patch is a response to the discussion at [1], where > it emerged that lots of rows with null join keys can send a hash > join into too-many-batches hell, if they are on the outer side > of the join so that they must be null-extended not just discarded

Improve hash join's handling of tuples with null join keys

2025-05-05 Thread Tom Lane
ed. I've not done any real performance testing, though. regards, tom lane [1] https://www.postgresql.org/message-id/flat/18909-e5e1b702c9441b8a%40postgresql.org From cd1cf545583a328e756a07818669c1667fb51bb8 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Mon, 5 May 2025 16:02:46 -0400 Subject: