Re: Adjusting hash join memory limit to handle batch explosion

2025-02-25 Thread James Hunter
On Tue, Feb 25, 2025 at 9:39 AM Tomas Vondra wrote: > > On 2/25/25 17:30, James Hunter wrote: > > On Wed, Feb 19, 2025 at 12:22 PM Tomas Vondra wrote: > > -- OK, but the customer *didn't* set their workmem to 32 MB. (If they > > had, we wouldn't need this patch -- but we *do* need this patch, wh

Re: Adjusting hash join memory limit to handle batch explosion

2025-02-25 Thread Tomas Vondra
On 2/25/25 17:30, James Hunter wrote: > On Wed, Feb 19, 2025 at 12:22 PM Tomas Vondra wrote: >> >> I've pushed the first (and main) part of the patch series, after some >> more cleanup and comment polishing. > > Two comments on your merged patch -- > > First, it's easier to see what's going on i

Re: Adjusting hash join memory limit to handle batch explosion

2025-02-25 Thread James Hunter
On Wed, Feb 19, 2025 at 12:22 PM Tomas Vondra wrote: > > I've pushed the first (and main) part of the patch series, after some > more cleanup and comment polishing. Two comments on your merged patch -- First, it's easier to see what's going on if we overlook the logic to round to nearest power o

Re: Adjusting hash join memory limit to handle batch explosion

2025-02-19 Thread Tomas Vondra
I've pushed the first (and main) part of the patch series, after some more cleanup and comment polishing. As explained in my previous message, I'm not sure about 0002. I don't know if we need to worry about it (no reports AFAICS). And while the patch works I'm not sure it's the best fix, or whethe

Re: Adjusting hash join memory limit to handle batch explosion

2025-02-07 Thread Melanie Plageman
On Thu, Feb 6, 2025 at 1:48 PM Tomas Vondra wrote: > > Hi, > > Here's a slightly simplified version of the "balancing" patch. I decided > to stop increasing the nbucket value at runtime, even if the hashtable > grows larger than the memory limit (which is what we used to calculate > the initial nb

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-13 Thread Tomas Vondra
On 1/13/25 17:32, Melanie Plageman wrote: > On Sat, Jan 11, 2025 at 7:42 PM Tomas Vondra wrote: >> >> I had a quiet evening yesterday, so I decided to take a stab at this and >> see how hard would it be, and how bad would the impact be. Attached is >> an experimental patch, doing the *bare* minimu

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-13 Thread Melanie Plageman
On Sat, Jan 11, 2025 at 7:42 PM Tomas Vondra wrote: > > I had a quiet evening yesterday, so I decided to take a stab at this and > see how hard would it be, and how bad would the impact be. Attached is > an experimental patch, doing the *bare* minimum for a simple query: > > 1) It defines a limit

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-11 Thread Tomas Vondra
On 1/10/25 15:54, Melanie Plageman wrote: > On Thu, Jan 9, 2025 at 6:59 PM Tomas Vondra wrote: >> >> ... > >> Robert's idea kept using buffered files, but limited how many we can >> fill at any phase. Say we'd use a limit of 1024 batches, but we actually >> need 1M batches. Then we'd do the build

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-11 Thread Tomas Vondra
On 1/11/25 00:09, Melanie Plageman wrote: > On Fri, Jan 10, 2025 at 11:18 AM Tomas Vondra wrote: >> >> On 1/10/25 15:54, Melanie Plageman wrote: >>> On Thu, Jan 9, 2025 at 6:59 PM Tomas Vondra wrote: >>> I think this is because we get the batch based on >>> >>> *batchno = pg_rotate_right32(has

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-10 Thread Melanie Plageman
On Fri, Jan 10, 2025 at 11:18 AM Tomas Vondra wrote: > > On 1/10/25 15:54, Melanie Plageman wrote: > > On Thu, Jan 9, 2025 at 6:59 PM Tomas Vondra wrote: > > I think this is because we get the batch based on > > > > *batchno = pg_rotate_right32(hashvalue, hashtable->log2_nbuckets) & > > (nbatch -

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-10 Thread Tomas Vondra
On 1/10/25 15:54, Melanie Plageman wrote: > On Thu, Jan 9, 2025 at 6:59 PM Tomas Vondra wrote: >> >> >> >> On 1/9/25 21:42, Melanie Plageman wrote: >>> >>> I was excited about your raw file experiment. As Robert and you point >>> out -- we may need a file per batch, but for most of the hash joi

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-10 Thread Melanie Plageman
On Thu, Jan 9, 2025 at 6:59 PM Tomas Vondra wrote: > > > > On 1/9/25 21:42, Melanie Plageman wrote: > > > > I was excited about your raw file experiment. As Robert and you point > > out -- we may need a file per batch, but for most of the hash join's > > execution we don't need to keep buffers for

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-09 Thread Tomas Vondra
On 1/9/25 21:42, Melanie Plageman wrote: > On Tue, Dec 31, 2024 at 6:07 PM Tomas Vondra wrote: >> >> This means that ultimately it's either (1) or (3), and the more I've >> been looking into this the more I prefer (1), for a couple reasons: >> >> * It's much simpler (it doesn't really change an

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-09 Thread Tomas Vondra
On 1/9/25 23:18, Melanie Plageman wrote: > On Sun, Jan 5, 2025 at 10:00 PM Tomas Vondra wrote: >> >> I think the general idea and formula explained in [1] is right, but >> while working on the PoC patch I started to think about how to formalize >> this. And I ended up creating two tables that I th

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-09 Thread Melanie Plageman
On Sun, Jan 5, 2025 at 10:00 PM Tomas Vondra wrote: > > I think the general idea and formula explained in [1] is right, but > while working on the PoC patch I started to think about how to formalize > this. And I ended up creating two tables that I think visualize is > pretty nicely. > > Imagine a

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-09 Thread Melanie Plageman
On Tue, Dec 31, 2024 at 6:07 PM Tomas Vondra wrote: > > This means that ultimately it's either (1) or (3), and the more I've > been looking into this the more I prefer (1), for a couple reasons: > > * It's much simpler (it doesn't really change anything on the basic > behavior, doesn't introduce a

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-09 Thread Tomas Vondra
On 1/9/25 17:17, Melanie Plageman wrote: > On Tue, Dec 31, 2024 at 6:07 PM Tomas Vondra wrote: >> >> So I decided to revisit the three patches from 2019. Attached are >> rebased and cleaned up versions. A couple comments on each one: >> >> >> 1) v20241231-adjust-limit-0001-Account-for-batch-fil

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-09 Thread Melanie Plageman
On Tue, Dec 31, 2024 at 6:07 PM Tomas Vondra wrote: > > So I decided to revisit the three patches from 2019. Attached are > rebased and cleaned up versions. A couple comments on each one: > > > 1) v20241231-adjust-limit-0001-Account-for-batch-files-in-ha.patch > > I believe this is the way to go,

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-06 Thread Tomas Vondra
On 1/6/25 19:50, Robert Haas wrote: > On Mon, Jan 6, 2025 at 11:51 AM Tomas Vondra wrote: >> I wonder if maybe a better solution would be to allow BufFiles with >> smaller buffers, not just hard-coded 8kB. OTOH I'm not sure how much >> that helps, before the buffering stops being effective as t

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-06 Thread Robert Haas
On Mon, Jan 6, 2025 at 11:51 AM Tomas Vondra wrote: > I wonder if maybe a better solution would be to allow BufFiles with > smaller buffers, not just hard-coded 8kB. OTOH I'm not sure how much > that helps, before the buffering stops being effective as the buffer > gets smaller. I mean, we only ha

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-06 Thread Tomas Vondra
On 1/6/25 16:42, Robert Haas wrote: > Hi Tomas, > > Thanks for working on this. I haven't studied this problem recently, > but here are some ideas that occur to me: > > 1. Try to reduce the per-batch overhead. > Yeah. The "use files without buffering" approach may be seen as an extreme versi

Re: Adjusting hash join memory limit to handle batch explosion

2025-01-06 Thread Robert Haas
Hi Tomas, Thanks for working on this. I haven't studied this problem recently, but here are some ideas that occur to me: 1. Try to reduce the per-batch overhead. 2. Stop increasing the number of batches when the per-batch overhead exceeds a small percentage of work_mem (10%? 5%? 1%?). If you've