On Mon, Jul 27, 2020 at 5:55 PM Peter Geoghegan wrote:
> Pushed the hashagg_avoid_disk_plan patch -- thanks!
Pushed the hash_mem_multiplier patch as well -- thanks again!
As I've said before, I am not totally opposed to adding a true escape
hatch. That has not proven truly necessary just yet.
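A minimal sketch of the committed GUC's semantics (hash-based nodes get
work_mem scaled by the multiplier, while sort-based nodes still get plain
work_mem):

    -- default is 1.0, so upgraded clusters see no behavioral change
    SET work_mem = '64MB';
    SET hash_mem_multiplier = 2.0;
    -- hash aggs and hash joins may now use up to 128MB before spilling,
    -- while sorts remain capped at 64MB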
On Sat, 2020-07-25 at 17:52 -0700, Peter Geoghegan wrote:
> BTW, your HLL patch ameliorates the problem with my extreme "sorted
> vs
> random input" test case from this morning [1] (the thing that I just
> discussed with Tomas). Without the HLL patch the sorted case had 2424
> batches. With the HLL
On Mon, Jul 27, 2020 at 5:10 PM Jeff Davis wrote:
> I noticed that one of the conditionals, "cheapest_total_path != NULL",
> was already redundant with the outer conditional before your patch. I
> guess that was just a mistake which your patch corrects along the way?
Makes sense.
> Anyway, the p
On Mon, 2020-07-27 at 11:30 -0700, Peter Geoghegan wrote:
> The v4-0001-Remove-hashagg_avoid_disk_plan-GUC.patch changes are
> surprisingly complicated. It would be nice if you could take a look
> at
> that aspect (or confirm that it's included in your review).
I noticed that one of the conditiona
On Mon, Jul 27, 2020 at 12:52 PM Alvaro Herrera
wrote:
> On 2020-Jul-27, Peter Geoghegan wrote:
> > The v4-0001-Remove-hashagg_avoid_disk_plan-GUC.patch changes are
> > surprisingly complicated. It would be nice if you could take a look at
> > that aspect (or confirm that it's included in your rev
On 2020-Jul-27, Peter Geoghegan wrote:
> The v4-0001-Remove-hashagg_avoid_disk_plan-GUC.patch changes are
> surprisingly complicated. It would be nice if you could take a look at
> that aspect (or confirm that it's included in your review).
I think you mean "it replaces surprisingly complicated c
On Mon, Jul 27, 2020 at 11:24 AM Alvaro Herrera
wrote:
> > Are you proposing that I just put the prototype in miscadmin.h, while
> > leaving the implementation where it is (in nodeHash.c)?
>
> Yes, that's in the part of my reply you didn't quote:
>
> : It remains strange to have the function in ex
On 2020-Jul-27, Peter Geoghegan wrote:
> On Mon, Jul 27, 2020 at 10:30 AM Alvaro Herrera
> wrote:
> > On 2020-Jul-23, Peter Geoghegan wrote:
> > I notice you put the prototype for get_hash_mem in nodeHash.h. This
> > would be fine if not for the fact that the optimizer needs to call the
> > function
On Mon, Jul 27, 2020 at 10:30 AM Alvaro Herrera
wrote:
> On 2020-Jul-23, Peter Geoghegan wrote:
> I notice you put the prototype for get_hash_mem in nodeHash.h. This
> would be fine if not for the fact that the optimizer needs to call the
> function too, which means the optimizer now has to include exec
On 2020-Jul-23, Peter Geoghegan wrote:
> Attached is v3 of the hash_mem_multiplier patch series, which now has
> a preparatory patch that removes hashagg_avoid_disk_plan.
I notice you put the prototype for get_hash_mem in nodeHash.h. This
would be fine if not for the fact that the optimizer needs to
On Sun, Jul 26, 2020 at 11:34 AM Tomas Vondra
wrote:
> That's 1.6GB, if I read it right. Which is more than 200MB ;-)
Sigh. That solves that "mystery": the behavior that my sorted vs
random example exhibited is a known limitation in hash aggs that spill
(and an acceptable one). The memory usage i
On Sat, Jul 25, 2020 at 05:13:00PM -0700, Peter Geoghegan wrote:
On Sat, Jul 25, 2020 at 5:05 PM Tomas Vondra
wrote:
I'm not sure what you mean by "reported memory usage doesn't reflect the
space used for transition state"? Surely it does include that; we've
built the memory accounting stuff pr
On Sat, Jul 25, 2020 at 5:31 PM Peter Geoghegan wrote:
> I'm glad that this better principled approach is possible. It's hard
> to judge how much of a problem this really is, though. We'll need to
> think about this aspect some more.
BTW, your HLL patch ameliorates the problem with my extreme "so
On Sat, Jul 25, 2020 at 4:56 PM Jeff Davis wrote:
> I wrote a quick patch to use HyperLogLog to estimate the number of
> groups contained in a spill file. It seems to reduce the
> overpartitioning effect, and is a more principled approach than what I
> was doing before.
This pretty much fixes the
On Sat, Jul 25, 2020 at 5:05 PM Tomas Vondra
wrote:
> I'm not sure what you mean by "reported memory usage doesn't reflect the
> space used for transition state"? Surely it does include that; we've
> built the memory accounting stuff pretty much exactly to do that.
>
> I think it's pretty clear wh
On Sat, Jul 25, 2020 at 10:07:37AM -0700, Peter Geoghegan wrote:
On Sat, Jul 25, 2020 at 9:39 AM Peter Geoghegan wrote:
"Peak Memory Usage: 1605334kB"
Hash agg avoids spilling entirely (so the planner gets it right this
time around). It even uses notably less memory.
I guess that this is bec
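The figure quoted above comes from EXPLAIN ANALYZE, which in v13 reports a
HashAggregate node's memory and spill statistics directly. A sketch (table
and column names here are placeholders, not from the test case):

    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT a, count(*) FROM t GROUP BY a;
    -- the HashAggregate node prints lines such as
    --   Peak Memory Usage: ...kB
    --   Disk Usage: ...kB        (only when it spilled)
    --   HashAgg Batches: ...     (only when it spilled)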
On Sat, 2020-07-25 at 13:27 -0700, Peter Geoghegan wrote:
> > It's not clear to me that overpartitioning is a real problem in
> > this
> > case -- but I think the fact that it's causing confusion is enough
> > reason to see if we can fix it.
>
> I'm not sure about that either.
>
> FWIW I notice tha
On Sat, Jul 25, 2020 at 1:10 PM Jeff Davis wrote:
> On Sat, 2020-07-25 at 11:05 -0700, Peter Geoghegan wrote:
> > What worries me a bit is the sharp discontinuities when spilling with
> > significantly less work_mem than the "optimal" amount. For example,
> > with Tomas' TPC-H query (against my sm
On Sat, 2020-07-25 at 11:05 -0700, Peter Geoghegan wrote:
> What worries me a bit is the sharp discontinuities when spilling with
> significantly less work_mem than the "optimal" amount. For example,
> with Tomas' TPC-H query (against my smaller TPC-H dataset), I find
> that setting work_mem to 6MB
On Sat, Jul 25, 2020 at 10:23 AM Jeff Davis wrote:
> There's also another effect at work that can cause the total number of
> batches to be higher for larger work_mem values: when we do recurse, we
> again need to estimate the number of partitions needed. Right now, we
> overestimate the number of
On Thu, 2020-07-23 at 19:33 -0700, Peter Geoghegan wrote:
> That does make it sound like the costs of the hash agg aren't being
> represented. I suppose it isn't clear if this is a costing issue
> because it isn't clear if the execution time performance itself is
> pathological or is instead someth
On Fri, 2020-07-24 at 10:40 +0200, Tomas Vondra wrote:
> FWIW one more suspicious thing that I forgot to mention is the
> behavior
> of the "planned partitions" depending on work_mem, which looks like
> this:
>
>    2MB    Planned Partitions: 64     HashAgg Batches: 4160
>    4MB    Planned Partitions: 128    HashAgg Batches: 16512
On Sat, Jul 25, 2020 at 9:39 AM Peter Geoghegan wrote:
> "Peak Memory Usage: 1605334kB"
>
> Hash agg avoids spilling entirely (so the planner gets it right this
> time around). It even uses notably less memory.
I guess that this is because the reported memory usage doesn't reflect
the space used
On Fri, Jul 24, 2020 at 12:55 PM Peter Geoghegan wrote:
> Could that be caused by clustering in the data?
>
> If the input data is in totally random order then we have a good
> chance of never having to spill skewed "common" values. That is, we're
> bound to encounter common values before entering
On Fri, 2020-07-24 at 21:16 +0200, Tomas Vondra wrote:
> Surely more recursive spilling should do more I/O, but the Disk Usage
> reported by explain analyze does not show anything like ...
I suspect that's because of disk reuse in logtape.c.
Regards,
Jeff Davis
On Fri, Jul 24, 2020 at 12:16 PM Tomas Vondra
wrote:
> Maybe, but we're nowhere close to these limits. See this table which I
> posted earlier:
>
>    2MB    Planned Partitions: 64     HashAgg Batches: 4160
>    4MB    Planned Partitions: 128    HashAgg Batches: 16512
>    8MB
On Fri, Jul 24, 2020 at 11:03:54AM -0700, Peter Geoghegan wrote:
On Fri, Jul 24, 2020 at 8:19 AM Robert Haas wrote:
This is all really good analysis, I think, but this seems like the key
finding. It seems like we don't really understand what's actually
getting written. Whether we use hash or so
On Fri, Jul 24, 2020 at 1:40 AM Tomas Vondra
wrote:
> Maybe, not sure what exactly you think is pathological? The trouble is
> hashagg has to spill input tuples but the memory used in the no-spill case
> represents aggregated groups, so I'm not sure how you could extrapolate
> from that ...
Yeah, but
On Fri, Jul 24, 2020 at 8:19 AM Robert Haas wrote:
> This is all really good analysis, I think, but this seems like the key
> finding. It seems like we don't really understand what's actually
> getting written. Whether we use hash or sort doesn't seem like it
> should have this kind of impact on h
On Fri, Jul 24, 2020 at 11:18:48AM -0400, Robert Haas wrote:
On Thu, Jul 23, 2020 at 9:22 PM Tomas Vondra
wrote:
                      2MB     4MB     8MB     64MB    256MB
------------------------------------------------------------
hash                  6.71    6.70    6.73    6.44    5.81
On Thu, Jul 23, 2020 at 9:22 PM Tomas Vondra
wrote:
>                       2MB     4MB     8MB     64MB    256MB
> ------------------------------------------------------------
> hash                  6.71    6.70    6.73    6.44    5.81
> hash CP_SMALL_TLIST   5.28    5.26    5.24
On Fri, Jul 24, 2020 at 10:40:47AM +0200, Tomas Vondra wrote:
On Thu, Jul 23, 2020 at 07:33:45PM -0700, Peter Geoghegan wrote:
On Thu, Jul 23, 2020 at 6:22 PM Tomas Vondra
wrote:
So let me share some fresh I/O statistics collected on the current code
using iosnoop. I've done the tests on two d
On Thu, Jul 23, 2020 at 07:33:45PM -0700, Peter Geoghegan wrote:
On Thu, Jul 23, 2020 at 6:22 PM Tomas Vondra
wrote:
So let me share some fresh I/O statistics collected on the current code
using iosnoop. I've done the tests on two different machines using the
"aggregate part" of TPC-H Q17, i.e.
On Thu, Jul 23, 2020 at 6:22 PM Tomas Vondra
wrote:
> So let me share some fresh I/O statistics collected on the current code
> using iosnoop. I've done the tests on two different machines using the
> "aggregate part" of TPC-H Q17, i.e. essentially this:
>
> SELECT * FROM (
> SELECT
>
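The preview cuts the query short; for readers without the benchmark at hand,
the "aggregate part" of TPC-H Q17 is essentially the per-part average below
(reconstructed from the public Q17 template, not from Tomas' exact script):

    SELECT * FROM (
        SELECT l_partkey AS agg_partkey,
               0.2 * avg(l_quantity) AS avg_quantity
        FROM lineitem
        GROUP BY l_partkey
    ) part_agg;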
On Fri, Jul 17, 2020 at 5:13 PM Jeff Davis wrote:
> The patch itself looks reasonable to me. I don't see a lot of obvious
> dangers, but perhaps someone would like to take a closer look at the
> planner changes as you suggest.
Attached is v3 of the hash_mem_multiplier patch series, which now has
On Tue, Jul 21, 2020 at 1:30 PM Bruce Momjian wrote:
> On Tue, Jul 14, 2020 at 03:49:40PM -0700, Peter Geoghegan wrote:
> > Maybe I missed your point here. The problem is not so much that we'll
> > get HashAggs that spill -- there is nothing intrinsically wrong with
> > that. While it's true that
On Tue, Jul 14, 2020 at 6:49 PM Peter Geoghegan wrote:
> Maybe I missed your point here. The problem is not so much that we'll
> get HashAggs that spill -- there is nothing intrinsically wrong with
> that. While it's true that the I/O pattern is not as sequential as a
> similar group agg + sort, t
On Tue, Jul 14, 2020 at 03:49:40PM -0700, Peter Geoghegan wrote:
> Maybe I missed your point here. The problem is not so much that we'll
> get HashAggs that spill -- there is nothing intrinsically wrong with
> that. While it's true that the I/O pattern is not as sequential as a
> similar group agg
On Mon, Jul 20, 2020 at 09:17:21AM -0400, Tom Lane wrote:
Tomas Vondra writes:
There's a minor problem here, though - these stats were collected before
we fixed the tlist issue, so hashagg was spilling about 10x the amount
of data compared to sort+groupagg. So maybe that's the first thing we
sh
Tomas Vondra writes:
> There's a minor problem here, though - these stats were collected before
> we fixed the tlist issue, so hashagg was spilling about 10x the amount
> of data compared to sort+groupagg. So maybe that's the first thing we
> should do, before contemplating changes to the costing
On Sun, Jul 19, 2020 at 02:17:15PM -0700, Jeff Davis wrote:
On Sat, 2020-07-18 at 21:15 -0400, Tom Lane wrote:
Jeff Davis writes:
> What is your opinion about pessimizing the HashAgg disk costs (not
> affecting HashAgg plans expected to stay in memory)? Tomas Vondra
> presented some evidence th
On Sat, Jul 18, 2020 at 3:04 PM Jeff Davis wrote:
> > I think the entire discussion
> > is way out ahead of any field evidence that we need such a knob.
> > In the absence of evidence, our default position ought to be to
> > keep it simple, not to accumulate backwards-compatibility kluges.
>
> Fai
On Sat, 2020-07-18 at 21:15 -0400, Tom Lane wrote:
> Jeff Davis writes:
> > What is your opinion about pessimizing the HashAgg disk costs (not
> > affecting HashAgg plans expected to stay in memory)? Tomas Vondra
> > presented some evidence that Sort had some better IO patterns in
> > some
> > cas
On Sun, Jul 19, 2020 at 4:38 AM Stephen Frost wrote:
> > (The only reason I'm in favor of heap_mem[_multiplier] is that it
> > seems like it might be possible to use it to get *better* plans
> > than before. I do not see it as a backwards-compatibility knob.)
>
> I still don't think a hash_mem-t
Stephen Frost writes:
> In other words, if we'd stop trying to shoehorn something in, which
> we're doing because we're in beta, we'd very likely be talking about all
> of this in a very different way and probably be contemplating something
> like a query_mem that provides for an overall memory li
Greetings,
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Jeff Davis writes:
> > On Fri, 2020-07-17 at 18:38 -0700, Peter Geoghegan wrote:
> >> There is also the separate question of what to do about the
> >> hashagg_avoid_disk_plan GUC (this is a separate open item that
> >> requires a separate resolu
Jeff Davis writes:
> What is your opinion about pessimizing the HashAgg disk costs (not
> affecting HashAgg plans expected to stay in memory)? Tomas Vondra
> presented some evidence that Sort had some better IO patterns in some
> cases that weren't easily reflected in a principled way in the cost
On Sat, 2020-07-18 at 14:30 -0400, Tom Lane wrote:
> You're being optimistic about it being possible to remove a GUC once
> we ship it. That seems to be a hard sell most of the time.
If nothing else, a repeat of this thread in a year or two to discuss
removing a GUC doesn't seem appealing.
> I th
On Sat, Jul 18, 2020 at 11:30 AM Tom Lane wrote:
> Jeff Davis writes:
> > Yes, I think we should have that GUC (hashagg_avoid_disk_plan) for at
> > least one release.
>
> You're being optimistic about it being possible to remove a GUC once
> we ship it. That seems to be a hard sell most of the ti
Jeff Davis writes:
> On Fri, 2020-07-17 at 18:38 -0700, Peter Geoghegan wrote:
>> There is also the separate question of what to do about the
>> hashagg_avoid_disk_plan GUC (this is a separate open item that
>> requires a separate resolution). Tom leans slightly towards removing
>> it now. Is your
On Fri, 2020-07-17 at 18:38 -0700, Peter Geoghegan wrote:
> There is also the separate question of what to do about the
> hashagg_avoid_disk_plan GUC (this is a separate open item that
> requires a separate resolution). Tom leans slightly towards removing
> it now. Is your position about the same a
On Fri, Jul 17, 2020 at 5:13 PM Jeff Davis wrote:
> The idea is growing on me a bit. It doesn't give exactly v12 behavior,
> but it does offer another lever that might tackle a lot of the
> practical cases.
Cool.
> If I were making the decision alone, I'd still choose
> the escape hatch based on
On Tue, 2020-07-14 at 21:12 -0700, Peter Geoghegan wrote:
> Attached is a WIP patch implementing hash_mem_multiplier, with 1.0 as
> the GUC's default value (i.e. the patch introduces no behavioral
> changes by default). The first patch in the series renames some local
> variables whose name is made
On Mon, Jul 13, 2020 at 9:47 AM Alvaro Herrera wrote:
> I'm in favor of hash_mem_multiplier. I think a >1 default is more
> sensible than =1 in the long run, but if a strategic vote is what we're
> doing, then I support the =1 option.
Attached is a WIP patch implementing hash_mem_multiplier, with
On Tue, Jul 14, 2020 at 12:46 PM Robert Haas wrote:
> - I thought the problem we were trying to solve here was that, in v12,
> if the planner thinks that your hashagg will fit in memory when really
> it doesn't, you will get good performance because we'll cheat; in v13,
> you'll get VERY bad perfo
On Mon, Jul 13, 2020 at 2:50 PM Peter Geoghegan wrote:
> Primarily in favor of escape hatch:
>
> Jeff,
> DavidR,
> Pavel,
> Andres,
> Robert ??,
> Amit ??
>
> Primarily in favor of hash_mem/hash_mem_multiplier:
>
> PeterG,
> Tom,
> Alvaro,
> Tomas,
> Justin,
> DavidG,
> Jonathan Katz
>
> There are
On Mon, Jul 13, 2020 at 9:50 PM Peter Geoghegan wrote:
>
> On Mon, Jul 13, 2020 at 6:13 AM Stephen Frost wrote:
> > Yes, increasing work_mem isn't unusual, at all.
>
> It's unusual as a way of avoiding OOMs!
>
> > Eh? That's not at all what it looks like- they were getting OOMs
> > because they
"David G. Johnston" writes:
> To be clear, by "escape hatch" you mean "add a GUC that instructs the
> PostgreSQL executor to ignore hash_mem when deciding whether to spill the
> contents of the hash table to disk - IOW to never spill the contents of a
> hash table to disk"? If so that seems separ
On Mon, Jul 13, 2020 at 12:57 PM David G. Johnston
wrote:
> To be clear, by "escape hatch" you mean "add a GUC that instructs the
> PostgreSQL executor to ignore hash_mem when deciding whether to spill the
> contents of the hash table to disk - IOW to never spill the contents of a
> hash table
On Mon, Jul 13, 2020 at 11:50 AM Peter Geoghegan wrote:
>
> Primarily in favor of escape hatch:
>
> Jeff,
> DavidR,
> Pavel,
> Andres,
> Robert ??,
> Amit ??
>
>
To be clear, by "escape hatch" you mean "add a GUC that instructs the
PostgreSQL executor to ignore hash_mem when deciding whether to s
On Mon, Jul 13, 2020 at 7:25 AM David Rowley wrote:
> I think it would be good if we could try to move towards getting
> consensus here rather than reiterating our arguments over and over.
+1
> Updated summary:
> * For hash_mem = Tomas [7], Justin [16]
> * For hash_mem_multiplier with a default
On Mon, Jul 13, 2020 at 12:47:36PM -0400, Alvaro Herrera wrote:
> On 2020-Jul-13, Jeff Davis wrote:
>
> > On Tue, 2020-07-14 at 02:25 +1200, David Rowley wrote:
> > > Updated summary:
> > > * For hash_mem = Tomas [7], Justin [16]
> > > * For hash_mem_multiplier with a default > 1.0 = DavidG [21]
>
Alvaro Herrera writes:
> I'm in favor of hash_mem_multiplier. I think a >1 default is more
> sensible than =1 in the long run, but if a strategic vote is what we're
> doing, then I support the =1 option.
FWIW, I also think that we'll eventually end up with >1 default.
But the evidence to support t
On 2020-Jul-13, Jeff Davis wrote:
> On Tue, 2020-07-14 at 02:25 +1200, David Rowley wrote:
> > Updated summary:
> > * For hash_mem = Tomas [7], Justin [16]
> > * For hash_mem_multiplier with a default > 1.0 = DavidG [21]
> > * For hash_mem_multiplier with default = 1.0 = PeterG [15][0], Tom
> > [
On Mon, Jul 13, 2020 at 6:13 AM Stephen Frost wrote:
> Yes, increasing work_mem isn't unusual, at all.
It's unusual as a way of avoiding OOMs!
> Eh? That's not at all what it looks like- they were getting OOMs
> because they set work_mem to be higher than the actual amount of memory
> they had
On 2020-07-13 16:11, Tomas Vondra wrote:
Why is running out of disk space a worse experience than running out of
memory?
Sure, it'll take longer and ultimately the query fails (and if it fills
the device used by the WAL then it may also cause shutdown of the main
instance due to inability to write
On Tue, 2020-07-14 at 02:25 +1200, David Rowley wrote:
> Updated summary:
> * For hash_mem = Tomas [7], Justin [16]
> * For hash_mem_multiplier with a default > 1.0 = DavidG [21]
> * For hash_mem_multiplier with default = 1.0 = PeterG [15][0], Tom
> [20][24]
I am OK with these options, but I stil
I think Peter seems to be fine with setting the default to 1.0, per [0].
This thread did split off a while back into "Default setting for
enable_hashagg_disk (hash_mem)", I did try and summarise who sits
where on this in [19].
I think it would be good if we could try
On Mon, Jul 13, 2020 at 01:51:42PM +0200, Peter Eisentraut wrote:
On 2020-04-07 20:20, Jeff Davis wrote:
Now that we have Disk-based Hash Aggregation, there are a lot more
situations where the planner can choose HashAgg. The
enable_hashagg_disk GUC, if set to true, chooses HashAgg based on
costi
Greetings,
* Peter Geoghegan (p...@bowt.ie) wrote:
> On Sat, Jul 11, 2020 at 7:22 AM Stephen Frost wrote:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > > Have you got a better proposal that is reasonably implementable for v13?
> > > (I do not accept the argument that "do nothing" is a better pro
On 2020-07-13 14:16, David Rowley wrote:
Isn't that what temp_file_limit is for?
Yeah, I guess that is so rarely used that I had forgotten about it. So
maybe that is also something that more users will want to be aware of.
--
Peter Eisentraut http://www.2ndQuadrant.com/
Postgre
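For reference, a sketch of the knob being recalled here: temp_file_limit
caps the temporary-file space a single process may use, so a runaway spill
becomes a query error rather than a full disk:

    -- default is -1 (no limit)
    SET temp_file_limit = '20GB';
    -- a spilling hash agg (or sort, or hash join) that would exceed this
    -- now fails with an error instead of filling the filesystem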
On Mon, 13 Jul 2020 at 23:51, Peter Eisentraut
wrote:
> I have an anecdote that might be related to this discussion.
>
> I was running an unrelated benchmark suite. With PostgreSQL 12, one
> query ran out of memory. With PostgreSQL 13, the same query instead ran
> out of disk space. I bisected
On 2020-04-07 20:20, Jeff Davis wrote:
Now that we have Disk-based Hash Aggregation, there are a lot more
situations where the planner can choose HashAgg. The
enable_hashagg_disk GUC, if set to true, chooses HashAgg based on
costing. If false, it only generates a HashAgg path if it thinks it
will
On Sat, Jul 11, 2020 at 3:30 AM Tom Lane wrote:
>
> Stephen Frost writes:
> > I don't see hash_mem as being any kind of proper fix- it's just punting
> > to the user saying "we can't figure this out, how about you do it" and,
> > worse, it's in conflict with how we already ask the user that quest
On Sat, Jul 11, 2020 at 10:26:22PM -0700, David G. Johnston wrote:
On Saturday, July 11, 2020, Tom Lane wrote:
"David G. Johnston" writes:
> On Sat, Jul 11, 2020 at 5:47 PM Tom Lane wrote:
>> It seems like a lot of the disagreement here is focused on Peter's
>> proposal to make hash_mem_mult
On Sat, Jul 11, 2020 at 08:47:54PM -0400, Tom Lane wrote:
Tomas Vondra writes:
I don't know, but one of the main arguments against simply suggesting
people to bump up work_mem (if they're hit by the hashagg spill in v13)
was that it'd increase overall memory usage for them. It seems strange
to
On Saturday, July 11, 2020, Tom Lane wrote:
> "David G. Johnston" writes:
> > On Sat, Jul 11, 2020 at 5:47 PM Tom Lane wrote:
> >> It seems like a lot of the disagreement here is focused on Peter's
> >> proposal to make hash_mem_multiplier default to 2.0. But it doesn't
> >> seem to me that th
"David G. Johnston" writes:
> On Sat, Jul 11, 2020 at 5:47 PM Tom Lane wrote:
>> It seems like a lot of the disagreement here is focused on Peter's
>> proposal to make hash_mem_multiplier default to 2.0. But it doesn't
>> seem to me that that's a critical element of the proposal. Why not just
>
On Sat, Jul 11, 2020 at 5:47 PM Tom Lane wrote:
> Tomas Vondra writes:
> > I don't know, but one of the main arguments against simply suggesting
> > people to bump up work_mem (if they're hit by the hashagg spill in v13)
> > was that it'd increase overall memory usage for them. It seems strange
On Sun, Jul 12, 2020 at 2:27 AM Tom Lane wrote:
> David Rowley writes:
> > hmm yeah. It's unfortunate, but I'm not sure how I'd have implemented
> > it differently. The problem is made worse by the fact that we'll only
> > release the memory for the hash table during ExecEndHashJoin(). If the
>
I would be okay with a default of 1.0.
Peter Geoghegan
(Sent from my phone)
Tomas Vondra writes:
> I don't know, but one of the main arguments against simply suggesting
> people to bump up work_mem (if they're hit by the hashagg spill in v13)
> was that it'd increase overall memory usage for them. It seems strange
> to then propose a new GUC set to a default that would re
On Sat, Jul 11, 2020 at 09:02:43AM -0700, David G. Johnston wrote:
On Sat, Jul 11, 2020 at 7:22 AM Stephen Frost wrote:
There now seems to be some suggestions that not only should we have a
new GUC, but we should default to having it not be equal to work_mem (or
1.0 or whatever) and instead be higher, to be *twice* or larger whatever
On Sat, Jul 11, 2020 at 4:23 PM Tomas Vondra
wrote:
> I find that example rather suspicious. I mean, what exactly in the
> GroupAgg plan would consume this memory? Surely it'd have to be some
> node below the grouping, but sort shouldn't do that, no?
>
> Seems strange.
Well, I imagine hash aggreg
On Fri, Jul 10, 2020 at 10:00 PM David Rowley wrote:
> hmm yeah. It's unfortunate, but I'm not sure how I'd have implemented
> it differently. The problem is made worse by the fact that we'll only
> release the memory for the hash table during ExecEndHashJoin(). If the
> planner had some ability
On Sat, Jul 11, 2020 at 09:49:43AM -0700, Peter Geoghegan wrote:
On Sat, Jul 11, 2020 at 7:22 AM Stephen Frost wrote:
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Have you got a better proposal that is reasonably implementable for v13?
> (I do not accept the argument that "do nothing" is a better p
On Sat, Jul 11, 2020 at 7:22 AM Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Have you got a better proposal that is reasonably implementable for v13?
> > (I do not accept the argument that "do nothing" is a better proposal.)
> So, no, I don't agree that 'do nothing' (except r
On Sat, Jul 11, 2020 at 7:22 AM Stephen Frost wrote:
> There now seems to be some suggestions that not only should we have a
> new GUC, but we should default to having it not be equal to work_mem (or
> 1.0 or whatever) and instead be higher, to be *twice* or larger whatever
> the existing work_me
David Rowley writes:
> hmm yeah. It's unfortunate, but I'm not sure how I'd have implemented
> it differently. The problem is made worse by the fact that we'll only
> release the memory for the hash table during ExecEndHashJoin(). If the
> planner had some ability to provide the executor with kno
Greetings,
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost writes:
> > I don't see hash_mem as being any kind of proper fix- it's just punting
> > to the user saying "we can't figure this out, how about you do it" and,
> > worse, it's in conflict with how we already ask the user that quest
On Sat, 11 Jul 2020 at 14:02, Peter Geoghegan wrote:
>
> On Fri, Jul 10, 2020 at 6:19 PM David Rowley wrote:
> > If we have to have a new GUC, my preference would be hashagg_mem,
> > where -1 means use work_mem and a value between 64 and MAX_KILOBYTES
> > would mean use that value. We'd need som
On Fri, Jul 10, 2020 at 6:19 PM David Rowley wrote:
> If we get hash_mem
> or some variant that is a multiplier of work_mem, then that user is in
> exactly the same situation for that plan. i.e. there's no ability to
> increase the memory allowances for Hash Agg alone.
That's true, of course.
> I
On Fri, Jul 10, 2020 at 6:43 PM David Rowley wrote:
> On Sat, 11 Jul 2020 at 13:36, David G. Johnston
> wrote:
> > If we add a setting that defaults to work_mem then the benefit is
> > severely reduced. You still have to modify individual queries, but the
> > change can simply be more targeted than
On Sat, 11 Jul 2020 at 13:36, David G. Johnston
wrote:
> If we add a setting that defaults to work_mem then the benefit is severely
> reduced. You still have to modify individual queries, but the change can
> simply be more targeted than changing work_mem alone.
I think the idea is that this i
On Fri, Jul 10, 2020 at 6:19 PM David Rowley wrote:
> If we have to have a new GUC, my preference would be hashagg_mem,
> where -1 means use work_mem and a value between 64 and MAX_KILOBYTES
> would mean use that value. We'd need some sort of check hook to
> disallow 0-63. I really am just faili
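For comparison, a sketch of how David's proposed (never-committed) knob
would have looked next to the multiplier that was ultimately chosen:

    -- hypothetical, as proposed in this subthread only:
    -- SET hashagg_mem = -1;       -- fall back to work_mem
    -- SET hashagg_mem = '256MB';  -- explicit budget for hash agg alone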
On Sat, 11 Jul 2020 at 12:47, David G. Johnston
wrote:
> The multiplier seems strictly better than "rely on work_mem alone, i.e., do
> nothing"; the detracting factor being one more GUC. Even if one wants to
> argue the solution is ugly or imperfect the current state seems worse and a
> more p
On Fri, Jul 10, 2020 at 5:16 PM David Rowley wrote:
> Stephen mentions in [1] that:
> > Users who are actually hit by this in a negative way
> > have an option- increase work_mem to reflect what was actually happening
> > already.
>
> Peter is not a fan of that idea, which can only be due to the
On Sat, 11 Jul 2020 at 10:00, Tom Lane wrote:
>
> Stephen Frost writes:
> > I don't see hash_mem as being any kind of proper fix- it's just punting
> > to the user saying "we can't figure this out, how about you do it" and,
> > worse, it's in conflict with how we already ask the user that questio
On Fri, Jul 10, 2020 at 2:50 PM Stephen Frost wrote:
> Nothing of what you've said thus far has shown me that there were
> material bits of the discussion that I've missed.
Maybe that's just because you missed those bits too?
> No, that other people
> feel differently or have made comments suppo
Stephen Frost writes:
> I don't see hash_mem as being any kind of proper fix- it's just punting
> to the user saying "we can't figure this out, how about you do it" and,
> worse, it's in conflict with how we already ask the user that question.
> Turning it into a multiplier doesn't change that eit
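Whatever one thinks of that objection, the operational difference between
the two approaches is concrete (a sketch; the numbers are illustrative only):

    -- blanket workaround: raise the budget for every node type
    SET work_mem = '256MB';

    -- multiplier approach: raise only the hash-table budget
    SET work_mem = '64MB';
    SET hash_mem_multiplier = 4.0;  -- hash nodes get 256MB, sorts keep 64MB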