Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-12-28 Thread Jeff Davis
On Thu, 2014-12-11 at 02:46 -0800, Jeff Davis wrote: > On Sun, 2014-08-10 at 14:26 -0700, Jeff Davis wrote: > > This patch is requires the Memory Accounting patch, or something similar > > to track memory usage. > > > > The attached patch enables hashagg to spill to disk, which means that > > hash

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-12-11 Thread Tomas Vondra
On 11.12.2014 11:46, Jeff Davis wrote: > > New patch attached. All open items are complete, though the patch may > have a few rough edges. > > Summary of changes: > > * rebased on top of latest memory accounting patch > http://www.postgresql.org/message-id/1417497257.5584.5.camel@jeff-desktop >

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-12-11 Thread Jeff Davis
On Sun, 2014-08-10 at 14:26 -0700, Jeff Davis wrote: > This patch is requires the Memory Accounting patch, or something similar > to track memory usage. > > The attached patch enables hashagg to spill to disk, which means that > hashagg will contain itself to work_mem even if the planner makes a >

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-09-08 Thread Robert Haas
On Wed, Sep 3, 2014 at 7:34 PM, Tomas Vondra wrote: >> Well, I think you're certainly right that a hash table lookup is more >> expensive than modulo on a 32-bit integer; so much is obvious. But if >> join can increase the number of batches on the fly, but only by >> doubling it, so you might go

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-09-03 Thread Tomas Vondra
On 4.9.2014 01:34, Tomas Vondra wrote: > On 20.8.2014 20:32, Robert Haas wrote: >> >> As I see it, the advantage of Jeff's approach is that it doesn't >> really matter whether our estimates are accurate or not. We don't >> have to decide at the beginning how many batches to do, and then >> possibl

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-09-03 Thread Tomas Vondra
On 20.8.2014 20:32, Robert Haas wrote: > On Sun, Aug 17, 2014 at 1:17 PM, Tomas Vondra wrote: >> Being able to batch inner and outer relations in a matching way is >> certainly one of the reasons why hashjoin uses that particular scheme. >> There are other reasons, though - for example being able

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-09-03 Thread Tomas Vondra
On 4.9.2014 00:42, Tomas Vondra wrote: > > Attached are two CSV files contain both raw results (4 runs per query), > and aggregated results (average of the runs), logs with complete logs > and explain (analyze) plans of the queries for inspection. Of course, I forgot to attach the CSV files ... he

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-28 Thread Tomas Vondra
On 26.8.2014 21:38, Jeff Davis wrote: > On Tue, 2014-08-26 at 12:39 +0300, Heikki Linnakangas wrote: >> I think this is enough for this commitfest - we have consensus on >> the design. For the next one, please address those open items, and >> resubmit. > > Agreed, return with feedback. > > I need

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-26 Thread Jeff Davis
On Tue, 2014-08-26 at 12:39 +0300, Heikki Linnakangas wrote: > I think this is enough for this commitfest - we have consensus on the > design. For the next one, please address those open items, and resubmit. Agreed, return with feedback. I need to get the accounting patch in first, which needs t

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-26 Thread Heikki Linnakangas
Summary of this thread so far: There was a lot of discussion comparing this with Tomas Vondra's Hash Join patch. The conclusion was that while it would be nice to be able to dump transition state to disk, for aggregates like array_agg, the patch is fine as it is. Dumping transition states woul

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-21 Thread Jeff Davis
On Wed, 2014-08-20 at 14:32 -0400, Robert Haas wrote: > Well, I think you're certainly right that a hash table lookup is more > expensive than modulo on a 32-bit integer; so much is obvious. But if > the load factor is not too large, I think that it's not a *lot* more > expensive, so it could be w

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-20 Thread Robert Haas
On Sun, Aug 17, 2014 at 1:17 PM, Tomas Vondra wrote: > Being able to batch inner and outer relations in a matching way is > certainly one of the reasons why hashjoin uses that particular scheme. > There are other reasons, though - for example being able to answer 'Does > this group belong to this

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-19 Thread Tomas Vondra
On 19 Srpen 2014, 9:52, Jeff Davis wrote: > On Fri, 2014-08-15 at 13:53 -0400, Robert Haas wrote: >> I think that's right, and I rather like your (Jeff's) approach. It's >> definitely true that we could do better if we have a mechanism for >> serializing and deserializing group states, but (1) I t

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-19 Thread Jeff Davis
On Fri, 2014-08-15 at 13:53 -0400, Robert Haas wrote: > I think that's right, and I rather like your (Jeff's) approach. It's > definitely true that we could do better if we have a mechanism for > serializing and deserializing group states, but (1) I think an awful > lot of cases would get an awful

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-17 Thread Tomas Vondra
On 15.8.2014 19:53, Robert Haas wrote: > On Thu, Aug 14, 2014 at 2:21 PM, Jeff Davis wrote: >> On Thu, 2014-08-14 at 12:53 -0400, Tom Lane wrote: >>> Oh? So if we have aggregates like array_agg whose memory footprint >>> increases over time, the patch completely fails to avoid bloat? >> >> Yes, i

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-17 Thread Tomas Vondra
On 10.8.2014 23:26, Jeff Davis wrote: > This patch is requires the Memory Accounting patch, or something similar > to track memory usage. > > The attached patch enables hashagg to spill to disk, which means that > hashagg will contain itself to work_mem even if the planner makes a > bad misestimat

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-15 Thread Robert Haas
On Thu, Aug 14, 2014 at 2:21 PM, Jeff Davis wrote: > On Thu, 2014-08-14 at 12:53 -0400, Tom Lane wrote: >> Oh? So if we have aggregates like array_agg whose memory footprint >> increases over time, the patch completely fails to avoid bloat? > > Yes, in its current form. > >> I might think a patch

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Tomas Vondra
On 14.8.2014 21:47, Tomas Vondra wrote: > On 14.8.2014 18:54, Jeff Davis wrote: >> On Thu, 2014-08-14 at 16:17 +0200, Tomas Vondra wrote: >>> Either it belongs to the current batch (and either it's in the hash >>> table, or you add it there), or it's not - in that case write it to a >>> temp file.

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Tomas Vondra
On 14.8.2014 18:54, Jeff Davis wrote: > On Thu, 2014-08-14 at 16:17 +0200, Tomas Vondra wrote: >> Either it belongs to the current batch (and either it's in the hash >> table, or you add it there), or it's not - in that case write it to a >> temp file. > > I think the part you left out is that you

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Jeff Davis
On Thu, 2014-08-14 at 12:53 -0400, Tom Lane wrote: > Oh? So if we have aggregates like array_agg whose memory footprint > increases over time, the patch completely fails to avoid bloat? Yes, in its current form. > I might think a patch with such a limitation was useful, if it weren't > for the f

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Atri Sharma
On Thu, Aug 14, 2014 at 10:21 PM, Tomas Vondra wrote: > On 14 Srpen 2014, 18:02, Atri Sharma wrote: > > On Thursday, August 14, 2014, Jeff Davis wrote: > > > >> On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote: > >> > If you're following the HashJoin model, then what you do is the same > >> thi

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Jeff Davis
On Thu, 2014-08-14 at 16:17 +0200, Tomas Vondra wrote: > Either it belongs to the current batch (and either it's in the hash > table, or you add it there), or it's not - in that case write it to a > temp file. I think the part you left out is that you need two files per batch: one for the dumped-o

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Tom Lane
"Tomas Vondra" writes: > On 14 Srpen 2014, 18:12, Tom Lane wrote: >> Not sure that I follow your point. You're going to have to deal with that >> no matter what, no? > That is not how the patch work. Once the memory consumption hits work_mem, > it keeps the already existing groups in memory, and

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Tomas Vondra
On 14 Srpen 2014, 18:02, Atri Sharma wrote: > On Thursday, August 14, 2014, Jeff Davis wrote: > >> On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote: >> > If you're following the HashJoin model, then what you do is the same >> thing >> > it does: you write the input tuple back out to the pending b

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Tomas Vondra
On 14 Srpen 2014, 18:12, Tom Lane wrote: > Jeff Davis writes: >> HashJoin only deals with tuples. With HashAgg, you have to deal with a >> mix of tuples and partially-computed aggregate state values. Not >> impossible, but it is a little more awkward than HashJoin. > > Not sure that I follow your

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Tom Lane
Jeff Davis writes: > HashJoin only deals with tuples. With HashAgg, you have to deal with a > mix of tuples and partially-computed aggregate state values. Not > impossible, but it is a little more awkward than HashJoin. Not sure that I follow your point. You're going to have to deal with that no

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Atri Sharma
On Thursday, August 14, 2014, Jeff Davis wrote: > On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote: > > If you're following the HashJoin model, then what you do is the same > thing > > it does: you write the input tuple back out to the pending batch file for > > the hash partition that now conta

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Jeff Davis
On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote: > If you're following the HashJoin model, then what you do is the same thing > it does: you write the input tuple back out to the pending batch file for > the hash partition that now contains key 1001, whence it will be processed > when you get to

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Tomas Vondra
On 14 Srpen 2014, 9:22, Jeff Davis wrote: > I think the hash-join like approach is reasonable, but I also think > you're going to run into a lot of challenges that make it more complex > for HashAgg. For instance, let's say you have the query: > > SELECT x, array_agg(y) FROM foo GROUP BY x; > > S

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Tom Lane
Jeff Davis writes: > I think the hash-join like approach is reasonable, but I also think > you're going to run into a lot of challenges that make it more complex > for HashAgg. For instance, let's say you have the query: > SELECT x, array_agg(y) FROM foo GROUP BY x; > Say the transition state

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Jeff Davis
I think the hash-join like approach is reasonable, but I also think you're going to run into a lot of challenges that make it more complex for HashAgg. For instance, let's say you have the query: SELECT x, array_agg(y) FROM foo GROUP BY x; Say the transition state is an array (for the sake of s

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-13 Thread Tomas Vondra
On 13.8.2014 12:31, Tomas Vondra wrote: > On 13 Srpen 2014, 7:02, Jeff Davis wrote: >> On Tue, 2014-08-12 at 14:58 +0200, Tomas Vondra wrote: >>> >>>(b) bad estimate of required memory -> this is common for aggregates >>>passing 'internal' state (planner uses some quite high defaults) >

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-13 Thread Tomas Vondra
On 13 Srpen 2014, 7:02, Jeff Davis wrote: > On Tue, 2014-08-12 at 14:58 +0200, Tomas Vondra wrote: >> CREATE AGGREGATE myaggregate ( >> ... >> SERIALIZE_FUNC = 'dump_data', >> DESERIALIZE_FUNC = 'read_data', >> ... >> ); > > Seems reasonable. > >> I don't see why it should get messy

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-12 Thread Jeff Davis
On Tue, 2014-08-12 at 14:58 +0200, Tomas Vondra wrote: > CREATE AGGREGATE myaggregate ( > ... > SERIALIZE_FUNC = 'dump_data', > DESERIALIZE_FUNC = 'read_data', > ... > ); Seems reasonable. > I don't see why it should get messy? In the end, you have a chunk of > data and a hash for

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-12 Thread Tomas Vondra
On 12 Srpen 2014, 7:06, Jeff Davis wrote: > On Mon, 2014-08-11 at 01:29 +0200, Tomas Vondra wrote: >> On 10.8.2014 23:26, Jeff Davis wrote: >> > This patch is requires the Memory Accounting patch, or something >> > similar to track memory usage. >> >> I think the patch you sent actually includes th

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-11 Thread Jeff Davis
On Mon, 2014-08-11 at 01:29 +0200, Tomas Vondra wrote: > On 10.8.2014 23:26, Jeff Davis wrote: > > This patch is requires the Memory Accounting patch, or something > > similar to track memory usage. > > I think the patch you sent actually includes the accounting patch. Is > that on purpose, or by

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-10 Thread Tomas Vondra
Hi, it's 1AM here, so only a few comments after quickly reading the patch. On 10.8.2014 23:26, Jeff Davis wrote: > This patch is requires the Memory Accounting patch, or something > similar to track memory usage. I think the patch you sent actually includes the accounting patch. Is that on purpo

[HACKERS] 9.5: Memory-bounded HashAgg

2014-08-10 Thread Jeff Davis
This patch is requires the Memory Accounting patch, or something similar to track memory usage. The attached patch enables hashagg to spill to disk, which means that hashagg will contain itself to work_mem even if the planner makes a bad misestimate of the cardinality. This is a well-known concep