On Thu, 2014-12-11 at 02:46 -0800, Jeff Davis wrote:
> On Sun, 2014-08-10 at 14:26 -0700, Jeff Davis wrote:
> > This patch requires the Memory Accounting patch, or something similar
> > to track memory usage.
> >
> > The attached patch enables hashagg to spill to disk, which means that
> > hashagg will constrain itself to work_mem even if the planner makes a
> > bad misestimate of the cardinality.
On 11.12.2014 11:46, Jeff Davis wrote:
>
> New patch attached. All open items are complete, though the patch may
> have a few rough edges.
>
> Summary of changes:
>
> * rebased on top of latest memory accounting patch
> http://www.postgresql.org/message-id/1417497257.5584.5.camel@jeff-desktop
>
On Sun, 2014-08-10 at 14:26 -0700, Jeff Davis wrote:
> This patch requires the Memory Accounting patch, or something similar
> to track memory usage.
>
> The attached patch enables hashagg to spill to disk, which means that
> hashagg will constrain itself to work_mem even if the planner makes a
> bad misestimate of the cardinality.
>
On Wed, Sep 3, 2014 at 7:34 PM, Tomas Vondra wrote:
>> Well, I think you're certainly right that a hash table lookup is more
>> expensive than modulo on a 32-bit integer; so much is obvious. But if
>> join can increase the number of batches on the fly, but only by
>> doubling it, so you might go
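To make the point about cheap batch selection and on-the-fly doubling concrete, here is a minimal sketch, assuming (as hashjoin does) that the number of batches is a power of two; the names are illustrative, not the executor's:

    #include <stdint.h>

    /* Batch selection is a mask of the hash value -- effectively a
     * modulo, which is what makes it so much cheaper than a hash
     * table lookup. */
    static inline uint32_t
    batch_for(uint32_t hash, uint32_t nbatch)   /* nbatch: power of two */
    {
        return hash & (nbatch - 1);
    }

Doubling nbatch adds one more bit to the mask, so a tuple formerly in batch b either stays in b or moves to b + old_nbatch; batches only ever split, never merge or reshuffle, which is why growing "only by doubling" is the natural increment.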
On 4.9.2014 01:34, Tomas Vondra wrote:
> On 20.8.2014 20:32, Robert Haas wrote:
>>
>> As I see it, the advantage of Jeff's approach is that it doesn't
>> really matter whether our estimates are accurate or not. We don't
>> have to decide at the beginning how many batches to do, and then
>> possibl
On 20.8.2014 20:32, Robert Haas wrote:
> On Sun, Aug 17, 2014 at 1:17 PM, Tomas Vondra wrote:
>> Being able to batch inner and outer relations in a matching way is
>> certainly one of the reasons why hashjoin uses that particular scheme.
>> There are other reasons, though - for example being able
On 4.9.2014 00:42, Tomas Vondra wrote:
>
> Attached are two CSV files containing both the raw results (4 runs per
> query) and the aggregated results (average of the runs), along with
> complete logs and explain (analyze) plans of the queries for inspection.
Of course, I forgot to attach the CSV files ... he
On 26.8.2014 21:38, Jeff Davis wrote:
> On Tue, 2014-08-26 at 12:39 +0300, Heikki Linnakangas wrote:
>> I think this is enough for this commitfest - we have consensus on
>> the design. For the next one, please address those open items, and
>> resubmit.
>
> Agreed, return with feedback.
>
> I need
On Tue, 2014-08-26 at 12:39 +0300, Heikki Linnakangas wrote:
> I think this is enough for this commitfest - we have consensus on the
> design. For the next one, please address those open items, and resubmit.
Agreed, return with feedback.
I need to get the accounting patch in first, which needs t
Summary of this thread so far:
There was a lot of discussion comparing this with Tomas Vondra's Hash
Join patch. The conclusion was that while it would be nice to be able to
dump transition state to disk, for aggregates like array_agg, the patch
is fine as it is. Dumping transition states woul
On Wed, 2014-08-20 at 14:32 -0400, Robert Haas wrote:
> Well, I think you're certainly right that a hash table lookup is more
> expensive than modulo on a 32-bit integer; so much is obvious. But if
> the load factor is not too large, I think that it's not a *lot* more
> expensive, so it could be w
On Sun, Aug 17, 2014 at 1:17 PM, Tomas Vondra wrote:
> Being able to batch inner and outer relations in a matching way is
> certainly one of the reasons why hashjoin uses that particular scheme.
> There are other reasons, though - for example being able to answer 'Does
> this group belong to this
On 19 August 2014, 9:52, Jeff Davis wrote:
> On Fri, 2014-08-15 at 13:53 -0400, Robert Haas wrote:
>> I think that's right, and I rather like your (Jeff's) approach. It's
>> definitely true that we could do better if we have a mechanism for
>> serializing and deserializing group states, but (1) I t
On Fri, 2014-08-15 at 13:53 -0400, Robert Haas wrote:
> I think that's right, and I rather like your (Jeff's) approach. It's
> definitely true that we could do better if we have a mechanism for
> serializing and deserializing group states, but (1) I think an awful
> lot of cases would get an awful
On 15.8.2014 19:53, Robert Haas wrote:
> On Thu, Aug 14, 2014 at 2:21 PM, Jeff Davis wrote:
>> On Thu, 2014-08-14 at 12:53 -0400, Tom Lane wrote:
>>> Oh? So if we have aggregates like array_agg whose memory footprint
>>> increases over time, the patch completely fails to avoid bloat?
>>
>> Yes, i
On 10.8.2014 23:26, Jeff Davis wrote:
> This patch requires the Memory Accounting patch, or something similar
> to track memory usage.
>
> The attached patch enables hashagg to spill to disk, which means that
> hashagg will constrain itself to work_mem even if the planner makes a
> bad misestimate of the cardinality.
On Thu, Aug 14, 2014 at 2:21 PM, Jeff Davis wrote:
> On Thu, 2014-08-14 at 12:53 -0400, Tom Lane wrote:
>> Oh? So if we have aggregates like array_agg whose memory footprint
>> increases over time, the patch completely fails to avoid bloat?
>
> Yes, in its current form.
>
>> I might think a patch
On 14.8.2014 21:47, Tomas Vondra wrote:
> On 14.8.2014 18:54, Jeff Davis wrote:
>> On Thu, 2014-08-14 at 16:17 +0200, Tomas Vondra wrote:
>>> Either it belongs to the current batch (and either it's in the hash
>>> table, or you add it there), or it's not - in that case write it to a
>>> temp file.
On 14.8.2014 18:54, Jeff Davis wrote:
> On Thu, 2014-08-14 at 16:17 +0200, Tomas Vondra wrote:
>> Either it belongs to the current batch (and either it's in the hash
>> table, or you add it there), or it's not - in that case write it to a
>> temp file.
>
> I think the part you left out is that you
On Thu, 2014-08-14 at 12:53 -0400, Tom Lane wrote:
> Oh? So if we have aggregates like array_agg whose memory footprint
> increases over time, the patch completely fails to avoid bloat?
Yes, in its current form.
> I might think a patch with such a limitation was useful, if it weren't
> for the f
On Thu, Aug 14, 2014 at 10:21 PM, Tomas Vondra wrote:
> On 14 August 2014, 18:02, Atri Sharma wrote:
> > On Thursday, August 14, 2014, Jeff Davis wrote:
> >
> >> On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote:
> >> > If you're following the HashJoin model, then what you do is the same
> >> thi
On Thu, 2014-08-14 at 16:17 +0200, Tomas Vondra wrote:
> Either it belongs to the current batch (and either it's in the hash
> table, or you add it there), or it's not - in that case write it to a
> temp file.
I think the part you left out is that you need two files per batch: one
for the dumped-out partially-computed state values, and one for the new
tuples.
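A rough sketch of the bookkeeping Jeff is pointing at, with hypothetical names (BufFile is PostgreSQL's real temp-file abstraction; the struct itself is not the patch's):

    #include "storage/buffile.h"   /* PostgreSQL's temp-file API */

    typedef struct HashAggBatch
    {
        BufFile *state_file;  /* partially-computed transition states
                               * dumped out of the hash table */
        BufFile *tuple_file;  /* raw input tuples routed to this batch,
                               * still waiting to be aggregated */
    } HashAggBatch;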
"Tomas Vondra" writes:
> On 14 August 2014, 18:12, Tom Lane wrote:
>> Not sure that I follow your point. You're going to have to deal with that
>> no matter what, no?
> That is not how the patch works. Once the memory consumption hits work_mem,
> it keeps the already existing groups in memory, and
On 14 August 2014, 18:02, Atri Sharma wrote:
> On Thursday, August 14, 2014, Jeff Davis wrote:
>
>> On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote:
>> > If you're following the HashJoin model, then what you do is the same
>> thing
>> > it does: you write the input tuple back out to the pending b
On 14 August 2014, 18:12, Tom Lane wrote:
> Jeff Davis writes:
>> HashJoin only deals with tuples. With HashAgg, you have to deal with a
>> mix of tuples and partially-computed aggregate state values. Not
>> impossible, but it is a little more awkward than HashJoin.
>
> Not sure that I follow your
Jeff Davis writes:
> HashJoin only deals with tuples. With HashAgg, you have to deal with a
> mix of tuples and partially-computed aggregate state values. Not
> impossible, but it is a little more awkward than HashJoin.
Not sure that I follow your point. You're going to have to deal with that
no
On Thursday, August 14, 2014, Jeff Davis wrote:
> On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote:
> > If you're following the HashJoin model, then what you do is the same
> thing
> > it does: you write the input tuple back out to the pending batch file for
> > the hash partition that now conta
On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote:
> If you're following the HashJoin model, then what you do is the same thing
> it does: you write the input tuple back out to the pending batch file for
> the hash partition that now contains key 1001, whence it will be processed
> when you get to
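Condensing the routing rule Tom describes into a sketch (all helper names are invented for illustration, not taken from the patch):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct HashAggState HashAggState;
    typedef struct Group Group;
    typedef struct Tuple Tuple;

    extern uint32_t hash_group_key(const Tuple *tuple);
    extern Group   *lookup_group(HashAggState *state, uint32_t hash);
    extern Group   *insert_group(HashAggState *state, uint32_t hash);
    extern void     advance_transition(Group *group, const Tuple *tuple);
    extern size_t   memory_used(const HashAggState *state);
    extern size_t   work_mem_bytes(const HashAggState *state);
    extern void     write_to_batch_file(HashAggState *state, uint32_t hash,
                                        const Tuple *tuple);

    /* An input tuple either advances an in-memory group, claims a new
     * group while memory lasts, or is written back out to the batch
     * file that owns its hash partition. */
    void
    route_input_tuple(HashAggState *state, const Tuple *tuple)
    {
        uint32_t hash = hash_group_key(tuple);
        Group   *group = lookup_group(state, hash);

        if (group != NULL)
            advance_transition(group, tuple);
        else if (memory_used(state) < work_mem_bytes(state))
            advance_transition(insert_group(state, hash), tuple);
        else
            write_to_batch_file(state, hash, tuple);
    }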
On 14 August 2014, 9:22, Jeff Davis wrote:
> I think the hash-join like approach is reasonable, but I also think
> you're going to run into a lot of challenges that make it more complex
> for HashAgg. For instance, let's say you have the query:
>
> SELECT x, array_agg(y) FROM foo GROUP BY x;
>
> S
Jeff Davis writes:
> I think the hash-join like approach is reasonable, but I also think
> you're going to run into a lot of challenges that make it more complex
> for HashAgg. For instance, let's say you have the query:
> SELECT x, array_agg(y) FROM foo GROUP BY x;
> Say the transition state
I think the hash-join like approach is reasonable, but I also think
you're going to run into a lot of challenges that make it more complex
for HashAgg. For instance, let's say you have the query:
SELECT x, array_agg(y) FROM foo GROUP BY x;
Say the transition state is an array (for the sake of simplicity).
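To see why a growing state resists this treatment, consider what happens if rows for one group land in two batches: each produces only a partial array, and something must concatenate them. A minimal sketch of that combine step (simplified to a flat array of longs; not the patch's code):

    #include <stdlib.h>
    #include <string.h>

    typedef struct ArrayState
    {
        int   nelems;
        int   capacity;
        long *elems;
    } ArrayState;

    /* Concatenate partial state b onto a. The cost -- and the state
     * itself -- keeps growing with the group, unlike a fixed-size
     * state such as the one behind count() or sum(). */
    static ArrayState *
    combine_array_states(ArrayState *a, const ArrayState *b)
    {
        if (a->nelems + b->nelems > a->capacity)
        {
            a->capacity = 2 * (a->nelems + b->nelems);
            a->elems = realloc(a->elems, a->capacity * sizeof(long));
        }
        memcpy(a->elems + a->nelems, b->elems, b->nelems * sizeof(long));
        a->nelems += b->nelems;
        return a;
    }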
On 13.8.2014 12:31, Tomas Vondra wrote:
> On 13 August 2014, 7:02, Jeff Davis wrote:
>> On Tue, 2014-08-12 at 14:58 +0200, Tomas Vondra wrote:
>>>
>>>(b) bad estimate of required memory -> this is common for aggregates
>>>passing 'internal' state (planner uses some quite high defaults)
>
On 13 August 2014, 7:02, Jeff Davis wrote:
> On Tue, 2014-08-12 at 14:58 +0200, Tomas Vondra wrote:
>> CREATE AGGREGATE myaggregate (
>> ...
>> SERIALIZE_FUNC = 'dump_data',
>> DESERIALIZE_FUNC = 'read_data',
>> ...
>> );
>
> Seems reasonable.
>
>> I don't see why it should get messy
On Tue, 2014-08-12 at 14:58 +0200, Tomas Vondra wrote:
> CREATE AGGREGATE myaggregate (
> ...
> SERIALIZE_FUNC = 'dump_data',
> DESERIALIZE_FUNC = 'read_data',
> ...
> );
Seems reasonable.
> I don't see why it should get messy? In the end, you have a chunk of
> data and a hash for
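For concreteness, the two functions in Tomas's proposed syntax might look like ordinary V1 C functions along these lines (the bodies, MyState, and the helpers are hypothetical; the CREATE AGGREGATE syntax above was a proposal in this thread, not a committed feature):

    #include "postgres.h"
    #include "fmgr.h"

    typedef struct MyState MyState;             /* opaque 'internal' state */
    extern bytea   *my_state_to_bytea(MyState *state);   /* hypothetical */
    extern MyState *my_state_from_bytea(bytea *buf);     /* hypothetical */

    PG_FUNCTION_INFO_V1(dump_data);
    PG_FUNCTION_INFO_V1(read_data);

    /* Flatten the in-memory transition state into a bytea that can be
     * written to a batch file. */
    Datum
    dump_data(PG_FUNCTION_ARGS)
    {
        MyState *state = (MyState *) PG_GETARG_POINTER(0);

        PG_RETURN_BYTEA_P(my_state_to_bytea(state));
    }

    /* Rebuild the transition state from its serialized form. */
    Datum
    read_data(PG_FUNCTION_ARGS)
    {
        bytea *buf = PG_GETARG_BYTEA_P(0);

        PG_RETURN_POINTER(my_state_from_bytea(buf));
    }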
On 12 August 2014, 7:06, Jeff Davis wrote:
> On Mon, 2014-08-11 at 01:29 +0200, Tomas Vondra wrote:
>> On 10.8.2014 23:26, Jeff Davis wrote:
> > This patch requires the Memory Accounting patch, or something
>> > similar to track memory usage.
>>
>> I think the patch you sent actually includes th
On Mon, 2014-08-11 at 01:29 +0200, Tomas Vondra wrote:
> On 10.8.2014 23:26, Jeff Davis wrote:
> > This patch requires the Memory Accounting patch, or something
> > similar to track memory usage.
>
> I think the patch you sent actually includes the accounting patch. Is
> that on purpose, or by
Hi,
it's 1AM here, so only a few comments after quickly reading the patch.
On 10.8.2014 23:26, Jeff Davis wrote:
> This patch requires the Memory Accounting patch, or something
> similar to track memory usage.
I think the patch you sent actually includes the accounting patch. Is
that on purpo
This patch requires the Memory Accounting patch, or something similar
to track memory usage.
The attached patch enables hashagg to spill to disk, which means that
hashagg will constrain itself to work_mem even if the planner makes a
bad misestimate of the cardinality.
This is a well-known concep
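The core rule this description implies, reduced to a sketch (illustrative names; producing the byte count is exactly what the Memory Accounting patch is for, and per Tomas's summary above, groups already in memory keep advancing while only tuples for unseen groups get spilled):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct AggSpillState
    {
        size_t bytes_used;      /* reported by memory accounting */
        size_t work_mem_bytes;  /* work_mem converted to bytes */
        bool   spilling;        /* once set, no new in-memory groups */
    } AggSpillState;

    /* Existing groups stay in memory; once the threshold is crossed,
     * tuples for unseen groups are written to batch files instead of
     * creating new hash table entries. */
    static bool
    must_spill(AggSpillState *s)
    {
        if (!s->spilling && s->bytes_used > s->work_mem_bytes)
            s->spilling = true;
        return s->spilling;
    }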