On 18 January 2016 at 16:44, Robert Haas <robertmh...@gmail.com> wrote:
> On Sun, Jan 17, 2016 at 9:26 PM, David Rowley
> <david.row...@2ndquadrant.com> wrote:
> > hmm, so wouldn't that mean that the transition function would need to
> > (for each input tuple):
> >
> > 1. Parse that StringInfo into tokens.
> > 2. Create a new aggregate state object.
> > 3. Populate the new aggregate state based on the tokenised StringInfo;
> > this would perhaps require that various *_in() functions are called
> > on each token.
> > 4. Add the new tuple to the aggregate state.
> > 5. Build a new StringInfo based on the aggregate state modified in 4.
> >
> > ?
>
> I don't really know what you mean by parse the StringInfo into tokens.
> The whole point of the expanded-object interface is to be able to keep
> things in an expanded internal form so that you *don't* have to
> repeatedly construct and deconstruct internal data structures.

That was a response to Haribabu's proposal, although perhaps I
misunderstood it. However, I'm not sure how a PolyNumAggState could be
converted to a string and back again without doing any string parsing.

> I worked up an example of this approach using string_agg(), which I
> attach here. This changes the transition type of string_agg() from
> internal to text. The same code would work for bytea_string_agg(),
> which would allow removal of some other code, but this patch doesn't
> do that, because the point of this is to elucidate the approach.

Many thanks for working up that patch. I was clearly missing what you
meant previously; I understand it much better now. Thank you for taking
the time on that.

> In my tests, this seems to be slightly slower than what we're doing
> today; worst of all, it adds a handful of cycles to
> advance_transition_function() even when the aggregate is not an
> expanded object or, indeed, not even pass-by-reference. Some of this
> might be able to be fixed by a little massaging - in particular,
> DatumIsReadWriteExpandedObject() seems like it could be partly or
> entirely inlined, and maybe there's some other way to improve the
> coding here.

It also seems that an expanded object requires its own memory context,
which must be malloc()d and free()d. This added quite an overhead in my
testing. I assume we must do that so we can ensure that all memory is
properly free()d once we're done with the expanded object.

create table ab(a int, b text);
insert into ab select x,'aaaaaaaaaaaaaaaaaaaaaaaaaaa'
from generate_Series(1,1000000) x(x);
set work_mem='1GB';
vacuum analyze ab;

explain analyze select a%1000000,length(string_agg(b,','))
from ab group by 1;

Patched: 1521.045 ms, 1515.905 ms, 1530.920 ms
Master:   932.457 ms,  959.285 ms,  991.021 ms

Although performance of the patched version is closer to master when
there are fewer groups, I felt it necessary to show the extreme case.
I feel this puts a bit of a dampener on the future of this idea: I've
previously had patches rejected for adding 2-5% to planning time alone,
so adding over 50% to execution time seems like a dead end. I've run
profiles on this, and with the patched version malloc() and free() are
both at the top of the profile, at about 30% of CPU time each.

> Generally, I think finding a way to pass expanded objects through
> nodeAgg.c would be a good thing to pursue, if we can make it work.
> The immediate impetus for changing things this way would be that we
> wouldn't need to add a mechanism for serializing and deserializing
> internal functions just to pass around partial aggregates. But
> there's another advantage, too: right now,
> advance_transition_function() does a lot of data copying to move data
> from per-call context to the per-aggregate context. When an expanded
> object is in use, this can be skipped.

The part I quite liked about the serialize/deserialize approach is that
it adds no overhead at all for serializing and deserializing the states
when the aggregation is done in a single backend process: we'd simply
have the planner pass make_agg()'s serializeStates as false whenever
we're working within a single backend. That does not appear to be
possible with your proposed implementation, since it changes each
transition function.
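To make the malloc()/free() overhead concrete, here's roughly the setup
path an expanded-object aggregate state needs, following the pattern of
expand_array() in array_expanded.c. This is only a sketch:
ExpandedStringAggState, string_agg_expanded_methods and
make_string_agg_state() are names I've invented for illustration.

#include "postgres.h"

#include "lib/stringinfo.h"
#include "utils/expandeddatum.h"
#include "utils/memutils.h"

/* flatten callbacks omitted; a real implementation must fill these in */
static const ExpandedObjectMethods string_agg_expanded_methods;

typedef struct ExpandedStringAggState
{
	ExpandedObjectHeader hdr;	/* standard expanded-object header */
	StringInfoData accum;		/* the accumulated string */
} ExpandedStringAggState;

static Datum
make_string_agg_state(MemoryContext parentcontext)
{
	ExpandedStringAggState *state;
	MemoryContext objcxt;
	MemoryContext oldcxt;

	/*
	 * Every expanded object must live in a context of its own, so that
	 * deleting the context frees the whole object.  This call is the
	 * malloc() that shows up in the profile; deleting the context when
	 * the group is finished is the matching free().
	 */
	objcxt = AllocSetContextCreate(parentcontext,
								   "expanded string_agg state",
								   ALLOCSET_SMALL_MINSIZE,
								   ALLOCSET_SMALL_INITSIZE,
								   ALLOCSET_DEFAULT_MAXSIZE);

	state = (ExpandedStringAggState *)
		MemoryContextAlloc(objcxt, sizeof(ExpandedStringAggState));
	EOH_init_header(&state->hdr, &string_agg_expanded_methods, objcxt);

	/* make sure the accumulator's buffer lives in the object's context */
	oldcxt = MemoryContextSwitchTo(objcxt);
	initStringInfo(&state->accum);
	MemoryContextSwitchTo(oldcxt);

	return EOHPGetRWDatum(&state->hdr);
}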
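By contrast, here's a sketch of what I mean by making the serialization
optional. serializeStates is the flag from my proposal rather than
anything committed, and polynumaggstate_serialize() is an illustrative
name for a function that would flatten the state into a bytea:

#include "postgres.h"

/* PolyNumAggState is private to numeric.c; treated as opaque here */
typedef struct PolyNumAggState PolyNumAggState;

/* hypothetical: flattens a PolyNumAggState into a palloc'd bytea */
extern bytea *polynumaggstate_serialize(PolyNumAggState *state);

static Datum
finalize_partial_numeric_agg(bool serializeStates, PolyNumAggState *state)
{
	if (!serializeStates)
	{
		/*
		 * Aggregation within a single backend: hand the internal state
		 * pointer straight to the combine function -- no serialization
		 * cost at all.
		 */
		return PointerGetDatum(state);
	}

	/*
	 * Parallel aggregation: flatten the state into a bytea that can be
	 * shipped from a worker to the leader, where a matching deserialize
	 * function rebuilds the PolyNumAggState before combining.
	 */
	return PointerGetDatum(polynumaggstate_serialize(state));
}

The point being that the branch at the top is the only cost the
single-backend case would ever pay.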
It is my understanding that we normally bend over backwards with new
code to try to prevent any performance regression. I'm not quite sure
the expanded-object idea works well in this regard, though I do agree
your approach seems neater. I just don't want to spend my time writing
all the code required to replace all INTERNAL aggregate states when I
know the performance regression is going to be unacceptable.

I also saw another regression with your patch when testing on another
machine: it caused the plan to change from a GroupAggregate to a
HashAggregate, causing a significant slowdown.

Unpatched:

# explain analyze select a%1000000,length(string_agg(b,',')) from ab group by 1;
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=119510.84..144510.84 rows=1000000 width=32) (actual time=538.938..1015.278 rows=1000000 loops=1)
   Group Key: ((a % 1000000))
   ->  Sort  (cost=119510.84..122010.84 rows=1000000 width=32) (actual time=538.917..594.194 rows=1000000 loops=1)
         Sort Key: ((a % 1000000))
         Sort Method: quicksort  Memory: 102702kB
         ->  Seq Scan on ab  (cost=0.00..19853.00 rows=1000000 width=32) (actual time=0.016..138.964 rows=1000000 loops=1)
 Planning time: 0.146 ms
 Execution time: 1047.511 ms

Patched:

# explain analyze select a%1000000,length(string_agg(b,',')) from ab group by 1;
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=24853.00..39853.00 rows=1000000 width=32) (actual time=8072.346..144424.872 rows=1000000 loops=1)
   Group Key: (a % 1000000)
   ->  Seq Scan on ab  (cost=0.00..19853.00 rows=1000000 width=32) (actual time=0.025..481.332 rows=1000000 loops=1)
 Planning time: 0.164 ms
 Execution time: 263288.332 ms

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services