On Mon, Dec 21, 2015 at 4:02 AM, David Rowley <david.row...@2ndquadrant.com> wrote: > Ok, so it seems that my mindset was not in parallel process space when I > was thinking about this. I think having the pointer in the Tuple is > probably going to be fine for this multiple stage aggregation when that's > occurring in a single backend process, but obviously if the memory that the > pointer points to belongs to a worker process in a parallel aggregate > situation, then bad things will happen. > > Now, there has been talk of this previously, on various threads, but I don't > believe any final decisions were made on how exactly it should be done. At > the moment I plan to make changes as follows: > > Add 3 new columns to pg_aggregate, aggserialfn, aggdeserialfn and > aggserialtype These will only be required when aggtranstype is INTERNAL. > Perhaps we should disallow CREATE AGGREAGET from accepting them for any > other type... The return type of aggserialfn should be aggserialtype, and it > should take a single parameter of aggtranstype. aggdeserialfn will be the > reverse of that. > Add a new bool field to nodeAgg's state named serialStates. If this is field > is set to true then when we're in finalizeAgg = false mode, we'll call the > serialfn on the agg state instead of the finalfn. This will allow the > serialized state to be stored in the tuple and sent off to the main backend. > The combine agg node should also be set to serialStates = true, so that it > knows to deserialize instead of just assuming that the agg state is of type > aggtranstype. > > I believe this should allow us to not cause any performance regressions by > moving away from INTERNAL agg states. It should also be very efficient for > internal states such as Int8TransTypeData, as this struct merely has 2 int64 > fields which should be very simple to stuff into a bytea varlena type. We > don't need to mess around converting the ->count and ->sum into a strings or > anything. > > Then in order for the planner to allow parallel aggregation all aggregates > must: > > Not have a DISTINCT or ORDER BY clause > Have a combinefn > If aggtranstype = INTERNAL, must have a aggserialfn and aggdeserialfn. > > We can relax the requirement on 3 if we're using 2-stage aggregation, but > not parallel aggregation. > > Any objections, or better ideas?
Can we use Tom's expanded-object stuff instead of introducing aggserialfn and aggdeserialfn? In other words, if you have a aggtranstype = INTERNAL, then what we do is: 1. Create a new data type that represents the transition state. 2. Use expanded-object notation for that data type when we're just within a single process, and flatten it when we need to send it between processes. One thing to keep in mind is that we also want to be able to support a plan that involves having one or more remote servers do partial aggregation, send us the partial values, combine them across servers and possibly also with locally computed-values, and the finalize the aggregation. So it would be nice if there were a way to invoke the aggregate function from SQL and get a transition value back rather than a final value. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers