On 18 January 2016 at 14:36, Haribabu Kommi <kommi.harib...@gmail.com> wrote:
> On Sat, Jan 16, 2016 at 12:00 PM, David Rowley
> <david.row...@2ndquadrant.com> wrote:
> > On 16 January 2016 at 03:03, Robert Haas <robertmh...@gmail.com> wrote:
> >>
> >> On Tue, Dec 29, 2015 at 7:39 PM, David Rowley
> >> <david.row...@2ndquadrant.com> wrote:
> >> >> No, the idea I had in mind was to allow it to continue to exist in the
> >> >> expanded format until you really need it in the varlena format, and
> >> >> then serialize it at that point. You'd actually need to do the
> >> >> opposite: if you get an input that is not in expanded format, expand
> >> >> it.
> >> >
> >> > Admittedly I'm struggling to see how this can be done. I've spent a good
> >> > bit of time analysing how the expanded object stuff works.
> >> >
> >> > Hypothetically let's say we can make it work like:
> >> >
> >> > 1. During partial aggregation (finalizeAggs = false), in
> >> > finalize_aggregates(), where we'd normally call the final function,
> >> > instead flatten INTERNAL states and store the flattened Datum instead
> >> > of the pointer to the INTERNAL state.
> >> > 2. During combining aggregation (combineStates = true) have all the
> >> > combine functions written in such a ways that the INTERNAL states
> >> > expand the flattened states before combining the aggregate states.
> >> >
> >> > Does that sound like what you had in mind?
> >>
> >> More or less. But what I was really imagining is that we'd get rid of
> >> the internal states and replace them with new datatypes built to
> >> purpose. So, for example, for string_agg(text, text) you could make a
> >> new datatype that is basically a StringInfo. In expanded form, it
> >> really is a StringInfo. When you flatten it, you just get the string.
> >> When somebody expands it again, they again have a StringInfo. So the
> >> RW pointer to the expanded form supports append cheaply.
> >
> > That sounds fine in theory, but where and how do you suppose we determine
> > which expand function to call? Nothing exists in the catalogs to decide this
> > currently.
>
> I am thinking of transition function returns and accepts the StringInfoData
> instead of PolyNumAggState internal data for int8_avg_accum for example.

Hmm, so wouldn't that mean that the transition function would need to (for
each input tuple):

1. Parse that StringInfo into tokens.
2. Create a new aggregate state object.
3. Populate the new aggregate state based on the tokenised StringInfo; this
   would perhaps require that various *_in() functions are called on each
   token.
4. Add the new tuple to the aggregate state.
5. Build a new StringInfo based on the aggregate state modified in 4.

?

Currently the transition function only does 4, and performs 2 only if it's
the first tuple.

Is that what you mean? If so, I'd say that would slow things down
significantly!

To get a gauge on how much more CPU work that would be for some aggregates,
have a look at how simple int8_avg_accum() is currently when we have
HAVE_INT128 defined. For the case of AVG(BIGINT) we just really have:

    state->sumX += newval;
    state->N++;

The above code is step 4 only. So unless I've misunderstood you, that would
need to turn into steps 1-5 above. Step 4 here is probably just a handful of
instructions right now, but adding code for steps 1, 2, 3 and 5 would turn
that into hundreds.

I've been trying to avoid any overhead by adding the serializeStates flag to
make_agg() so that we can maintain the same performance when we're just
passing internal states around in the same process. This keeps the
conversions between internal state and serialised state to a minimum.

> The StringInfoData is formed with the members of the PolyNumAggState
> structure data. The input given StringInfoData is transformed into
> PolyNumAggState data and finish the calculation and again form the
> StringInfoData and return. Similar changes needs to be done for final
> functions input type also. I am not sure whether this approach may have
> some impact on performance?

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
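
To make the comparison between "step 4 only" and "steps 1-5 per input tuple" concrete, here is a minimal standalone sketch in C. It is not PostgreSQL code: SketchAvgState, accum_internal() and accum_text() are hypothetical names invented for illustration, a plain character buffer stands in for StringInfo, sscanf()/snprintf() stand in for the *_in()/*_out() style conversions, and int64_t is used for the sum where the real state would use a 128-bit integer.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the internal aggregate state. The field names
 * follow the snippet in the mail; this is not PolyNumAggState itself. */
typedef struct SketchAvgState
{
    int64_t N;      /* number of values accumulated */
    int64_t sumX;   /* running sum (assumption: int64 instead of int128) */
} SketchAvgState;

/* Step 4 only: the state stays in its internal form between calls. */
static void
accum_internal(SketchAvgState *state, int64_t newval)
{
    state->sumX += newval;
    state->N++;
}

/* Steps 1-5: the state is kept as text between calls, so every input
 * value pays for a parse and a re-serialisation around the same step 4.
 * Returns 0 on success, -1 on a parse error or if buf is too small. */
static int
accum_text(char *buf, size_t buflen, int64_t newval)
{
    SketchAvgState state;   /* step 2: fresh state object on every call */
    int            n;

    /* steps 1 and 3: tokenise the text and convert each token */
    if (sscanf(buf, "%" SCNd64 " %" SCNd64, &state.N, &state.sumX) != 2)
        return -1;

    accum_internal(&state, newval);     /* step 4: the actual work */

    /* step 5: build the text form again */
    n = snprintf(buf, buflen, "%" PRId64 " %" PRId64, state.N, state.sumX);
    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

Both functions end up with the same count and sum; the difference is that accum_text() wraps the two-statement update in a parse and a format on every call, which is the per-tuple overhead the serializeStates flag is meant to avoid when states never leave the process.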