Hi,

On 2017-11-27 16:31:21 -0800, Andres Freund wrote:
> this is part of my work to make expression evaluation JITable. In a lot
> of analytics queries the major bottleneck is transition function
> invocation (makes sense, hardly anyone wants to see billions of
> rows). Therefore for JITing to be really valuable transition function
> stuff needs to be JITable.
>
> Excerpt from the preliminary commit message:
>
> Previously aggregate transition and combination functions were invoked
> by special case code in nodeAgg.c, evaluating input and filters
> separately using the expression evaluation machinery. That turns out
> to not be great for performance for several reasons:
> - repeated expression evaluations have some cost
> - the transition functions invocations are poorly predicted
> - filter and input computation had to be done separately
> - the special case code made it hard to implement JITing of the whole
>   transition function invocation
>
> Address this by building one large expression that computes input,
> evaluates filters, and invokes transition functions.
>
> This leads to moderate speedups in queries bottlenecked by aggregate
> computations, and enables large speedups for similar cases once JITing
> is done.
>
> While this gets rid of a substantial amount of duplication between the
> infrastructure for transition and combine functions, it still increases
> code size a bit.

There are still two callers of advance_transition_function() left, namely
process_ordered_aggregate_{single,multi}. Rearchitecting this so they
also go through expression-ified transition invocation seems like
material for a separate patch; this one is complicated enough...

> Todo / open Questions:
> - Location of transition function building functions. Currently they're
>   in execExpr.c. That allows not to expose a bunch of functions local to
>   it, but requires exposing some aggregate structs to the world. We
>   could go the other way round as well.

I've left this as is.

> - Right now we waste a bunch of time by having to access transition
>   states indexed by both grouping set number and the transition state
>   offset therein. It'd be nicer if we could cheaply reduce the number of
>   indirections, but I can't quite see how without adding additional
>   complications.

I've left this as is.

Here's a considerably polished variant of this patch. I plan to do
another round of polishing next week, and then push it, unless somebody
else has comments.

Regards,

Andres
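
For readers who want a concrete picture of the "one large expression"
idea from the commit message, here is a minimal standalone sketch in C.
It is not the patch and does not use PostgreSQL's executor API: the step
opcodes, the Step and TransState structs, and the advance_row() and
demo_sum_and_filtered_count() functions are all hypothetical names made
up for illustration. The sketch only shows the general shape: input
loading, filter evaluation, and transition invocation for every
aggregate live in a single step program that is run once per row,
roughly modelling "SELECT sum(a), count(*) FILTER (WHERE b)".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical step opcodes; not PostgreSQL's actual expression steps. */
typedef enum {
    STEP_LOAD_INPUT,   /* fetch a column as the transition input */
    STEP_FILTER,       /* skip ahead if the filter column is false */
    STEP_TRANS_SUM,    /* transition function: state += input */
    STEP_TRANS_COUNT,  /* transition function: state += 1 */
    STEP_DONE          /* end of the per-row program */
} StepOp;

typedef struct {
    StepOp op;
    int    column;  /* input column (LOAD_INPUT, FILTER) */
    int    state;   /* transition state index (TRANS_*) */
    int    jump;    /* FILTER only: step to continue at when filtered out */
} Step;

typedef struct {
    int64_t value;
} TransState;

/* Run the whole per-row program: inputs, filters, and transitions. */
void advance_row(const Step *steps, const int64_t *row, TransState *states)
{
    int64_t input = 0;
    size_t  pc = 0;

    for (;;) {
        const Step *s = &steps[pc];

        switch (s->op) {
        case STEP_LOAD_INPUT:
            input = row[s->column];
            pc++;
            break;
        case STEP_FILTER:
            /* A false filter jumps past this aggregate's transition step. */
            pc = (row[s->column] != 0) ? pc + 1 : (size_t) s->jump;
            break;
        case STEP_TRANS_SUM:
            states[s->state].value += input;
            pc++;
            break;
        case STEP_TRANS_COUNT:
            states[s->state].value += 1;
            pc++;
            break;
        case STEP_DONE:
            return;
        default:
            assert(!"unknown step opcode");
            return;
        }
    }
}

/* Builds one program for both aggregates and feeds three (a, b) rows. */
void demo_sum_and_filtered_count(int64_t *sum, int64_t *count)
{
    const Step program[] = {
        { .op = STEP_LOAD_INPUT,  .column = 0 },            /* input for sum(a) */
        { .op = STEP_TRANS_SUM,   .state = 0 },
        { .op = STEP_FILTER,      .column = 1, .jump = 4 }, /* FILTER (WHERE b) */
        { .op = STEP_TRANS_COUNT, .state = 1 },
        { .op = STEP_DONE },
    };
    const int64_t rows[3][2] = { {10, 1}, {20, 0}, {30, 1} };
    TransState states[2] = { {0}, {0} };

    for (size_t i = 0; i < 3; i++)
        advance_row(program, rows[i], states);

    *sum = states[0].value;
    *count = states[1].value;
}
```

The point of the single program is that one dispatch loop covers all
aggregates for a row, which is also the unit a JIT compiler could turn
into straight-line code. The real implementation additionally has to
deal with grouping sets, strictness, and by-reference transition values,
none of which this sketch attempts.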