patchOn Thu, 1 Jul 2021 at 13:00, David Rowley <dgrowle...@gmail.com> wrote: > I've only looked at the numeric-agg-sumX2-overflow-v2.patch one and it > all looks mostly ok.
I forgot to mention earlier, after looking at the code a bit more I wondered if we should just serialise the NumericSumAccum instead of the NumericVar. If we're changing this, then maybe it's worth just doing it once and making it as optimal as possible. Right now we translate the NumericSumAccum to a NumericVar in the serial function, only to translate it back again in the deserial function. This is a bit of a waste of CPU effort, although it might be fewer bytes to copy over to the main process. Since deserial can be pretty hot if we've got a lot of workers throwing serialised aggregate states at the main process as fast as they all can go. Reducing the overheads in the serial part of the query could provide some big wins. I played about with the following case: create table num (a int not null, b numeric not null, c numeric not null, d numeric not null, e numeric not null, f numeric not null, g numeric not null, h numeric not null); insert into num select x,y,y,y,y,y,y,y from generate_series(1,1000000) x, generate_Series(1000000000,1000000099) y order by x; explain analyze select a,sum(b),sum(c),sum(d),sum(e),sum(f),sum(g),sum(h) from num group by a; To try and load up the main process as much as possible, I set 128 workers to run. The profile of the main process looked like: Master: 14.10% postgres [.] AllocSetAlloc 7.03% postgres [.] accum_sum_carry.part.0 4.87% postgres [.] ExecInterpExpr 3.72% postgres [.] numeric_sum 3.52% postgres [.] accum_sum_copy 3.06% postgres [.] pq_getmsgint 2.95% postgres [.] palloc 2.82% postgres [.] ExecStoreMinimalTuple 2.62% postgres [.] accum_sum_add 2.60% postgres [.] make_result_opt_error 2.58% postgres [.] numeric_avg_deserialize 2.21% [kernel] [k] copy_user_generic_string 2.08% postgres [.] tuplehash_insert_hash_internal So it is possible to get this stuff to show up. Your numeric-agg-sumX2-overflow-v2.patch patch speeds this up quite a bit already, so it makes me think it might be worth doing the extra work to further reduce the overhead. Master @ 3788c6678 Execution Time: 8306.319 ms Execution Time: 8407.785 ms Execution Time: 8491.056 ms Master + numeric-agg-sumX2-overflow-v2.patch Execution Time: 6633.278 ms Execution Time: 6657.350 ms Execution Time: 6568.184 ms A possible reason we might not want to do this is that we currently don't have a NumericSumAccum for some functions when the compiler has a working int128 type. At the moment we translate the int128 accumulator into a NumericVar. We could just serialize the int128 type in those cases. It would just mean the serialised format is not as consistent between different builds. We currently have nothing that depends on them matching across different machines. Do you think it's worth doing this? David