On Thu, Nov 28, 2019 at 2:08 AM Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:
> calls float4_accum for each row of T, the same query in VOPS will call
> vops_float4_avg_accumulate for each tile which contains 64 elements.
> So vops_float4_avg_accumulate is called 64 times less than float4_accum.
> And inside it contains straightforward loop:
>
>     for (i = 0; i < TILE_SIZE; i++) {
>         sum += opd->payload[i];
>     }
>
> which can be optimized by compiler (loop unrolling, use of SIMD
> instructions,...).

Part of the reason why the compiler can optimize that so well is
probably related to the fact that it includes no overflow checks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
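
To make the overflow-check point concrete, here is a minimal sketch of the two loop shapes. It is not taken from VOPS or from the PostgreSQL sources: the function names sum_tile_unchecked and sum_tile_checked are hypothetical, the accumulator type is assumed to be float, and the particular check (the sum became infinite although both inputs were finite) is just one plausible way to detect overflow.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

#define TILE_SIZE 64    /* elements per tile, as in the quoted message */

/*
 * Unchecked sum (hypothetical name): the loop body is a single add with
 * no branch, which is the shape compilers can unroll and vectorize
 * most readily.
 */
float
sum_tile_unchecked(const float *payload)
{
    float   sum = 0.0f;

    for (size_t i = 0; i < TILE_SIZE; i++)
        sum += payload[i];
    return sum;
}

/*
 * Checked sum (hypothetical name): after every add, test whether the
 * result overflowed to infinity even though both inputs were finite.
 * The data-dependent early exit inside the loop is what tends to get
 * in the way of SIMD code generation.
 * Returns false on overflow; otherwise stores the sum in *result.
 */
bool
sum_tile_checked(const float *payload, float *result)
{
    float   sum = 0.0f;

    for (size_t i = 0; i < TILE_SIZE; i++)
    {
        float   next = sum + payload[i];

        if (isinf(next) && !isinf(sum) && !isinf(payload[i]))
            return false;
        sum = next;
    }
    *result = sum;
    return true;
}
```

Whether the checked version is actually vectorized depends on the compiler and flags, so the generated assembly would need to be compared to say how much of the difference comes from the check.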