On 06.12.2019 19:52, Konstantin Knizhnik wrote:
On 06.12.2019 18:53, Robert Haas wrote:
On Thu, Nov 28, 2019 at 2:08 AM Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:
calls float4_accum for each row of T, the same query in VOPS will call
vops_float4_avg_accumulate once for each tile, which contains 64 elements.
So vops_float4_avg_accumulate is called 64 times less often than
float4_accum.
Inside, it contains a straightforward loop:
for (i = 0; i < TILE_SIZE; i++) {
    sum += opd->payload[i];
}
which can be optimized by the compiler (loop unrolling, use of SIMD
instructions, ...).
Part of the reason why the compiler can optimize that so well is
probably related to the fact that it includes no overflow checks.
Maybe it makes sense for the aggregate to use a transition function that
does not check for overflow, and to perform this check only in the final
function?
NaN and Inf values will be preserved in any case...
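To make the idea concrete, here is a minimal sketch in C. It is not the
actual PostgreSQL or VOPS code: tile_t, tile_sum_accumulate and
tile_sum_final are hypothetical names, and the state is assumed to be a
plain double. The transition step keeps the loop free of checks, and the
final step tests the result once. Note that this sketch treats any
infinite result as overflow; telling it apart from a legitimately
infinite input would need extra state.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

#define TILE_SIZE 64    /* 64 elements per tile, as described above */

/* Hypothetical tile layout; the real VOPS structure is not shown here. */
typedef struct {
    float payload[TILE_SIZE];
} tile_t;

/*
 * Transition step: add one whole tile to the running sum.
 * There is no overflow check inside the loop, so the compiler is free
 * to unroll it.
 */
static double
tile_sum_accumulate(double state, const tile_t *opd)
{
    double sum = 0.0;

    for (size_t i = 0; i < TILE_SIZE; i++)
        sum += opd->payload[i];
    return state + sum;
}

/*
 * Final step: check once, after all tiles have been accumulated.
 * Returns false if the result is infinite (assumption: the caller then
 * reports an overflow error). NaN is passed through unchanged.
 */
static bool
tile_sum_final(double state, double *result)
{
    if (isinf(state))
        return false;
    *result = state;
    return true;
}
```

With this split the check runs once per aggregate instead of once per
row or per tile.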
I tried commenting out check_float8_val in float4_pl/float8_pl and saw
no difference in performance at all.
But if I replace the query
select
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
sum(l_quantity) as avg_qty,
sum(l_extendedprice) as avg_price,
sum(l_discount) as avg_disc,
count(*) as count_order
from lineitem_inmem;
with
select sum(l_quantity + l_extendedprice + l_discount + l_tax) from
lineitem_inmem;
then the time is reduced from 3686 to 1748 msec.
So at least half of this time is spent in expression evaluation and
aggregate accumulation.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company