On Wed, Nov 27, 2019 at 06:38:45PM +0300, Konstantin Knizhnik wrote:
On 25.11.2019 18:24, Merlin Moncure wrote:
On Mon, Nov 25, 2019 at 9:09 AM Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:
JIT was not able to significantly (times) increase speed on Q1 query?
Experiment with VOPS shows that used aggregation algorithm itself is not
a bottleneck.
Profile also give no answer for this question.
Any ideas?
Well, in the VOPS variant around 2/3 of the time is spent in routines
that are obviously aggregation. In the JIT version, it's around 20%.
So this suggests that the replacement execution engine is more
invasive. I would also guess (!) that the VOPS engine optimizes fewer
classes of query plan. ExecScan for example, looks to be completely
optimized out VOPS but is still utilized in the JIT engine.
The difference in fraction of time spent in aggregate calculation is
not so large (2 times vs. 10 times).
I suspected that a lot of time is spent in relation traversal code,
tuple unpacking and visibility checks.
To check this hypothesis I have implement in-memory table access
method which stores tuples in unpacked form and
doesn't perform any visibility checks at all.
Results were not so existed. I have to disable parallel execution
(because it is not possible for tuples stored in backend private
memory).
Results are the following:
lineitem: 13736 msec
inmem_lineitem: 10044 msec
vops_lineitem: 1945 msec
The profile of inmem_lineitem is the following:
16.79% postgres postgres [.] float4_accum
12.86% postgres postgres [.] float8_accum
5.83% postgres postgres [.] TupleHashTableHash.isra.8
4.44% postgres postgres [.] lookup_hash_entries
3.37% postgres postgres [.] check_float8_array
3.11% postgres postgres [.] tuplehash_insert
2.91% postgres postgres [.] hash_uint32
2.83% postgres postgres [.] ExecScan
2.56% postgres postgres [.] inmem_getnextslot
2.22% postgres postgres [.] FunctionCall1Coll
2.14% postgres postgres [.] LookupTupleHashEntry
1.95% postgres postgres [.] TupleHashTableMatch.isra.9
1.76% postgres postgres [.] pg_detoast_datum
1.58% postgres postgres [.] AggCheckCallContext
1.57% postgres postgres [.] tts_minimal_clear
1.35% postgres perf-3054.map [.] 0x00007f558db60010
1.23% postgres postgres [.] fetch_input_tuple
1.15% postgres postgres [.] SeqNext
1.06% postgres postgres [.] ExecAgg
1.00% postgres postgres [.] tts_minimal_store_tuple
So now fraction of time spent in aggregation is increased to 30% (vs.
20% for lineitem and 42% for vops_lineitem).
Looks like the main bottleneck now is hashagg. VOPS is accessing hash
about 10 times less (because it accumulates values for the whole
tile).
And it explains still large difference bwtween vops_lineitem and
inmem_lineitem.
If we remove aggregation and rewrite Q1 query as:
select
avg(l_quantity) as sum_qty,
avg(l_extendedprice) as sum_base_price,
avg(l_extendedprice*(1-l_discount)) as sum_disc_price,
avg(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
inmem_lineitem
where
l_shipdate <= '1998-12-01';
then results are the following:
lineitem: 9805 msec
inmem_lineitem: 6257 msec
vops_lineitem: 1865 msec
and now profile of inmem_lineitem is:
25.27% postgres postgres [.] float4_accum
21.86% postgres postgres [.] float8_accum
5.49% postgres postgres [.] check_float8_array
4.57% postgres postgres [.] ExecScan
2.61% postgres postgres [.] AggCheckCallContext
2.30% postgres postgres [.] pg_detoast_datum
2.10% postgres postgres [.] inmem_getnextslot
1.81% postgres postgres [.] SeqNext
1.73% postgres postgres [.] fetch_input_tuple
1.61% postgres postgres [.] ExecAgg
1.23% postgres postgres [.] MemoryContextReset
But still more than 3 times difference with VOPS!
Something is wrong here...
I have no idea what VOPS does, but IIRC one of the bottlenecks compared
to various column stores is our iterative execution model, which makes
it difficult/imposible to vectorize operations. That's likely why the
accum functions are so high in the CPU profile.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services