Hi Konstantin,

On Tue, Feb 25, 2020 at 6:44 PM Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

>
>
> On 25.02.2020 11:06, Hubert Zhang wrote:
>
> Hi Konstantin,
>
> I checked out your pg13 branch in the repo
> https://github.com/zhangh43/vectorize_engine
> After fixing some compile errors, I tested Q1 on TPCH-10G.
> The result is different from yours, and the vectorized version is too slow.
> Note that I disabled parallel workers by default.
> no JIT no Vectorize:  36 secs
> with JIT only:             23 secs
> with Vectorize only:   33 secs
> JIT + Vectorize:         29 secs
>
> My config option is `CFLAGS='-O3 -g -march=native'
> --prefix=/usr/local/pgsql/ --disable-cassert --enable-debug --with-llvm`
> I will investigate why the vectorized version is so slow. Could you please
> provide your compile options, the TPCH dataset size, and your
> queries (standard Q1?) to help me debug it?
>
>
>
> Hi, Hubert
>
> Sorry, it looks like I used a slightly outdated snapshot of master, so
> I did not notice some problems.
> Fixes are committed.
>
> Most of the time is spent in unpacking heap tuple
> (tts_buffer_heap_getsomeattrs):
>
>   24.66%  postgres  postgres             [.] tts_buffer_heap_getsomeattrs
>    8.28%  postgres  vectorize_engine.so  [.] VExecStoreColumns
>    5.94%  postgres  postgres             [.] HeapTupleSatisfiesVisibility
>    4.21%  postgres  postgres             [.] bpchareq
>    4.12%  postgres  vectorize_engine.so  [.] vfloat8_accum
>
>
> In my version of nodeSeqscan I do not keep all 1024 fetched heap tuples
> but store their attribute values in vector columns immediately.
> But to avoid extracting useless data, it is necessary to know the list of
> used columns.
> The same problem is solved in zedstore, but unfortunately there is no
> existing method in Postgres to get the list
> of used attributes. I have implemented one, but my last implementation
> contained an error which caused all columns to be loaded.
> The fixed version is committed.
>
> Now profile without JIT is:
>
>  15.52%  postgres  postgres             [.] tts_buffer_heap_getsomeattrs
>   10.25%  postgres  postgres             [.] ExecInterpExpr
>    6.54%  postgres  postgres             [.] HeapTupleSatisfiesVisibility
>    5.12%  postgres  vectorize_engine.so  [.] VExecStoreColumns
>    4.86%  postgres  postgres             [.] bpchareq
>    4.80%  postgres  vectorize_engine.so  [.] vfloat8_accum
>    3.78%  postgres  postgres             [.] tts_minimal_getsomeattrs
>    3.66%  postgres  vectorize_engine.so  [.] VExecAgg
>    3.38%  postgres  postgres             [.] hashbpchar
>
> and with JIT:
>
>  13.88%  postgres  postgres             [.] tts_buffer_heap_getsomeattrs
>    7.15%  postgres  vectorize_engine.so  [.] vfloat8_accum
>    6.03%  postgres  postgres             [.] HeapTupleSatisfiesVisibility
>    5.55%  postgres  postgres             [.] bpchareq
>    4.42%  postgres  vectorize_engine.so  [.] VExecStoreColumns
>    4.19%  postgres  postgres             [.] hashbpchar
>    4.09%  postgres  vectorize_engine.so  [.] vfloat8pl
>
>
I also tested Q1 with your latest code. The vectorized version is still slow.
PG13 native: 38 secs
PG13 Vec: 30 secs
PG13 JIT: 23 secs
PG13 JIT+Vec: 27 secs
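
To make the comparison easier to read, here is a small Python sketch that
turns the four timings above into speed-ups relative to the native run. It is
only arithmetic on the numbers I reported; the function name `speedups` is
made up for this illustration and is not part of any patch.

```python
# Q1 timings in seconds, copied from the list above.
timings = {"native": 38, "Vec": 30, "JIT": 23, "JIT+Vec": 27}


def speedups(timings, baseline="native"):
    """Return each configuration's speed-up relative to the baseline run."""
    base = timings[baseline]
    return {name: base / secs for name, secs in timings.items()}


if __name__ == "__main__":
    for name, ratio in speedups(timings).items():
        print(f"PG13 {name}: {ratio:.2f}x")
```

By this measure JIT alone is still ahead of both vectorized configurations.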

My perf result is below. There are three parts:
1. lookup_hash_entry (43.5%): this part is not vectorized yet.
2. scan part: fetch_input_tuple (36%)
3. Vadvance_aggregates part (20%)
I also ran perf on the vectorized PG96 version and got similar results, and
the running times of vectorized PG96 and vectorized PG13 are also similar. But
plain (non-vectorized) PG13 is much faster than PG96. So I wonder whether we
should merge all the latest PG13 executor code into the vectorized PG13 branch?

- agg_fill_hash_table
   - 43.50% lookup_hash_entry (inlined)
      + 39.07% LookupTupleHashEntry
        0.56% ExecClearTuple (inlined)
   - 36.06% fetch_input_tuple
      - ExecProcNode (inlined)
         - 36.03% VExecScan
            - 34.60% ExecScanFetch (inlined)
               - ExecScanFetch (inlined)
                  - VSeqNext
                     + 16.64% table_scan_getnextslot (inlined)
                     - 10.29% slot_getsomeattrs (inlined)
                        - 10.17% slot_getsomeattrs_int
                           + tts_buffer_heap_getsomeattrs
                       7.14% VExecStoreColumns
            + 1.38% ExecQual (inlined)
   - 20.30% Vadvance_aggregates (inlined)
      - 17.46% Vadvance_transition_function (inlined)
         + 11.95% vfloat8_accum
         + 4.74% vfloat8pl
           0.75% vint8inc_any
      + 2.77% ExecProject (inlined)
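
As a rough sanity check on the three top-level shares in this profile, the
Python sketch below estimates what fraction of the current runtime would
remain if individual parts were made faster. This is only back-of-the-envelope
arithmetic: the per-part factors passed in are hypothetical, the helper name
`remaining_fraction` is made up for this illustration, and it assumes the
three shares are independent and cover the whole query.

```python
# Top-level shares (percent) under agg_fill_hash_table, from the profile.
shares = {
    "lookup_hash_entry": 43.50,
    "fetch_input_tuple": 36.06,
    "Vadvance_aggregates": 20.30,
}


def remaining_fraction(shares, factors):
    """Fraction of the current runtime left after speeding up some parts.

    `factors` maps a part name to a hypothetical speed-up factor;
    parts that are not listed keep their current cost.
    """
    total = sum(shares.values())
    left = sum(share / factors.get(name, 1.0) for name, share in shares.items())
    return left / total


if __name__ == "__main__":
    # Hypothetical case: scan and aggregation each become twice as fast,
    # while the non-vectorized hash lookup stays as it is.
    frac = remaining_fraction(
        shares, {"fetch_input_tuple": 2.0, "Vadvance_aggregates": 2.0}
    )
    print(f"remaining runtime: {frac:.0%} of current")
```

Whatever factors are plugged in for the scan and aggregation parts, the result
cannot drop below the lookup_hash_entry share unless the hash lookup itself
gets faster.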

-- 
Thanks

Hubert Zhang
