Hi Konstantin, On Tue, Feb 25, 2020 at 6:44 PM Konstantin Knizhnik < k.knizh...@postgrespro.ru> wrote:
> > > On 25.02.2020 11:06, Hubert Zhang wrote: > > Hi Konstantin, > > I checkout your branch pg13 in repo > https://github.com/zhangh43/vectorize_engine > After I fixed some compile error, I tested Q1 on TPCH-10G > The result is different from yours and vectorize version is too slow. Note > that I disable parallel worker by default. > no JIT no Vectorize: 36 secs > with JIT only: 23 secs > with Vectorize only: 33 secs > JIT + Vectorize: 29 secs > > My config option is `CFLAGS='-O3 -g -march=native' > --prefix=/usr/local/pgsql/ --disable-cassert --enable-debug --with-llvm` > I will do some spike on why vectorized is so slow. Could you please > provide your compile option and the TPCH dataset size and your > queries(standard Q1?) to help me to debug on it. > > > > Hi, Hubert > > Sorry, looks like I have used slightly deteriorated snapshot of master so > I have not noticed some problems. > Fixes are committed. > > Most of the time is spent in unpacking heap tuple > (tts_buffer_heap_getsomeattrs): > > 24.66% postgres postgres [.] tts_buffer_heap_getsomeattrs > 8.28% postgres vectorize_engine.so [.] VExecStoreColumns > 5.94% postgres postgres [.] HeapTupleSatisfiesVisibility > 4.21% postgres postgres [.] bpchareq > 4.12% postgres vectorize_engine.so [.] vfloat8_accum > > > In my version of nodeSeqscan I do not keep all fetched 1024 heap tuples > but stored there attribute values in vector columns immediately. > But to avoid extraction of useless data it is necessary to know list of > used columns. > The same problem is solved in zedstore, but unfortunately there is no > existed method in Postgres to get list > of used attributes. I have done it but my last implementation contains > error which cause loading of all columns. > Fixed version is committed. > > Now profile without JIT is: > > 15.52% postgres postgres [.] tts_buffer_heap_getsomeattrs > 10.25% postgres postgres [.] ExecInterpExpr > 6.54% postgres postgres [.] HeapTupleSatisfiesVisibility > 5.12% postgres vectorize_engine.so [.] VExecStoreColumns > 4.86% postgres postgres [.] bpchareq > 4.80% postgres vectorize_engine.so [.] vfloat8_accum > 3.78% postgres postgres [.] tts_minimal_getsomeattrs > 3.66% postgres vectorize_engine.so [.] VExecAgg > 3.38% postgres postgres [.] hashbpchar > > and with JIT: > > 13.88% postgres postgres [.] tts_buffer_heap_getsomeattrs > 7.15% postgres vectorize_engine.so [.] vfloat8_accum > 6.03% postgres postgres [.] HeapTupleSatisfiesVisibility > 5.55% postgres postgres [.] bpchareq > 4.42% postgres vectorize_engine.so [.] VExecStoreColumns > 4.19% postgres postgres [.] hashbpchar > 4.09% postgres vectorize_engine.so [.] vfloat8pl > > I also tested Q1 with your latest code. Result of vectorized is still slow. PG13 native: 38 secs PG13 Vec: 30 secs PG13 JIT: 23 secs PG13 JIT+Vec: 27 secs My perf result is as belows. There are three parts: 1. lookup_hash_entry(43.5%) this part is not vectorized yet. 2. scan part: fetch_input_tuple(36%) 3. vadvance_aggregates part(20%) I also perfed on PG96 vectorized version and got similar perf results and running time of vectorized PG96 and PG13 are also similar. But PG13 is much faster than PG96. So I just wonder whether we merge all the latest executor code of PG13 into the vectorized PG13 branch? - agg_fill_hash_table ◆ - 43.50% lookup_hash_entry (inlined) ▒ + 39.07% LookupTupleHashEntry ▒ 0.56% ExecClearTuple (inlined) ▒ - 36.06% fetch_input_tuple ▒ - ExecProcNode (inlined) ▒ - 36.03% VExecScan ▒ - 34.60% ExecScanFetch (inlined) ▒ - ExecScanFetch (inlined) ▒ - VSeqNext ▒ + 16.64% table_scan_getnextslot (inlined) ▒ - 10.29% slot_getsomeattrs (inlined) ▒ - 10.17% slot_getsomeattrs_int ▒ + tts_buffer_heap_getsomeattrs ▒ 7.14% VExecStoreColumns ▒ + 1.38% ExecQual (inlined) ▒ - 20.30% Vadvance_aggregates (inlined) ▒ - 17.46% Vadvance_transition_function (inlined) ▒ + 11.95% vfloat8_accum ▒ + 4.74% vfloat8pl ▒ 0.75% vint8inc_any ▒ + 2.77% ExecProject (inlined) -- Thanks Hubert Zhang