Hi hackers,

Vertical (columnar) storage is the best fit for analytic workloads, which is
why it is widely used in OLAP-oriented databases such as Vertica, HyPer, KDB, ...
In Postgres we have the cstore extension, which cannot provide all the benefits
of the vertical model because the executor lacks support for vector operations.
The situation can change if we get a pluggable storage API with support for
vectorized execution.

But the vertical model is not so good for updates and data loading (because
data is mostly imported in horizontal format).
This is why in most existing systems data is present in both formats (at
least for some time).

I want to announce a new model, "diagonal storage", which combines the benefits
of both approaches.
The idea is very simple: we first store column 1 of the first record, then
column 2 of the second record, ... and so on until we reach the last column.
After that we store the second column of the first record, the third column of
the second record, ...
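
To make the ordering concrete, here is a minimal Python sketch of how one tile
could be laid out. It is only an illustration and is not taken from the patch.
It assumes a square tile (as many records as columns) and assumes that a
diagonal wraps around to column 1 once it passes the last column; the
description above does not spell out either point. The names diagonal_order,
to_diagonal and from_diagonal are hypothetical.

```python
def diagonal_order(ncols):
    """Yield 0-based (record, column) pairs in diagonal order for one tile.

    Assumption: the tile is square (ncols records x ncols columns) and each
    diagonal wraps around to column 0 after the last column.
    """
    for diag in range(ncols):          # diag 0 is the main diagonal
        for rec in range(ncols):
            yield rec, (rec + diag) % ncols


def to_diagonal(tile):
    """Flatten a square list of rows into diagonal storage order."""
    ncols = len(tile)
    if any(len(row) != ncols for row in tile):
        raise ValueError("this sketch only handles square tiles")
    return [tile[rec][col] for rec, col in diagonal_order(ncols)]


def from_diagonal(flat, ncols):
    """Rebuild the rows of a tile from its diagonal representation."""
    tile = [[None] * ncols for _ in range(ncols)]
    for value, (rec, col) in zip(flat, diagonal_order(ncols)):
        tile[rec][col] = value
    return tile


if __name__ == "__main__":
    rows = [
        ["a1", "a2", "a3"],   # first record
        ["b1", "b2", "b3"],   # second record
        ["c1", "c2", "c3"],   # third record
    ]
    print(to_diagonal(rows))
```

With the three records above, the stored sequence starts with a1, b2, c3
(column 1 of the first record, column 2 of the second, and so on) and continues
with a2, b3, matching the description; the c1 that follows comes from the
wrap-around assumption.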

Profiling of TPC-H queries shows that most of the query execution time
(about 17%) is spent in heap_deform_tuple.
The new format will significantly reduce the time spent on heap deforming,
because there is just one column of a particular record in each tile.
Moreover, we can deform many tuples in parallel, which is especially
efficient on quantum computers.

Please find attached a patch with the first prototype implementation. It
provides about a 3.14 times performance improvement on most TPC-H queries.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment: diagonal.patch.gz
Description: GNU Zip compressed data
