Hi all,

We have opened an issue about vectorization in Flink (FLINK-13053
<https://issues.apache.org/jira/browse/FLINK-13053>). Would you please give
your valuable feedback? Thank you in advance.

Vectorization is a popular technique in SQL engines today. Compared with
the traditional row-based approach, it has some distinct advantages, for
example:



1) Better use of CPU resources (e.g. SIMD)

2) More compact memory layout

3) Better support for compressed data formats
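
To make the first two points concrete, here is a minimal sketch (not code
from our proposal or from Flink) that contrasts a row-based sum with a
columnar one. The names VectorizationSketch, OrderRow, OrderBatch,
sumAmountsRowBased and sumAmountsVectorized are hypothetical and chosen only
for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class VectorizationSketch {

    // Row-based layout (hypothetical type): one heap object per record,
    // so the values of a single column are scattered across memory.
    public static final class OrderRow {
        final long id;
        final long amount;

        public OrderRow(long id, long amount) {
            this.id = id;
            this.amount = amount;
        }
    }

    // Columnar layout (hypothetical type): one primitive array per column,
    // so the values of a single column are stored contiguously.
    public static final class OrderBatch {
        final long[] ids;
        final long[] amounts;
        final int size;

        public OrderBatch(long[] ids, long[] amounts) {
            if (ids.length != amounts.length) {
                throw new IllegalArgumentException("columns must have equal length");
            }
            this.ids = ids;
            this.amounts = amounts;
            this.size = ids.length;
        }
    }

    // Row-at-a-time: every iteration dereferences a separate object.
    public static long sumAmountsRowBased(List<OrderRow> rows) {
        long sum = 0;
        for (OrderRow row : rows) {
            sum += row.amount;
        }
        return sum;
    }

    // Batch-at-a-time: a tight loop over one primitive array. Loops of this
    // shape are the ones a JIT compiler may be able to map to SIMD instructions.
    public static long sumAmountsVectorized(OrderBatch batch) {
        final long[] amounts = batch.amounts;
        long sum = 0;
        for (int i = 0; i < batch.size; i++) {
            sum += amounts[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] ids = {1L, 2L, 3L};
        long[] amounts = {10L, 20L, 30L};

        List<OrderRow> rows = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            rows.add(new OrderRow(ids[i], amounts[i]));
        }

        System.out.println(sumAmountsRowBased(rows));
        System.out.println(sumAmountsVectorized(new OrderBatch(ids, amounts)));
    }
}
```

Both methods compute the same result; only the data layout and the loop
shape differ. Whether the columnar loop is actually compiled to SIMD
instructions depends on the JVM and the hardware.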



Currently, Flink's SQL engine is row-based for both stream and batch
workloads. To gain the above benefits, we want to bring vectorization to
Flink. This involves substantial changes to the existing code base.
Therefore, we propose a plan to carry out these changes in small,
incremental steps so that existing features are not affected. We want to
apply it to batch workloads first. The details can be found in our proposal.



Over the past months, we have developed an initial implementation of the
above ideas. Initial performance evaluations on TPC-H benchmarks show that
vectorization yields substantial performance improvements (see the figure
below). More details can be found in our proposal.



[image:
https://lh5.googleusercontent.com/hjXkXGImWOjaiB8zF0SKIMoItY6VCBm-BmJWWEXRo0ZPHdwLgKzCmIoNKef1YPCaAA7NXN6RvO-nwBBXBee52KeAtBjyIvh_NcAuChvW3BEtQuZGL5GPddqxL_iMV7HvEVCC6k-m]



Special thanks to @Kurt Young’s team for all the kind help.

Special thanks to @Piotr Nowojski for all the valuable feedback and helpful
suggestions.


Best,

Liya Fan
