Re: [DISCUSS] Vectorization Support in Flink

Fan Liya Tue, 02 Jul 2019 00:40:36 -0700

Performance chart. FYI.

Best,
Liya Fan
[image: image.png]


On Tue, Jul 2, 2019 at 3:37 PM Fan Liya <liya.fa...@gmail.com> wrote:

> Hi all,
>
> We have opened an issue about vectorization support in Flink (FLINK-13053
> <https://issues.apache.org/jira/browse/FLINK-13053>). We would greatly
> appreciate your feedback. Thank you in advance.
>
> Vectorization is a popular technique in SQL engines today. Compared with
> the traditional row-based approach, it has several distinct advantages,
> for example:
>
> 1) Better use of CPU resources (e.g., SIMD instructions)
> 2) A more compact memory layout
> 3) Better compatibility with compressed data formats
>
> Currently, Flink uses a row-based SQL engine for both stream and batch
> workloads. To gain the above benefits, we want to bring vectorization to
> Flink. This involves substantial changes to the existing code base, so we
> propose carrying out these changes in small, incremental steps in order
> not to affect existing features. We plan to apply vectorization to batch
> workloads first. The details can be found in our proposal.
>
> Over the past few months, we have developed an initial implementation of
> these ideas. Initial performance evaluations on the TPC-H benchmark show
> that vectorization can bring substantial performance improvements (see the
> figure below). More details can be found in our proposal.
>
> [image:
> https://lh5.googleusercontent.com/hjXkXGImWOjaiB8zF0SKIMoItY6VCBm-BmJWWEXRo0ZPHdwLgKzCmIoNKef1YPCaAA7NXN6RvO-nwBBXBee52KeAtBjyIvh_NcAuChvW3BEtQuZGL5GPddqxL_iMV7HvEVCC6k-m]
>
> Special thanks to @Kurt Young’s team for all their kind help.
>
> Special thanks to @Piotr Nowojski for all the valuable feedback and
> helpful suggestions.
>
>
> Best,
>
> Liya Fan
>
