Performance chart. FYI.

Best,
Liya Fan

[image: image.png]

On Tue, Jul 2, 2019 at 3:37 PM Fan Liya <liya.fa...@gmail.com> wrote:

> Hi all,
>
> We have opened an issue about vectorization in Flink (FLINK-13053
> <https://issues.apache.org/jira/browse/FLINK-13053>). Would you please
> give your valuable feedback? Thank you in advance.
>
> Vectorization is a popular technique in SQL engines today. Compared with
> the traditional row-based approach, it has some distinct advantages, for
> example:
>
> 1) Better use of CPU resources (e.g. SIMD)
> 2) More compact memory layout
> 3) Friendlier to compressed data formats
>
> Currently, Flink uses a row-based SQL engine for both stream and batch
> workloads. To enjoy the above benefits, we want to bring vectorization
> to Flink. This involves substantial changes to the existing code base.
> Therefore, we propose a plan to carry out these changes in small,
> incremental steps so that existing features are not affected. We want to
> apply it to batch workloads first. The details can be found in our
> proposal.
>
> Over the past months, we have developed an initial implementation of the
> above ideas. Initial performance evaluations on TPC-H benchmarks show
> that substantial performance improvements can be obtained by
> vectorization (see the figure below). More details can be found in our
> proposal.
>
> [image:
> https://lh5.googleusercontent.com/hjXkXGImWOjaiB8zF0SKIMoItY6VCBm-BmJWWEXRo0ZPHdwLgKzCmIoNKef1YPCaAA7NXN6RvO-nwBBXBee52KeAtBjyIvh_NcAuChvW3BEtQuZGL5GPddqxL_iMV7HvEVCC6k-m]
>
> Special thanks to @Kurt Young's team for all the kind help.
> Special thanks to @Piotr Nowojski for all the valuable feedback and
> helpful suggestions.
>
> Best,
> Liya Fan