It is a good point that flink integrates apache arrow as a format. Arrow can take advantage of SIMD-specific or vectorized optimizations, which should be of great benefit to batch tasks. However, as mentioned in the issue you listed, it may take a lot of work and the community's consideration for integrating Arrow.
I think you can try to make a simple poc for verification and some specific plans. Best Regards, Ran Tao Aitozi <gjying1...@gmail.com> 于2023年3月29日周三 19:12写道: > Hi guys > I'm opening this thread to discuss supporting the Apache Arrow format > in Flink. > Arrow is a language-independent columnar memory format that has become > widely used in different systems, and It can also serve as an > inter-exchange format between other systems. > So, using it directly in the Flink system will be nice. We also received > some requests from slack[1][2] and jira[3]. > In our company's internal usage, we have used flink-python moudle's > ArrowReader and ArrowWriter to support this work. But it still can not > integrate with the current flink-formats framework closely. > So, I'd like to introduce the flink-arrow formats module to support the > arrow format naturally. > Looking forward to some suggestions. > > > Best, > Aitozi > > > [1]: https://apache-flink.slack.com/archives/C03GV7L3G2C/p1677915016551629 > > [2]: https://apache-flink.slack.com/archives/C03GV7L3G2C/p1666326443152789 > > [3]: https://issues.apache.org/jira/browse/FLINK-10929 >