It is a good idea for Flink to integrate Apache Arrow as a format.
Arrow's columnar layout lends itself to SIMD and other vectorized optimizations,
which should be of great benefit to batch tasks.
However, as mentioned in the issue you listed, integrating Arrow may take a lot
of work and will need the community's consideration.

I think you could start with a simple PoC for verification and draft some
specific plans.


Best Regards,
Ran Tao


Aitozi <gjying1...@gmail.com> wrote on Wed, Mar 29, 2023 at 19:12:

> Hi guys
>      I'm opening this thread to discuss supporting the Apache Arrow format
> in Flink.
>      Arrow is a language-independent columnar memory format that has become
> widely used in different systems, and it can also serve as an
> inter-exchange format between other systems.
> So, using it directly in Flink would be nice. We have also received
> some requests on Slack [1][2] and Jira [3].
>      Internally at our company, we have used the flink-python module's
> ArrowReader and ArrowWriter to support this. But that still cannot
> integrate closely with the current flink-formats framework.
> So, I'd like to introduce a flink-arrow formats module to support the
> Arrow format naturally.
>      Looking forward to some suggestions.
>
>
> Best,
> Aitozi
>
>
> [1]: https://apache-flink.slack.com/archives/C03GV7L3G2C/p1677915016551629
>
> [2]: https://apache-flink.slack.com/archives/C03GV7L3G2C/p1666326443152789
>
> [3]: https://issues.apache.org/jira/browse/FLINK-10929
>
