Hi all,

How do Flink formats relate to or interact with Paimon (formerly Flink-Table-Store)? If the Flink format interface is used there, then it may be useful to consider Arrow along with other columnar formats.

Separately, from previous experience, I've seen the Arrow format be useful as an output format for clients to read efficiently. Arrow does support returning batches of records, so there may be some options to use the format in a streaming situation where a sufficient collection of records can be gathered.

Cheers,
Jim

On Thu, Mar 30, 2023 at 8:32 AM Martijn Visser <martijnvis...@apache.org> wrote:

> Hi,
>
> To be honest, I haven't seen that much demand for supporting the Arrow
> format directly in Flink as a flink-format. I'm wondering if there's really
> much benefit for the Flink project to add another file format, over
> properly supporting the format that we already have in the project.
>
> Best regards,
>
> Martijn
>
> On Thu, Mar 30, 2023 at 2:21 PM Ran Tao <chucheng...@gmail.com> wrote:
>
> > It is a good point that Flink integrates Apache Arrow as a format.
> > Arrow can take advantage of SIMD-specific or vectorized optimizations,
> > which should be of great benefit to batch tasks.
> > However, as mentioned in the issue you listed, it may take a lot of work
> > and the community's consideration for integrating Arrow.
> >
> > I think you can try to make a simple PoC for verification and some
> > specific plans.
> >
> >
> > Best Regards,
> > Ran Tao
> >
> >
> > Aitozi <gjying1...@gmail.com> wrote on Wed, Mar 29, 2023 at 19:12:
> >
> > > Hi guys
> > > I'm opening this thread to discuss supporting the Apache Arrow format
> > > in Flink.
> > > Arrow is a language-independent columnar memory format that has become
> > > widely used in different systems, and it can also serve as an
> > > inter-exchange format between other systems.
> > > So, using it directly in the Flink system will be nice. We also received
> > > some requests from Slack [1][2] and Jira [3].
> > > In our company's internal usage, we have used the flink-python module's
> > > ArrowReader and ArrowWriter to support this work. But it still cannot
> > > integrate closely with the current flink-formats framework.
> > > So, I'd like to introduce the flink-arrow formats module to support the
> > > Arrow format naturally.
> > > Looking forward to some suggestions.
> > >
> > >
> > > Best,
> > > Aitozi
> > >
> > >
> > > [1]: https://apache-flink.slack.com/archives/C03GV7L3G2C/p1677915016551629
> > >
> > > [2]: https://apache-flink.slack.com/archives/C03GV7L3G2C/p1666326443152789
> > >
> > > [3]: https://issues.apache.org/jira/browse/FLINK-10929