Re: [DISCUSS] Add support for Apache Arrow format

2023-04-12 Thread Aitozi
> Which connectors would be commonly used when reading in Arrow format? Filesystem? Currently, yes. Ideally it could be combined with different connectors, but I have not figured out how to integrate the Arrow format deserializer with the `DecodingFormat` or `DeserializationSchema` int

Re: [DISCUSS] Add support for Apache Arrow format

2023-04-12 Thread Martijn Visser
Which connectors would be commonly used when reading in Arrow format? Filesystem? On Wed, Apr 12, 2023 at 4:27 AM Jacky Lau wrote: > Hi >I also think arrow format will be useful when reading/writing with > message queue. >Arrow defines a language-independent columnar memory format for f

Re: [DISCUSS] Add support for Apache Arrow format

2023-04-11 Thread Jacky Lau
Hi, I also think the Arrow format will be useful when reading from and writing to message queues. Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also sup

Re: [DISCUSS] Add support for Apache Arrow format

2023-04-02 Thread Aitozi
Hi all, Thanks for your input. @Ran > However, as mentioned in the issue you listed, it may take a lot of work and the community's consideration for integrating Arrow. To clarify, this proposal solely aims to introduce flink-arrow as a new format, similar to flink-csv and flink-protobuf. It w

Re: [DISCUSS] Add support for Apache Arrow format

2023-03-30 Thread Jim Hughes
Hi all, How do Flink formats relate to or interact with Paimon (formerly Flink-Table-Store)? If the Flink format interface is used there, then it may be useful to consider Arrow along with other columnar formats. Separately, from previous experience, I've seen the Arrow format be useful as an ou

Re: [DISCUSS] Add support for Apache Arrow format

2023-03-30 Thread Martijn Visser
Hi, To be honest, I haven't seen that much demand for supporting the Arrow format directly in Flink as a flink-format. I'm wondering if there's really much benefit for the Flink project to add another file format, over properly supporting the format that we already have in the project. Best regar

Re: [DISCUSS] Add support for Apache Arrow format

2023-03-30 Thread Ran Tao
It is a good idea for Flink to integrate Apache Arrow as a format. Arrow can take advantage of SIMD-specific or vectorized optimizations, which should be of great benefit to batch tasks. However, as mentioned in the issue you listed, it may take a lot of work and the community's consideration for i

[DISCUSS] Add support for Apache Arrow format

2023-03-29 Thread Aitozi
Hi guys, I'm opening this thread to discuss supporting the Apache Arrow format in Flink. Arrow is a language-independent columnar memory format that has become widely used in different systems, and it can also serve as an interchange format between other systems. So, using it directly i

Re: [Discuss] Add support for Apache Arrow

2019-04-11 Thread Flavio Pompermaier
Very BIG +1 for adoption of Apache Arrow. This would greatly simplify integration with other tools. On Thu, Apr 11, 2019 at 2:21 PM Run wrote: > Hi guys, > > > Apache Arrow provides a cross-language, standardized, columnar, memory > format for data. > So it is highly desirable to import Arrow t

[Discuss] Add support for Apache Arrow

2019-04-11 Thread Run
Hi guys, Apache Arrow provides a cross-language, standardized, columnar, memory format for data. So it is highly desirable to import Arrow to Flink, and make use of its memory layout and memory management facilities. More background on this can be found in https://issues.apache.org/jira/brows