I think that's not true when you need to integrate Flink into an existing
data lake. In my opinion it should be very straightforward to read/write
Parquet data with objects serialized with Avro/Thrift/Protobuf, or at least
to reuse the Hadoop input/output formats with the Table API. At the moment
I have to go through a lot of custom code that uses the Hadoop formats, and
it's a lot of code just to read and write Thrift- or Avro-serialized
objects in Parquet folders.
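
Just to give an idea of the boilerplate involved, here is a minimal sketch
of the write path. The schema, output path, and toy dataset are made up for
illustration, and it assumes flink-hadoop-compatibility, flink-avro, and
parquet-avro on the classpath:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.parquet.avro.AvroParquetOutputFormat;

public class WriteParquetAvroSketch {

  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Toy Avro schema, just to show the wiring.
    final Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Person\","
            + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}");

    // Hadoop output formats write (key, value) pairs; Parquet takes a
    // Void key and the record as the value.
    DataSet<Tuple2<Void, GenericRecord>> records = env
        .fromElements("alice", "bob")
        .map(name -> {
          GenericRecord r = new GenericData.Record(schema);
          r.put("name", name);
          return Tuple2.of((Void) null, r);
        })
        .returns(new TypeHint<Tuple2<Void, GenericRecord>>() {});

    // Configure the Hadoop side: target schema and a made-up output path.
    Job job = Job.getInstance();
    AvroParquetOutputFormat.setSchema(job, schema);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/parquet-out"));

    // Wrap parquet-avro's output format in Flink's Hadoop compatibility
    // layer so it can be used as a DataSet sink.
    HadoopOutputFormat<Void, GenericRecord> out =
        new HadoopOutputFormat<>(new AvroParquetOutputFormat<GenericRecord>(), job);

    records.output(out);
    env.execute("parquet-avro write sketch");
  }
}

The read path needs the same kind of dance with AvroParquetInputFormat
wrapped in a HadoopInputFormat, which is exactly the boilerplate I'd like
to avoid.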

On Wed, Jul 22, 2020 at 3:35 AM Jingsong Li <jingsongl...@gmail.com> wrote:

> In table/SQL,
>
> I think we don't need a source/sink for `AvroParquetOutputFormat`, because
> the data structure is always Row or RowData; it should not be an Avro
> object.
>
> Best,
> Jingsong
>
> On Tue, Jul 21, 2020 at 8:09 PM Flavio Pompermaier <pomperma...@okkam.it>
> wrote:
>
>> This is what I actually do, but I was hoping to be able to get rid of the
>> HadoopOutputFormat and use a more comfortable Source/Sink
>> implementation.
>>
>> On Tue, Jul 21, 2020 at 12:38 PM Jingsong Li <jingsongl...@gmail.com>
>> wrote:
>>
>>> Hi Flavio,
>>>
>>> AvroOutputFormat only supports writing Avro files.
>>> I think you can use `AvroParquetOutputFormat` as a Hadoop output format
>>> and wrap it with Flink's `HadoopOutputFormat`.
>>>
>>> Best,
>>> Jingsong
>>>
>>> On Fri, Jul 17, 2020 at 11:59 PM Flavio Pompermaier <
>>> pomperma...@okkam.it> wrote:
>>>
>>>> Hi to all,
>>>> is there a way to write out Parquet-Avro data using
>>>> BatchTableEnvironment with Flink 1.11?
>>>> At the moment I'm using the Hadoop ParquetOutputFormat, but I hope to be
>>>> able to get rid of it sooner or later. I saw that there's an
>>>> AvroOutputFormat, but no support for using it with Parquet.
>>>>
>>>> Best,
>>>> Flavio
>>>>
>>>
>>>
>>> --
>>> Best, Jingsong Lee
>>>
>>
>
> --
> Best, Jingsong Lee
>
