I think that's not true when you need to integrate Flink into an existing data lake. It should be very straightforward (in my opinion) to read/write Parquet data with objects serialized with Avro/Thrift/Protobuf, or at least to reuse the Hadoop input/output formats with the Table API. At the moment I have to go through a lot of custom code that wraps the Hadoop formats, and it's a lot of code just to read and write Thrift- or Avro-serialized objects in Parquet folders.
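To give an idea of the boilerplate involved, this is roughly the wrapping that Jingsong suggests below. It is only a minimal sketch against the DataSet API in Flink 1.11, assuming a parquet-avro version where AvroParquetOutputFormat is generic over the record type; the schema, the output path, and the "records" dataset are placeholders:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.parquet.avro.AvroParquetOutputFormat;

// The Hadoop Job object only carries the Parquet/Avro configuration
Job job = Job.getInstance();
Schema schema = ...; // the Avro schema of the records (placeholder)
AvroParquetOutputFormat.setSchema(job, schema);
FileOutputFormat.setOutputPath(job, new Path("/tmp/out")); // placeholder path

// Wrap the Hadoop output format in Flink's HadoopOutputFormat.
// Parquet uses Void keys, so the records must be Tuple2<Void, GenericRecord>.
HadoopOutputFormat<Void, GenericRecord> parquetOut =
    new HadoopOutputFormat<>(new AvroParquetOutputFormat<>(), job);

DataSet<GenericRecord> records = ...; // however the records are produced
records
    .map(r -> Tuple2.of((Void) null, r))
    .returns(new TypeHint<Tuple2<Void, GenericRecord>>() {})
    .output(parquetOut);

The same shape applies to Thrift or Protobuf payloads, with the corresponding parquet-thrift/parquet-protobuf output formats swapped in, which is exactly why it feels like code Flink could provide out of the box.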
On Wed, Jul 22, 2020 at 3:35 AM Jingsong Li <jingsongl...@gmail.com> wrote:

> In table/SQL,
>
> I think we don't need a source/sink for `AvroParquetOutputFormat`, because
> the data structure is always Row or RowData; it should not be an Avro object.
>
> Best,
> Jingsong
>
> On Tue, Jul 21, 2020 at 8:09 PM Flavio Pompermaier <pomperma...@okkam.it> wrote:
>
>> This is what I actually do, but I was hoping to be able to get rid of the
>> HadoopOutputFormat and use a more comfortable Source/Sink implementation.
>>
>> On Tue, Jul 21, 2020 at 12:38 PM Jingsong Li <jingsongl...@gmail.com> wrote:
>>
>>> Hi Flavio,
>>>
>>> AvroOutputFormat only supports writing Avro files.
>>> I think you can use `AvroParquetOutputFormat` as a Hadoop output format
>>> and wrap it in Flink's `HadoopOutputFormat`.
>>>
>>> Best,
>>> Jingsong
>>>
>>> On Fri, Jul 17, 2020 at 11:59 PM Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>
>>>> Hi to all,
>>>> is there a way to write out Parquet-Avro data using
>>>> BatchTableEnvironment with Flink 1.11?
>>>> At the moment I'm using the Hadoop ParquetOutputFormat, but I hope to be
>>>> able to get rid of it sooner or later. I saw that there's the
>>>> AvroOutputFormat, but no support for writing Parquet with it.
>>>>
>>>> Best,
>>>> Flavio
>>>
>>> --
>>> Best, Jingsong Lee
>
> --
> Best, Jingsong Lee
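For the read direction mentioned at the top of the thread, the analogous wrapping goes through Flink's HadoopInputFormat around parquet-avro's AvroParquetInputFormat. Again only a sketch under the same assumptions, with a placeholder input path and GenericRecord values:

import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.parquet.avro.AvroParquetInputFormat;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// The Hadoop Job again only carries the input configuration
Job job = Job.getInstance();
FileInputFormat.addInputPath(job, new Path("/tmp/out")); // placeholder path

// Keys are Void for Parquet; values are the deserialized Avro records
HadoopInputFormat<Void, GenericRecord> parquetIn =
    new HadoopInputFormat<>(
        new AvroParquetInputFormat<>(), Void.class, GenericRecord.class, job);

DataSet<Tuple2<Void, GenericRecord>> input = env.createInput(parquetIn);
DataSet<GenericRecord> records = input.map(t -> t.f1).returns(GenericRecord.class);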