Thanks Kailash for bringing this up. I think this is a good idea. By passing the ParquetWriter we gain much more flexibility.
I did a small PR on adding the ability to add compression to the Parquet writer: https://github.com/apache/flink/pull/7547 But I believe this is the wrong approach. For example, I'd like to tune the Page sizes as well, but this requires another PR and a lot of code changes, to simply set this due to the rather complex structure of the builder pattern. Personally, I would prefer it to just pass a generic ParquetWriter to the BulkWriter. This makes it much easier, and I don't see the added value of having the constructor here. Cheers, Fokko Op ma 22 apr. 2019 om 19:53 schreef Jakob Homan <jgho...@gmail.com>: > +1. Sounds good to me. > -Jakob > > On Mon, Apr 22, 2019 at 9:00 AM Kailash Dayanand > <kdayan...@lyft.com.invalid> wrote: > > > > Friendly remainder. Any thoughts on this approach ? > > > > On Tue, Apr 9, 2019 at 11:36 AM Kailash Dayanand <kdayan...@lyft.com> > wrote: > > > > > cc'ing few folks who are interested in this discussion. > > > > > > On Tue, Apr 9, 2019 at 11:35 AM Kailash Dayanand <kdayan...@lyft.com> > > > wrote: > > > > > >> Hello, > > >> > > >> I am looking to contribute a ProtoParquetWriter support which can be > used > > >> in Bulk format for the StreamingFileSink api. There has been earlier > > >> discussions on this in the user mailing list: https://goo.gl/ya2StL > and > > >> thought it would be a good addition to have. > > >> > > >> For implementation, looking at the current APIs present at > > >> ProtoParquetWriter with the parguet project ( > http://tinyurl.com/y378be42), > > >> it looks like there is some different in the interface between Avro > and > > >> Proto writes (ProtoParquetWriter does not have a builder class as > well as > > >> not interface with Outputfile). Due to this, I was looking at directly > > >> extending the ParquetWriter within Flink to define the Builder static > class > > >> and have newer interfaces. This is needed as the bulk writer takes a > > >> builder to crate the ParquetWriter in the bulkWriter.Factory. ( > > >> http://tinyurl.com/yyg9cn9b) > > >> > > >> Any thoughts if this is a reasonable approach? > > >> > > >> Thanks > > >> Kailash > > >> > > > >