Thanks Kailash for bringing this up. I think this is a good idea. By
passing the ParquetWriter we gain much more flexibility.

I did a small PR on adding the ability to add compression to the Parquet
writer: https://github.com/apache/flink/pull/7547 But I believe this is the
wrong approach. For example, I'd like to tune the Page sizes as well, but
this requires another PR and a lot of code changes, to simply set this due
to the rather complex structure of the builder pattern. Personally, I would
prefer it to just pass a generic ParquetWriter to the BulkWriter. This
makes it much easier, and I don't see the added value of having the
constructor here.

Cheers, Fokko

Op ma 22 apr. 2019 om 19:53 schreef Jakob Homan <jgho...@gmail.com>:

> +1.  Sounds good to me.
> -Jakob
>
> On Mon, Apr 22, 2019 at 9:00 AM Kailash Dayanand
> <kdayan...@lyft.com.invalid> wrote:
> >
> > Friendly remainder. Any thoughts on this approach ?
> >
> > On Tue, Apr 9, 2019 at 11:36 AM Kailash Dayanand <kdayan...@lyft.com>
> wrote:
> >
> > > cc'ing few folks who are interested in this discussion.
> > >
> > > On Tue, Apr 9, 2019 at 11:35 AM Kailash Dayanand <kdayan...@lyft.com>
> > > wrote:
> > >
> > >> Hello,
> > >>
> > >> I am looking to contribute a ProtoParquetWriter support which can be
> used
> > >> in Bulk format for the StreamingFileSink api. There has been earlier
> > >> discussions on this in the user mailing list: https://goo.gl/ya2StL
> and
> > >> thought it would be a good addition to have.
> > >>
> > >> For implementation, looking at the current APIs present at
> > >> ProtoParquetWriter with the parguet project (
> http://tinyurl.com/y378be42),
> > >> it looks like there is some different in the interface between Avro
> and
> > >> Proto writes (ProtoParquetWriter does not have a builder class as
> well as
> > >> not interface with Outputfile). Due to this, I was looking at directly
> > >> extending the ParquetWriter within Flink to define the Builder static
> class
> > >> and have newer interfaces. This is needed as the bulk writer takes a
> > >> builder to crate the ParquetWriter in the bulkWriter.Factory. (
> > >> http://tinyurl.com/yyg9cn9b)
> > >>
> > >> Any thoughts if this is a reasonable approach?
> > >>
> > >> Thanks
> > >> Kailash
> > >>
> > >
>

Reply via email to