Hi Clayton,

It seems like an interesting improvement.  Given that Parquet is
columnar, you would expect some space savings.  I guess the big question
is, would each batch of records become a single Parquet file?  And how
would this integrate with the existing logic, which might assume that
each record can be serialized on its own?
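
To make the question concrete, here's the rough shape I'd imagine for a
per-batch writer.  This is just a sketch, not connector code: toAvro()
stands in for whatever Connect-to-Avro conversion the sink would need,
and file naming/rotation policy is left out entirely.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class ParquetBatchWriter {
    private final Schema avroSchema;
    private final List<SinkRecord> buffer = new ArrayList<>();

    public ParquetBatchWriter(Schema avroSchema) {
        this.avroSchema = avroSchema;
    }

    // Buffer records in memory: a Parquet file isn't readable until its
    // footer is written on close, unlike an Avro container file.
    public void write(SinkRecord record) {
        buffer.add(record);
    }

    // The whole buffered batch becomes one Parquet file when the sink
    // decides to roll over.
    public void commit(Path outputFile) throws IOException {
        try (ParquetWriter<GenericRecord> writer =
                 AvroParquetWriter.<GenericRecord>builder(outputFile)
                     .withSchema(avroSchema)
                     .withCompressionCodec(CompressionCodecName.SNAPPY)
                     .build()) {
            for (SinkRecord record : buffer) {
                writer.write(toAvro(record));  // hypothetical conversion
            }
        }
        buffer.clear();
    }

    private GenericRecord toAvro(SinkRecord record) {
        // Placeholder: real code would map Connect's Struct/Schema to Avro.
        throw new UnsupportedOperationException("not shown here");
    }
}

If that's the shape, nothing is durable until commit(), so offset
management would have to follow file boundaries rather than individual
records.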

best,
Colin


On Sun, May 7, 2017, at 02:36, Clayton Wohl wrote:
> With the Kafka Connect S3 sink, I can choose Avro or JSON output format.
> Is there any chance that Parquet will be supported?
> 
> For record-at-a-time processing, Parquet isn't a good fit. But for
> reading/writing batches of records, which is what the Kafka Connect sink
> writes, Parquet is generally better than Avro.
> 
> Would it be wise to attempt writing support for this?
