Hi Clayton,

It seems like an interesting improvement. Given that Parquet is columnar, you would expect some space savings. I guess the big questions are: would each batch of records become a single Parquet file, and how would this integrate with the existing logic, which may assume that each record can be serialized on its own?
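
To make that question concrete, here is a rough sketch of how a per-batch writer could work, using parquet-avro's AvroParquetWriter. The ParquetBatchWriter class and writeBatch method are just illustrative names, not anything from the connector codebase:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

import java.io.IOException;
import java.util.List;

public class ParquetBatchWriter {
    /**
     * Hypothetical sketch: writes one batch of Avro records as a
     * single Parquet file. Unlike Avro's record-at-a-time container
     * format, Parquet needs to see the whole batch before the file
     * (column chunks plus footer) can be finalized.
     */
    public static void writeBatch(List<GenericRecord> batch,
                                  Schema schema,
                                  Path outputFile) throws IOException {
        try (ParquetWriter<GenericRecord> writer =
                 AvroParquetWriter.<GenericRecord>builder(outputFile)
                     .withSchema(schema)
                     .withCompressionCodec(CompressionCodecName.SNAPPY)
                     .build()) {
            for (GenericRecord record : batch) {
                writer.write(record);
            }
        } // closing the writer finalizes the row groups and footer
    }
}

The point being: the writer has to stay open across the whole batch, and closing it is what makes the file valid. So the sink couldn't serialize each record independently the way the Avro format does.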
Best,
Colin

On Sun, May 7, 2017, at 02:36, Clayton Wohl wrote:
> With the Kafka Connect S3 sink, I can choose Avro or JSON output format.
> Is there any chance that Parquet will be supported?
>
> For record-at-a-time processing, Parquet isn't a good fit. But for
> reading/writing batches of records, which is what the Kafka Connect sink
> is writing, Parquet is generally better than Avro.
>
> Would it be wise to attempt writing support for this, or not?