Re: Use Flink to write a Kafka topic to s3 as parquet files

2021-06-29 Thread Arvid Heise
Hi Thomas, The usual way with Avro would be to generate a class from your schema [1]. Then PlaySession would already be a SpecificRecord and you would avoid the extra conversion step. I'm quite positive that the same approach works with ParquetAvroWriters. Note that you would need to use ParquetAvroWriters#forSp
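A minimal sketch of what Arvid describes, assuming PlaySession is a class generated from the Avro schema (e.g. by avro-maven-plugin) and therefore already implements SpecificRecord. The bucket path and class name are placeholders; this is not runnable without the Flink parquet/avro dependencies:

```java
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class SpecificRecordSinkSketch {

    // PlaySession is assumed to be generated from the Avro schema,
    // so it implements org.apache.avro.specific.SpecificRecord.
    public static StreamingFileSink<PlaySession> buildSink(String s3Path) {
        return StreamingFileSink
                .forBulkFormat(
                        new Path(s3Path), // e.g. "s3://my-bucket/play-sessions/"
                        // forSpecificRecord derives the Parquet schema from the
                        // generated class; no GenericRecord conversion needed.
                        ParquetAvroWriters.forSpecificRecord(PlaySession.class))
                .build();
    }

    public static void attach(DataStream<PlaySession> sessions) {
        // The stream of generated records can be sinked directly.
        sessions.addSink(buildSink("s3://my-bucket/play-sessions/"));
    }
}
```

With the generated class, the Kafka deserializer can also produce PlaySession directly, so the whole pipeline stays strongly typed.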

Use Flink to write a Kafka topic to s3 as parquet files

2021-06-22 Thread Thomas Wang
Hi, I'm trying to tail a Kafka topic and copy the data to s3 as parquet files. I'm using StreamingFileSink with ParquetAvroWriters. It works just fine. However, it looks like I have to generate the Avro schema and convert my POJO class to GenericRecord first (i.e. convert DataStream<PlaySession> to DataStream<GenericRecord>)
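A sketch of the approach Thomas describes, with a hypothetical two-field PlaySession POJO and schema for illustration; field names, schema, and bucket path are assumptions, and depending on the Flink/Avro versions the mapped stream may need explicit type information (e.g. GenericRecordAvroTypeInfo) because GenericRecord carries a Schema:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class GenericRecordSinkSketch {

    // Hypothetical POJO standing in for the actual event class.
    public static class PlaySession {
        public String userId;
        public long durationMs;
    }

    // Hand-written Avro schema mirroring the POJO above (assumption).
    private static final String SCHEMA_JSON =
            "{\"type\":\"record\",\"name\":\"PlaySession\",\"fields\":["
          + "{\"name\":\"userId\",\"type\":\"string\"},"
          + "{\"name\":\"durationMs\",\"type\":\"long\"}]}";

    public static void attach(DataStream<PlaySession> sessions) {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        // The extra step: map each POJO to a GenericRecord field by field.
        DataStream<GenericRecord> records = sessions.map(s -> {
            GenericRecord r = new GenericData.Record(schema);
            r.put("userId", s.userId);
            r.put("durationMs", s.durationMs);
            return r;
        });

        records.addSink(
                StreamingFileSink
                        .forBulkFormat(
                                new Path("s3://my-bucket/play-sessions/"),
                                ParquetAvroWriters.forGenericRecord(schema))
                        .build());
    }
}
```

This works, but it is exactly the boilerplate the reply above suggests avoiding by generating a SpecificRecord class from the schema instead.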