Hi all,

I am writing some jobs intended to run with the DataStream API, using a
Kafka source. However, we also have a lot of data in Avro archives (of
the same Kafka source). I would like to be able to run the processing
code over parts of the archive so I can generate some "example output".

I've written the transformations needed to read and process the data
from the archives, but now I'm trying to figure out the best way to
write the results to some storage.
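
For context, the archive-reading side currently looks roughly like the
sketch below. MyAvroRecord, MyProcessingFunction, MyOutputRecord and the
HDFS path are just placeholders for our actual generated Avro class,
transformation and output type; I'm on a Flink version where
AvroInputFormat still lives in org.apache.flink.api.java.io (flink-avro).

import org.apache.flink.api.java.io.AvroInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// MyAvroRecord is whatever Avro-generated class the archive contains,
// and the path points at one slice of the archive.
AvroInputFormat<MyAvroRecord> archiveFormat =
        new AvroInputFormat<>(new Path("hdfs:///archives/my-topic/2017-01/"),
                MyAvroRecord.class);

// createInput gives a (bounded) DataStream over the archive files, so
// the same transformations we run against the Kafka source can be
// reused on the archive.
DataStream<MyAvroRecord> archive = env.createInput(archiveFormat);
DataStream<MyOutputRecord> results = archive.map(new MyProcessingFunction());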

At the moment I can easily write to JSON or CSV using the bucketing sink
(although I'm curious about using the watermark/event time rather than
the system time to name the buckets), but I'd really like to write to
something more compact, like Avro.
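
On the bucket naming: as far as I can tell the Bucketer never sees the
watermark itself, only the element, so the closest I've come up with is
bucketing on a timestamp field carried by each record, roughly as below
(MyOutputRecord and its getEventTimestamp() getter are placeholders for
our actual output type):

import org.apache.flink.streaming.connectors.fs.Clock;
import org.apache.flink.streaming.connectors.fs.bucketing.Bucketer;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.hadoop.fs.Path;

import java.text.SimpleDateFormat;
import java.util.Date;

public class EventTimeBucketer implements Bucketer<MyOutputRecord> {
    private static final long serialVersionUID = 1L;

    @Override
    public Path getBucketPath(Clock clock, Path basePath, MyOutputRecord element) {
        // Name the bucket after the record's own event timestamp instead
        // of the wall-clock time the default DateTimeBucketer would use.
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd--HH");
        return new Path(basePath, format.format(new Date(element.getEventTimestamp())));
    }
}

// Wiring it into the sink that currently writes JSON/CSV:
BucketingSink<MyOutputRecord> sink = new BucketingSink<>("/output/json");
sink.setBucketer(new EventTimeBucketer());
results.addSink(sink);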

However, I'm not sure this makes sense. Writing to a compressed file
format in this way from a streaming job doesn't sound intuitively right.
What would make the most sense? I could write to some temporary database
and then pipe that into an archive, but this seems like a lot of
trouble. Is there a way to pipe the output directly into the batch API
of Flink?
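
For what it's worth, the kind of thing I was imagining for the Avro case
is below, using AvroKeyValueSinkWriter from flink-connector-filesystem,
though I'm not at all sure I've understood that API correctly. It
assumes the output type is itself an Avro-generated class and the choice
of key is arbitrary, so please treat it as a sketch rather than
something I've got working:

import org.apache.avro.Schema;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.fs.AvroKeyValueSinkWriter;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

import java.util.HashMap;
import java.util.Map;

// The writer expects Tuple2<K, V> elements plus the key/value schemas as strings.
Map<String, String> writerConfig = new HashMap<>();
writerConfig.put(AvroKeyValueSinkWriter.CONF_OUTPUT_KEY_SCHEMA,
        Schema.create(Schema.Type.STRING).toString());
writerConfig.put(AvroKeyValueSinkWriter.CONF_OUTPUT_VALUE_SCHEMA,
        MyOutputRecord.getClassSchema().toString());  // assumes an Avro-generated class
writerConfig.put(AvroKeyValueSinkWriter.CONF_COMPRESS, Boolean.toString(true));

BucketingSink<Tuple2<String, MyOutputRecord>> avroSink =
        new BucketingSink<>("/output/avro");
avroSink.setWriter(new AvroKeyValueSinkWriter<String, MyOutputRecord>(writerConfig));

// Pair each record with some key (here a placeholder getId()) and attach the sink.
DataStream<Tuple2<String, MyOutputRecord>> keyed = results.map(
        new MapFunction<MyOutputRecord, Tuple2<String, MyOutputRecord>>() {
            @Override
            public Tuple2<String, MyOutputRecord> map(MyOutputRecord value) {
                return Tuple2.of(value.getId().toString(), value);
            }
        });
keyed.addSink(avroSink);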

Thanks
