Re: Output batch to Kafka

2018-06-05 Thread Stephan Ewen
You could go with Chesnay's suggestion, which might be the quickest fix. Creating a KafkaOutputFormat (possibly wrapping the KafkaProducer) would be a bit cleaner. Would be happy to have that as a contribution, actually ;-) If you care about producing "exactly once" using Kafka Transactions (Kaf

Re: Output batch to Kafka

2018-06-04 Thread Chesnay Schepler
This depends a little bit on your requirements. If it is just about reading data from HDFS and writing it into Kafka, then it should be possible to simply wrap a KafkaProducer in a RichMapFunction that you use as a sink in your DataSet program. However, you could also use the Streaming API for tha
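
A minimal sketch of the wrapping pattern suggested above, kept free of Flink and Kafka dependencies so it runs on its own. The names RecordSender, ProducerWrappingSink and InMemorySender are hypothetical and only illustrative: RecordSender stands in for the few KafkaProducer calls a sink needs (send, flush, close), and ProducerWrappingSink mirrors the open / per-record / close lifecycle that a RichMapFunction provides. The assumption being illustrated is that the producer is created in open() rather than in the constructor, because a KafkaProducer is not serializable and Flink serializes the function before shipping it to the workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for the part of a Kafka producer that a sink needs.
interface RecordSender {
    void send(String topic, String value);
    void flush();
    void close();
}

// Mirrors the lifecycle of a rich function: open once, handle each record, close once.
class ProducerWrappingSink {
    private final String topic;
    private final Supplier<RecordSender> senderFactory;
    private RecordSender sender; // created in open(), never in the constructor

    ProducerWrappingSink(String topic, Supplier<RecordSender> senderFactory) {
        this.topic = topic;
        this.senderFactory = senderFactory;
    }

    // Called once per parallel instance before any record is processed.
    void open() {
        sender = senderFactory.get();
    }

    // Called for every record of the batch; forwards the value to the topic.
    void write(String value) {
        sender.send(topic, value);
    }

    // Called once at the end; flush so buffered records are not lost.
    void close() {
        if (sender != null) {
            sender.flush();
            sender.close();
            sender = null;
        }
    }
}

// In-memory sender used only to exercise the sketch without a broker.
class InMemorySender implements RecordSender {
    final List<String> sent = new ArrayList<>();
    boolean flushed = false;
    boolean closed = false;

    @Override
    public void send(String topic, String value) {
        sent.add(topic + ":" + value);
    }

    @Override
    public void flush() {
        flushed = true;
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

In an actual job, the same three steps would presumably live in the open(), map() and close() methods of the RichMapFunction, with a real KafkaProducer in place of the in-memory sender. This sketch gives no delivery guarantees beyond flushing on close.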

Output batch to Kafka

2018-06-04 Thread Oleksandr Nitavskyi
Hello Squirrels, Flink has a wonderful Kafka connector. We need to move data from HDFS to Kafka. Confluent proposes using Kafka Connect for this, but it would probably be easier to use Flink for such a task: a much higher level of abstraction, fewer details to manage, and a better fit for our context. Do you know