Hi,

Has anybody thought of (mis-)using Flink streaming as an alternative to Apache Flume, just for ingesting data from Kafka (or other streaming sources) into HDFS? Since Flink can read from Kafka and write to HDFS, I assume it should be possible, but is it a good idea to do?
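To make the question concrete, this is roughly the pipeline I have in mind (untested sketch; the topic name, broker address, group id, and output path are made up, and I'm assuming the flink-connector-kafka consumer plus the StreamingFileSink from newer Flink versions):

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class KafkaToHdfs {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Part files are only finalized on checkpoints, so checkpointing must be enabled.
            env.enableCheckpointing(60_000);

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092"); // made up
            props.setProperty("group.id", "hdfs-ingest");         // made up

            DataStream<String> records = env.addSource(
                    new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props));

            // Defaults: one subdirectory per hour, one part file per parallel subtask.
            StreamingFileSink<String> sink = StreamingFileSink
                    .forRowFormat(new Path("hdfs:///data/events"),
                                  new SimpleStringEncoder<String>("UTF-8"))
                    .build();

            records.addSink(sink);
            env.execute("kafka-to-hdfs");
        }
    }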
Flume is basically about consuming data from somewhere, peeking into each record, and then reliably directing it to a specific directory/file in HDFS. I've seen that there is a FlumeSink, but would it be possible to get the same functionality with Flink alone?

I've skimmed through the documentation and found the option to split the output by key and the possibility to add multiple sinks. As I understand it, Flink programs are generally static, so it would not be possible to add or remove sinks at runtime? So you would need to implement a custom sink that directs the records to different files based on a key (e.g. the date); a rough sketch of what I am picturing is in the P.S. below. Would it be difficult to implement things like rolling outputs, or is it better to just use Flume?

Best,
Hans-Peter
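P.S. This is the kind of record-inspecting bucketing I mean, written as a custom BucketAssigner for the StreamingFileSink (untested sketch; the comma-separated record layout and the class name are made up):

    import java.time.LocalDate;
    import java.time.ZoneOffset;

    import org.apache.flink.core.io.SimpleVersionedSerializer;
    import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner;
    import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer;

    // Peeks into each record and picks a target directory, Flume-style.
    public class KeyAndDateBucketAssigner implements BucketAssigner<String, String> {

        @Override
        public String getBucketId(String element, Context context) {
            String key = element.split(",", 2)[0];                  // made up: key is the first field
            String date = LocalDate.now(ZoneOffset.UTC).toString(); // yyyy-MM-dd
            return key + "/" + date; // becomes a subdirectory under the sink's base path
        }

        @Override
        public SimpleVersionedSerializer<String> getSerializer() {
            return SimpleVersionedStringSerializer.INSTANCE;
        }
    }

Hooked into the sink from the sketch above, rolling outputs would then be a matter of configuring a policy rather than writing one, if I read the docs right:

    // plus: import java.util.concurrent.TimeUnit;
    // and org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;
    StreamingFileSink<String> sink = StreamingFileSink
            .forRowFormat(new Path("hdfs:///data/events"),
                          new SimpleStringEncoder<String>("UTF-8"))
            .withBucketAssigner(new KeyAndDateBucketAssigner())
            .withRollingPolicy(DefaultRollingPolicy.builder()
                    .withRolloverInterval(TimeUnit.MINUTES.toMillis(15)) // roll every 15 min
                    .withMaxPartSize(128 * 1024 * 1024)                  // or at 128 MB
                    .build())
            .build();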