Hi,

Has anybody thought of (mis-)using Flink streaming as an alternative to
Apache Flume, just for ingesting data from Kafka (or other streaming
sources) into HDFS? Knowing that Flink can read from Kafka and write to
HDFS, I assume it should be possible, but is it a good idea?
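
For a concrete picture, the minimal pipeline I have in mind would look
roughly like this (just a sketch; the Kafka connector class name and the
package of SimpleStringSchema differ between Flink versions, and the
topic, servers and paths are placeholders):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaToHdfs {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "hdfs-ingest");

        // Consume raw records from Kafka (topic name is a placeholder).
        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props));

        // Simplest possible HDFS output: no bucketing, no rolling.
        stream.writeAsText("hdfs:///data/ingest/raw");

        env.execute("Kafka to HDFS ingest");
    }
}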

Flume is basically about consuming data from somewhere, peeking into each
record, and then reliably directing it to a specific directory/file in
HDFS. I've seen there is a FlumeSink, but would it be possible to get the
same functionality with Flink alone?
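
I imagine the "peeking into each record" part would just be an ordinary
transformation in front of the sink. Continuing the sketch above (the
ISO-date prefix is purely an assumption about the record format):

// Additional imports: org.apache.flink.api.java.tuple.Tuple2,
// org.apache.flink.api.common.typeinfo.Types

// Peek into each record and derive a routing key, e.g. the event date.
DataStream<Tuple2<String, String>> keyedByDate = stream
        .map(record -> Tuple2.of(record.substring(0, 10), record))
        .returns(Types.TUPLE(Types.STRING, Types.STRING));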

I've skimmed through the documentation and found the option to split the
output by key and the possibility to add multiple sinks. As I understand
it, Flink programs are generally static, so it would not be possible to
add or remove sinks at runtime? So one would need to implement a custom
sink that directs records to different files based on a key (e.g. the
date), along the lines of the sketch below? And would it be difficult to
implement things like rolling output files, or is it better to just use
Flume?
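
To make the question more concrete, here is a deliberately naive sketch of
such a sink (all names made up). It writes through the Hadoop FileSystem
API, reopens the file for every record, ignores fault tolerance and
parallelism, and assumes HDFS append is enabled:

import java.time.LocalDate;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Naive sink that appends each record to a per-date HDFS directory.
public class DateBucketingSink extends RichSinkFunction<String> {

    private transient FileSystem fs;

    @Override
    public void open(Configuration parameters) throws Exception {
        fs = FileSystem.get(new org.apache.hadoop.conf.Configuration());
    }

    @Override
    public void invoke(String record) throws Exception {
        // In reality the date would be extracted from the record itself,
        // and with parallelism > 1 each subtask would need its own part file.
        Path path = new Path("/data/ingest/" + LocalDate.now() + "/part-0");
        FSDataOutputStream out =
                fs.exists(path) ? fs.append(path) : fs.create(path);
        try {
            out.writeBytes(record + "\n");
        } finally {
            out.close();
        }
    }
}

Opening and closing the stream per record is obviously not viable at any
volume; a real implementation would cache open writers per bucket and roll
them by size or time, and that reliability part is exactly what I am
unsure about reimplementing versus just using Flume.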

Best,
Hans-Peter
