I was wondering if anybody has already tried to mirror a Kafka topic to HDFS by simply copying the log segment files from the topic directory on the broker (e.g. 00000000000023244237.log).
The file format is very simple: https://twitter.com/amiorin/status/576448691139121152/photo/1 Implementing an InputFormat to read these files should not be too difficult. Are there any drawbacks?
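
To make the "simple format" point concrete, here is a rough Python sketch of how such a segment could be decoded. It assumes the 0.8-era on-disk layout as I understand it (8-byte offset, 4-byte message size, then CRC, magic byte, attributes, key and value, all big-endian); other broker versions may differ. The names `encode_entry` and `read_entries` are made up for illustration, and compressed message sets are not unpacked.

```python
import struct
import zlib


def encode_entry(offset, key, value):
    """Build one log entry in the assumed 0.8-style layout (for testing only)."""
    body = struct.pack(">bb", 0, 0)  # magic = 0, attributes = 0 (no compression)
    for field in (key, value):
        # A length of -1 marks a null key or value.
        body += struct.pack(">i", -1 if field is None else len(field))
        body += field or b""
    crc = zlib.crc32(body) & 0xFFFFFFFF  # CRC covers everything after the CRC field
    message = struct.pack(">I", crc) + body
    return struct.pack(">qi", offset, len(message)) + message


def read_entries(data):
    """Yield (offset, key, value) tuples from the bytes of a .log segment."""
    pos = 0
    while pos + 12 <= len(data):
        offset, size = struct.unpack_from(">qi", data, pos)
        pos += 12
        if pos + size > len(data):
            break  # partially written or truncated trailing entry
        message = data[pos:pos + size]
        pos += size

        (crc,) = struct.unpack_from(">I", message, 0)
        if zlib.crc32(message[4:]) & 0xFFFFFFFF != crc:
            raise ValueError("CRC mismatch at offset %d" % offset)

        # Bytes 4 and 5 are magic and attributes; they are ignored here.
        # If the attributes mark the message as compressed, the value is
        # itself a nested message set, which this sketch returns as raw bytes.
        cursor = 6
        fields = []
        for _ in range(2):  # key, then value
            (length,) = struct.unpack_from(">i", message, cursor)
            cursor += 4
            if length < 0:
                fields.append(None)
            else:
                fields.append(message[cursor:cursor + length])
                cursor += length
        yield offset, fields[0], fields[1]
```

Something like `list(read_entries(open(path, "rb").read()))` would then walk a copied segment. A Hadoop InputFormat would need the same logic plus a decision on splits, since an entry boundary cannot be found from an arbitrary byte position without scanning from the start of the file.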