Currently I have a single Flume agent that converts Apache logs into Avro and writes them to an HDFS sink. I'm looking to build a tiered topology and want the Avro records to be available to other Flume agents, so I tried writing the records through a Kafka channel/sink (a sketch of the setup follows).
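A minimal sketch of the kind of configuration I mean; agent names, the topic, hosts, and the log path are placeholders, not my exact config:

    # Tier-1 agent: reads Apache access logs and publishes events to Kafka
    agent1.sources = apacheSrc
    agent1.channels = memCh
    agent1.sinks = kafkaSink

    agent1.sources.apacheSrc.type = exec
    agent1.sources.apacheSrc.command = tail -F /var/log/apache2/access.log
    agent1.sources.apacheSrc.channels = memCh

    agent1.channels.memCh.type = memory

    agent1.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
    agent1.sinks.kafkaSink.topic = apache-avro
    agent1.sinks.kafkaSink.brokerList = kafka-host:9092
    agent1.sinks.kafkaSink.channel = memCh

    # Tier-2 agent: consumes the same topic with a Kafka source
    agent2.sources = kafkaSrc
    agent2.sources.kafkaSrc.type = org.apache.flume.source.kafka.KafkaSource
    agent2.sources.kafkaSrc.zookeeperConnect = zk-host:2181
    agent2.sources.kafkaSrc.topic = apache-avro
    agent2.sources.kafkaSrc.channels = memCh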
Reading the records back with the Kafka source in the downstream agent fails with:

    Caused by: java.io.IOException: Not a data file.
        at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
        at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)

For a tiered topology, should I instead be using an Avro sink that writes to a host/port which the next agent reads with an Avro source? Or is there another data format I should consider if I want to stick with Kafka as the channel/sink?
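For reference, the Avro-hop alternative I have in mind would look roughly like this (hostname and port are placeholders):

    # Tier-1 agent: forwards events over Avro RPC instead of Kafka
    agent1.sinks.avroSink.type = avro
    agent1.sinks.avroSink.hostname = collector-host
    agent1.sinks.avroSink.port = 4545
    agent1.sinks.avroSink.channel = memCh

    # Tier-2 agent: receives the events with an Avro source
    agent2.sources.avroSrc.type = avro
    agent2.sources.avroSrc.bind = 0.0.0.0
    agent2.sources.avroSrc.port = 4545
    agent2.sources.avroSrc.channels = memCh

Thanks!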