Thanks, Gonzalo. You are correct about the topology in that I'm using a Kafka channel as a source. Based on this thread <http://search-hadoop.com/m/z1pWR1tSM5S1qr5Bt>, I was under the impression that a Kafka sink is redundant.
Here's the topology:

Agent#1: spooldir source -> morphlines (transforms to Avro) -> Kafka channel (topic 'K1')
Agent#2: Kafka source (topic 'K1') -> file channel -> HDFS sink

Please let me know if Agent#1 should be writing to a Kafka sink as well, so that Agent#2 can use that as a source, and what the difference is.

Thanks!

On Tue, Sep 15, 2015 at 11:47 AM, Gonzalo Herreros <gherre...@gmail.com> wrote:

> I'm not sure I understand your topology and what you mean exactly by
> "used Kafka channel/sink"; it would help if you sent the configuration.
>
> My best guess about the error is that you are pointing the Kafka source
> to a topic that is used by a channel and not by a Kafka sink.
>
> Regards,
> Gonzalo
>
>
> On Sep 15, 2015 6:42 PM, "Buntu Dev" <buntu...@gmail.com> wrote:
>
>> Currently I have a single Flume agent that converts Apache logs into
>> Avro and writes to an HDFS sink. I'm looking for ways to create a tiered
>> topology and want to have the Avro records available to other Flume
>> agents. I used a Kafka channel/sink to write these Avro records, but ran
>> into this error when using the Kafka source to read the records:
>>
>> Caused by: java.io.IOException: Not a data file.
>>     at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
>>     at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
>>
>> For a tiered topology, should I be using an Avro sink writing to a
>> host/port for the other Flume agent to read with an Avro source? Or is
>> there any other data format I should consider if I want to stick with
>> Kafka as the channel/sink?
>>
>> Thanks!
>>
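For reference, the two-agent topology described at the top of the thread could be sketched as Flume properties files along these lines. This is a minimal sketch, not the poster's actual configuration: the agent and component names (agent1, src1, kc1, etc.), broker/ZooKeeper hosts, and file paths are all hypothetical, and the property names assume a Flume 1.6-era Kafka channel and Kafka source.

```properties
# --- Agent#1 (hypothetical sketch): spooldir source -> morphline interceptor -> Kafka channel ---
agent1.sources = src1
agent1.channels = kc1

agent1.sources.src1.type = spooldir
agent1.sources.src1.spoolDir = /var/log/apache/spool
agent1.sources.src1.channels = kc1
agent1.sources.src1.interceptors = morphline
agent1.sources.src1.interceptors.morphline.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
agent1.sources.src1.interceptors.morphline.morphlineFile = /etc/flume/conf/morphline.conf

# The Kafka channel itself is the hand-off point: events land on topic K1,
# so Agent#1 needs no separate sink.
agent1.channels.kc1.type = org.apache.flume.channel.kafka.KafkaChannel
agent1.channels.kc1.brokerList = kafkabroker1:9092
agent1.channels.kc1.topic = K1
agent1.channels.kc1.zookeeperConnect = zk1:2181

# --- Agent#2 (hypothetical sketch): Kafka source (topic K1) -> file channel -> HDFS sink ---
agent2.sources = ksrc
agent2.channels = fc1
agent2.sinks = hdfs1

agent2.sources.ksrc.type = org.apache.flume.source.kafka.KafkaSource
agent2.sources.ksrc.zookeeperConnect = zk1:2181
agent2.sources.ksrc.topic = K1
agent2.sources.ksrc.channels = fc1

agent2.channels.fc1.type = file
agent2.channels.fc1.checkpointDir = /var/flume/checkpoint
agent2.channels.fc1.dataDirs = /var/flume/data

agent2.sinks.hdfs1.type = hdfs
agent2.sinks.hdfs1.channel = fc1
agent2.sinks.hdfs1.hdfs.path = /flume/events
```

One caveat worth noting, consistent with Gonzalo's guess about the error: a Kafka channel writes events to the topic in Flume's own event wrapping, not as raw Avro data files, so a reader that expects an Avro container file (as the `DataFileReader` stack trace suggests) will fail with "Not a data file" when pointed at a channel-backed topic.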