Thanks Gonzalo. You are correct about the topology in that I'm using Kafka
channel as a source. Based on this thread
<http://search-hadoop.com/m/z1pWR1tSM5S1qr5Bt>, I was under the impression
that Kafka sink is redundant.

Here's the topology:
Agent#1: spooldir source -> morphlines (transforms to Avro) -> Kafka
channel (topic 'K1')
Agent#2: Kafka source (topic 'K1') -> file channel -> HDFS sink

Please let me know if Agent#1 should instead be writing to a Kafka sink for
Agent#2 to consume via its Kafka source, and what the difference would be.
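For reference, here is a rough sketch of that two-agent flow in Flume properties form. All agent, component, host, and path names are illustrative, and the property names follow the Flume 1.6-era Kafka channel/source (e.g. `brokerList`, `zookeeperConnect`), so please check against your version's user guide:

```properties
# Agent#1: spooldir source -> morphline interceptor (to Avro) -> Kafka channel
agent1.sources = spool
agent1.channels = kc
agent1.sources.spool.type = spooldir
agent1.sources.spool.spoolDir = /var/log/apache            # illustrative path
agent1.sources.spool.interceptors = m
agent1.sources.spool.interceptors.m.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
agent1.sources.spool.interceptors.m.morphlineFile = /etc/flume/morphline.conf
agent1.sources.spool.channels = kc
agent1.channels.kc.type = org.apache.flume.channel.kafka.KafkaChannel
agent1.channels.kc.brokerList = broker1:9092               # illustrative broker
agent1.channels.kc.topic = K1
agent1.channels.kc.zookeeperConnect = zk1:2181             # illustrative ZK

# Agent#2: Kafka source (topic 'K1') -> file channel -> HDFS sink
agent2.sources = ks
agent2.channels = fc
agent2.sinks = h
agent2.sources.ks.type = org.apache.flume.source.kafka.KafkaSource
agent2.sources.ks.topic = K1
agent2.sources.ks.zookeeperConnect = zk1:2181
agent2.sources.ks.channels = fc
agent2.channels.fc.type = file
agent2.sinks.h.type = hdfs
agent2.sinks.h.channel = fc
agent2.sinks.h.hdfs.path = /flume/events                   # illustrative path
```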

Thanks!


On Tue, Sep 15, 2015 at 11:47 AM, Gonzalo Herreros <gherre...@gmail.com>
wrote:

> I'm not sure I understand your topology and what you mean exactly by
> "used Kafka channel/sink"; it would help if you sent the configuration.
>
> My best guess about the error is that you are pointing the Kafka source to
> a topic that is used by a channel and not by a Kafka sink.
>
> Regards,
> Gonzalo
>
>
> On Sep 15, 2015 6:42 PM, "Buntu Dev" <buntu...@gmail.com> wrote:
>
>> Currently I have a single Flume agent that converts Apache logs into Avro
>> and writes to an HDFS sink. I'm looking for ways to create a tiered topology
>> and want to have the Avro records available to other Flume agents. I used a
>> Kafka channel/sink to write these Avro records but was running into this
>> error when using the Kafka source to read the records:
>>
>>  Caused by: java.io.IOException: Not a data file.
>>     at
>> org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
>>     at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
>>
>>
>> For a tiered topology, should I be using an Avro sink that writes to a
>> host/port for the other Flume agent to read using an Avro source? Or is
>> there another data format I should consider if I want to stick with Kafka
>> as the channel/sink?
>>
>> Thanks!
>>
>
