Thanks for the reply.  My understanding of the current Avro sink/source is that 
the schema is just a simple header/body schema.  What I ultimately want to do 
is read a log file and write it into an Avro file on HDFS at sink B.  
However, before that I would like to parse the log file at source A and 
come up with a more detailed schema, which might include, for example, date, 
priority, subsystem, and log message fields.  I am trying to "normalize" the 
events so that we could also potentially filter on certain log file fields at 
the source.  Does this make sense?
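
To make that concrete, here is a rough sketch of the parsing and filtering I 
have in mind at source A. It is plain Python for illustration only, not Flume 
code. The log line layout, the LogEvent record name, and the helper names 
parse_line and keep are all made up for the example, and the schema is only 
written out in Avro's JSON form, not validated against any Avro library.

```python
import json
import re

# Hypothetical log line layout (an assumption for this sketch):
#   2014-09-06 11:52:03 WARN [storage] disk usage at 91%
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<priority>[A-Z]+) "
    r"\[(?P<subsystem>[^\]]+)\] "
    r"(?P<message>.*)$"
)

# Avro record schema in JSON form with the four fields mentioned above.
# The record name "LogEvent" is made up for illustration.
LOG_SCHEMA = {
    "type": "record",
    "name": "LogEvent",
    "fields": [
        {"name": "date", "type": "string"},
        {"name": "priority", "type": "string"},
        {"name": "subsystem", "type": "string"},
        {"name": "message", "type": "string"},
    ],
}


def parse_line(line):
    """Return a dict with the schema's fields, or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def keep(record, allowed_priorities=("WARN", "ERROR")):
    """Source-side filter: keep only parsed records with an allowed priority."""
    return record is not None and record["priority"] in allowed_priorities


if __name__ == "__main__":
    sample = "2014-09-06 11:52:03 WARN [storage] disk usage at 91%"
    record = parse_line(sample)
    print(json.dumps(LOG_SCHEMA, indent=2))
    print(record, keep(record))
```

The idea would be that each record that passes the filter gets serialized 
against this schema and carried as the event body, so sink B can write it out 
with the same fields.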

On Sep 6, 2014, at 11:52 AM, Ashish <paliwalash...@gmail.com> wrote:

> I am not sure I understand the question correctly, so let me try to answer 
> based on my understanding.
> 
> source A -> channel A -> sink A ---> source B -> channel B -> sink B
> 
> For this scenario, Sink A has to be an Avro sink and Source B has to be an 
> Avro Source for the flow to work.
> Flume uses Avro for RPC (look at flume-ng-sdk/src/main/avro/flume.avdl). 
> It defines how Flume sends Event(s) across using Avro RPC. 
> 
> 1. Source A (spooled dir) would read files, create Events from them, and 
> insert them into the channel
> 2. Sink A (Avro sink) would read from the channel and translate each Event 
> into an AvroFlumeEvent for sending to Source B (Avro Source)
> 3. Source B would read each AvroFlumeEvent, create an Event, and insert it 
> into its channel, where it would be processed by Sink B
> 
> It's not that an event gets wrapped in further events as it travels down the 
> chain. The Avro encoding exists only between Sink A and Source B. 
> 
> Based on my understanding, you are looking at encoding log file lines using 
> Avro. In that case, the Avro-encoded log file lines would be part of the Event 
> body; the rest would be the same as steps 1-3.
> 
> HTH !
> 
> 
> On Fri, Sep 5, 2014 at 12:58 AM, Ed Judge <ejud...@gmail.com> wrote:
> Ok, I have looked over the source and it is making a little more sense.
> 
> I think what I ultimately want to do is this:
> 
> source A -> channel A -> sink A ---> source B -> channel B -> sink B
> 
> source A will be looking at a log file.  Each line of the log file will have 
> a certain format/schema. I would write Source A such that it could write the 
> schema/line as an event into the channel and pass that through the system all 
> the way to sink B, so that sink B would also know the schema.  
> I was thinking Avro would be a good format for source A to use when writing 
> into its channel.  If Sink A is an existing Avro Sink and Source B is an 
> existing Avro source, would this still work?  Does this mean I would have 2 
> Avro headers (one encapsulating the other), which is wasteful, or can the 
> existing Avro source and sink deal with this unmodified?  Is there a better 
> way to accomplish what I want to do?  Just looking for some guidance.
> 
> Thanks,
> Ed
> 
> On Sep 4, 2014, at 4:44 AM, Ashish <paliwalash...@gmail.com> wrote:
> 
>> Avro records have the schema embedded with them. Have a look at the 
>> source; that should help a bit.
>> 
>> 
>> On Wed, Sep 3, 2014 at 10:30 PM, Ed Judge <ejud...@gmail.com> wrote:
>> That’s helpful but isn’t there some type of Avro schema negotiation that 
>> occurs?
>> 
>> -Ed
>> 
>> On Sep 3, 2014, at 12:02 AM, Jeff Lord <jl...@cloudera.com> wrote:
>> 
>>> Ed,
>>> 
>>> Did you take a look at the javadoc in the source?
>>> Basically, the source uses Netty as a server and the sink is just an RPC 
>>> client.
>>> If you read over the docs in the two links below, take a look at 
>>> the developer guide, and still have questions, just ask away and someone 
>>> will help to answer.
>>> 
>>> https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/source/AvroSource.java
>>> 
>>> https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/sink/AvroSink.java
>>> 
>>> https://flume.apache.org/FlumeDeveloperGuide.html#transaction-interface
>>> 
>>> -Jeff
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Sep 2, 2014 at 6:36 PM, Ed Judge <ejud...@gmail.com> wrote:
>>> Does anyone know of any good documentation that talks about the 
>>> protocol/negotiation used between an Avro source and sink?
>>> 
>>> Thanks,
>>> Ed
>>> 
>>> 
>> 
>> 
>> 
>> 
>> -- 
>> thanks
>> ashish
>> 
>> Blog: http://www.ashishpaliwal.com/blog
>> My Photo Galleries: http://www.pbase.com/ashishpaliwal
> 
