Thanks for the reply. My understanding of the current Avro sink/source is that the schema is just a simple header/body schema. What I ultimately want to do is read a log file and write it into an Avro file on HDFS at sink B. Before that, however, I would like to parse the log file at source A and derive a more detailed schema, which might include, for example, date, priority, subsystem, and log message fields. I am trying to “normalize” the events so that certain log file fields could also be filtered at the source. Does this make sense?
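A minimal Java sketch of the parse-and-filter step described above, using only the JDK. Everything in it is hypothetical: the log-line format (`yyyy-MM-dd HH:mm:ss PRIORITY subsystem: message`), the class `LogLineNormalizer`, and the `LogLine` Avro schema string are illustrations chosen to match the four fields mentioned, not anything Flume provides. A real custom source would put the normalized record into a Flume `Event` body instead of printing it.

```java
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical normalizer for source A: splits one log line into
 * date, priority, subsystem and message, and filters on priority.
 */
public class LogLineNormalizer {

    // Assumed (hypothetical) line format: "yyyy-MM-dd HH:mm:ss PRIORITY subsystem: message"
    private static final Pattern LINE = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}) (\\w+) ([^:]+): (.*)$");

    // Hypothetical Avro schema (JSON form) for the normalized record sink B would write.
    public static final String SCHEMA_JSON =
            "{\"type\": \"record\", \"name\": \"LogLine\", \"fields\": ["
            + "{\"name\": \"date\", \"type\": \"string\"},"
            + "{\"name\": \"priority\", \"type\": \"string\"},"
            + "{\"name\": \"subsystem\", \"type\": \"string\"},"
            + "{\"name\": \"message\", \"type\": \"string\"}]}";

    /** One normalized log line; field order matches SCHEMA_JSON. */
    public static final class LogLine {
        public final String date;
        public final String priority;
        public final String subsystem;
        public final String message;

        LogLine(String date, String priority, String subsystem, String message) {
            this.date = date;
            this.priority = priority;
            this.subsystem = subsystem;
            this.message = message;
        }
    }

    /** Parses a line, or returns empty if it does not match the assumed format. */
    public static Optional<LogLine> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        // The subsystem group stops at the first ':', so colons inside the message are kept.
        return Optional.of(new LogLine(m.group(1), m.group(2), m.group(3), m.group(4)));
    }

    /** Source-side filter: keep only lines whose priority is in the allowed set. */
    public static boolean keep(LogLine line, Set<String> allowedPriorities) {
        return allowedPriorities.contains(line.priority);
    }

    public static void main(String[] args) {
        Set<String> allowed = Set.of("WARN", "ERROR");
        String[] lines = {
            "2014-09-06 11:52:00 WARN kernel: disk almost full",
            "2014-09-06 11:52:01 DEBUG net: link up",
            "not a log line"
        };
        for (String raw : lines) {
            Optional<LogLine> parsed = parse(raw);
            // Only the WARN line survives: DEBUG is filtered, the last line does not parse.
            if (parsed.isPresent() && keep(parsed.get(), allowed)) {
                LogLine l = parsed.get();
                System.out.println(l.date + " | " + l.priority + " | " + l.subsystem + " | " + l.message);
            }
        }
    }
}
```

Lines that do not match return `Optional.empty()`, so a source built this way could drop or count them rather than fail, and the priority filter runs before anything is written to channel A.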
On Sep 6, 2014, at 11:52 AM, Ashish <paliwalash...@gmail.com> wrote:

> I am not sure I understand the question correctly; let me try to answer based on my understanding.
>
> source A -> channel A -> sink A ------> source B -> channel B -> sink B
>
> For this scenario, Sink A has to be an Avro sink and Source B has to be an Avro source for the flow to work.
> Flume uses Avro for RPC (look at flume-ng-sdk/src/main/avro/flume.avdl).
> It defines how Flume sends Event(s) across using Avro RPC.
>
> 1. Source A (spooled dir) would read files, create Events from them, and insert them into the channel.
> 2. Sink A (Avro sink) would read from the channel and translate each Event into an AvroFlumeEvent for sending to Source B (Avro source).
> 3. Source B would read the AvroFlumeEvent, create an Event, and insert it into its channel, where it is processed by Sink B.
>
> It's not that the event would be wrapped into events as it traverses down the chain. Avro encoding would only exist between Sink A and Source B.
>
> Based on my understanding, you are looking at encoding log file lines using Avro. In that case, the Avro-encoded log file lines would be part of the Event body; the rest would be the same as steps 1-3.
>
> HTH!
>
>
> On Fri, Sep 5, 2014 at 12:58 AM, Ed Judge <ejud...@gmail.com> wrote:
> OK, I have looked over the source and it is making a little more sense.
>
> I think what I ultimately want to do is this:
>
> source A -> channel A -> sink A ------> source B -> channel B -> sink B
>
> Source A will be looking at a log file. Each line of the log file will have a certain format/schema. I would write Source A such that it could write the schema/line as an event into the channel and pass that through the system all the way to sink B, so that sink B would know the schema too.
> I was thinking Avro would be a good format for source A to use when writing into its channel. If Sink A is an existing Avro sink and Source B is an existing Avro source, would this still work? Does this mean I would have 2 Avro headers (one encapsulating the other), which is wasteful, or can the existing Avro source and sink deal with this unmodified? Is there a better way to accomplish what I want to do? Just looking for some guidance.
>
> Thanks,
> Ed
>
> On Sep 4, 2014, at 4:44 AM, Ashish <paliwalash...@gmail.com> wrote:
>
>> Avro records shall have the schema embedded with them. Have a look at the source; that should help a bit.
>>
>>
>> On Wed, Sep 3, 2014 at 10:30 PM, Ed Judge <ejud...@gmail.com> wrote:
>> That’s helpful, but isn’t there some type of Avro schema negotiation that occurs?
>>
>> -Ed
>>
>> On Sep 3, 2014, at 12:02 AM, Jeff Lord <jl...@cloudera.com> wrote:
>>
>>> Ed,
>>>
>>> Did you take a look at the javadoc in the source?
>>> Basically the source uses Netty as a server and the sink is just an RPC client.
>>> If you read over the doc in the two links below, take a look at the developer guide, and still have questions, just ask away and someone will help to answer.
>>>
>>> https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/source/AvroSource.java
>>>
>>> https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/sink/AvroSink.java
>>>
>>> https://flume.apache.org/FlumeDeveloperGuide.html#transaction-interface
>>>
>>> -Jeff
>>>
>>>
>>> On Tue, Sep 2, 2014 at 6:36 PM, Ed Judge <ejud...@gmail.com> wrote:
>>> Does anyone know of any good documentation that talks about the protocol/negotiation used between an Avro source and sink?
>>>
>>> Thanks,
>>> Ed
>>
>> --
>> thanks
>> ashish
>>
>> Blog: http://www.ashishpaliwal.com/blog
>> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>
> --
> thanks
> ashish
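To make Ashish's point about the Event body concrete: Avro's binary encoding of a single record is just its fields concatenated in schema order, with no field names and no schema, so an Avro-encoded log line placed in the Event body does not add a second Avro "header". The AvroFlumeEvent exchanged between Sink A and Source B carries that body as opaque bytes. The schema is instead stored once in the header of an Avro data (object container) file, which is what sink B would write. On the negotiation question, Avro's RPC handshake exchanges protocol hashes (and the full protocol text when a side does not recognize a hash); that covers the flume.avdl protocol, not the schema of whatever sits inside the body. The Java sketch below hand-rolls the encoding with only the JDK, purely for illustration; it assumes a hypothetical four-string record (date, priority, subsystem, message), and in practice the Avro library's `GenericDatumWriter` with a `BinaryEncoder` would do this instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Illustrative only: hand-written Avro binary encoding of a hypothetical LogLine record. */
public class AvroBodySketch {

    /** Writes a long using Avro's zig-zag + variable-length (7 bits per byte) encoding. */
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63); // zig-zag: small magnitudes map to small unsigned values
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write((int) z); // final byte, continuation bit clear
    }

    /** Convenience wrapper returning the encoded bytes of a single long. */
    public static byte[] encodeLong(long n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, n);
        return out.toByteArray();
    }

    /** Avro string: length encoded as a long, followed by the UTF-8 bytes. */
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    /** Record encoding: fields in schema order, no names, no schema. */
    public static byte[] encodeLogLine(String date, String priority, String subsystem, String message) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, date);
        writeString(out, priority);
        writeString(out, subsystem);
        writeString(out, message);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = encodeLogLine("2014-09-06 11:52:00", "WARN", "kernel", "disk almost full");
        System.out.println(body.length + " bytes"); // prints "49 bytes"
    }
}
```

Here each string costs one length byte plus its UTF-8 bytes, so the example body is 49 bytes. Sink B still needs the schema to decode it, so the schema has to be configured there or delivered some other way, for example in an event header, rather than repeated inside every body.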