Yeah, I think that's what I'm doing. How about:

               channel1 --> sink1 (hdfs raw data)
Agent1 src --> replicate + Interceptor1
                                                                               --> sink3
               channel2 --> sink2 avro --> agent2 src Avro --> multiplexing + interceptor2
                                                                               --> sink4
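In Flume properties form, that first hop might look roughly like the sketch below. This is only a sketch: the spool directory, the interceptor builder class (com.example.Interceptor1$Builder), the hostname, and the HDFS path are made-up placeholders, not values from a real config.

    agent1.sources = src1
    agent1.channels = ch1 ch2
    agent1.sinks = sink1 sink2

    # Spooling source; the replicating selector copies every event to both channels
    agent1.sources.src1.type = spooldir
    agent1.sources.src1.spoolDir = /var/spool/flume-in
    agent1.sources.src1.channels = ch1 ch2
    agent1.sources.src1.selector.type = replicating
    # Interceptors hang off the source, so Interceptor1 sees the events for BOTH channels
    agent1.sources.src1.interceptors = i1
    agent1.sources.src1.interceptors.i1.type = com.example.Interceptor1$Builder

    agent1.channels.ch1.type = memory
    agent1.channels.ch2.type = memory

    # sink1: raw/historic data straight to HDFS
    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.channel = ch1
    agent1.sinks.sink1.hdfs.path = hdfs://namenode/flume/raw

    # sink2: forward to agent2 over Avro RPC
    agent1.sinks.sink2.type = avro
    agent1.sinks.sink2.channel = ch2
    agent1.sinks.sink2.hostname = agent2-host
    agent1.sinks.sink2.port = 4545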
Would it be possible to apply Interceptor1 just to channel1? I know that interceptors apply at the source level. Interceptor1 doesn't modify the data much, so I could feed channel2 with those small transformations too, but ideally I would apply it only to channel1. So if I want to do that, it looks like I'd have to create another level with more channels, etc. Something like this:

               channel1 --> *sink1 avro --> src1 avro + interceptor1 --> channel --> sink1 (hdfs raw data)*
Agent1 src --> replicate
                                                                               --> sink3
               channel2 --> sink2 avro --> agent2 src Avro --> multiplexing + interceptor2
                                                                               --> sink4

The point is that my flow continues from sink4, where I have another structure similar to all of the above, so that means 8 channels in total. I don't know if it's possible to simplify this.

2014-08-19 0:09 GMT+02:00 terrey shih <terreys...@gmail.com>:

> something like this
>
> channel 1 -> sink 1 (raw event sink)
> agent 1src -> replicate
>                                         -> sink 3
> channel 2 - sink 2 -> agent 2 src -> multiplexer
>                                         -> sink 4
>
> In fact, I tried not having agent 2 and connecting sink 2 directly to src
> 2, but I was not able to because of an RPCClient exception.
>
> I am just going to try to have 2 agents.
>
> terrey
>
>
> On Mon, Aug 18, 2014 at 3:06 PM, terrey shih <terreys...@gmail.com> wrote:
>
>> Well, I am actually doing similar things to you. I also need to feed
>> that data to different sinks: one gets just the raw data, and the others
>> are HBase sinks fed through the multiplexer.
>>
>> channel 1 -> sink 1 (raw event sink)
>> agent 1src -> replicate
>> channel 2 - sink 2 -> agent 2 src -> multiplexer
>>
>>
>> On Mon, Aug 18, 2014 at 1:35 PM, Guillermo Ortiz <konstt2...@gmail.com>
>> wrote:
>>
>>> In my test, everything is in the same VM. Later, I'll have another flow
>>> which just spools or tails a file and sends it through Avro to another
>>> source on my system.
>>>
>>> Do I really need that replicating step? I think I have too many
>>> channels, and that means too many resources and too much configuration.
>>>
>>>
>>> 2014-08-18 19:51 GMT+02:00 terrey shih <terreys...@gmail.com>:
>>>
>>>> Hi,
>>>>
>>>> Are your two sources, the spooling one and the Avro one (from sink 2),
>>>> in two different JVMs/machines?
>>>>
>>>> thx
>>>>
>>>>
>>>> On Mon, Aug 18, 2014 at 9:53 AM, Guillermo Ortiz <konstt2...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have built a flow with Flume and I don't know if it's the right way
>>>>> to do it, or if there is something better. I am spooling a directory and
>>>>> need that data in three different paths in HDFS with different formats,
>>>>> so I have created two interceptors.
>>>>>
>>>>> Source(Spooling) + Replication + Interceptor1 --> to C1 and C2
>>>>> C1 --> Sink1 to HDFS Path1 (it's like a historic record)
>>>>> C2 --> Sink2 to Avro --> Source Avro + Multiplexing + Interceptor2 -->
>>>>> C3 and C4
>>>>> C3 --> Sink3 to HDFS Path2
>>>>> C4 --> Sink4 to HDFS Path3
>>>>>
>>>>> Interceptor1 doesn't do much with the data; it's just there to save
>>>>> them as they are, like keeping a history of the original data.
>>>>>
>>>>> Interceptor2 sets a header and works with a selector. It processes the
>>>>> data and sets the header so the selector redirects each event to Sink3
>>>>> or Sink4. But this interceptor changes the original data.
>>>>>
>>>>> I tried to do the whole process without replicating data, but I could
>>>>> not. Now it seems like too many steps just because I want to store the
>>>>> original data in HDFS as a historic record.
>>>>>
>>>>
>>>>
>>>
>>
>
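For the second hop, a minimal sketch of agent2's properties, assuming Interceptor2 stamps a header (hypothetically named "route", with hypothetical values "path2"/"path3") that the multiplexing selector keys on; the builder class, port, and HDFS paths are placeholders as well:

    agent2.sources = avroSrc
    agent2.channels = ch3 ch4
    agent2.sinks = sink3 sink4

    # Avro source receiving events from agent1's sink2
    agent2.sources.avroSrc.type = avro
    agent2.sources.avroSrc.bind = 0.0.0.0
    agent2.sources.avroSrc.port = 4545
    agent2.sources.avroSrc.channels = ch3 ch4

    # Interceptor2 runs first and stamps the routing header on each event
    agent2.sources.avroSrc.interceptors = i2
    agent2.sources.avroSrc.interceptors.i2.type = com.example.Interceptor2$Builder

    # The multiplexing selector then routes on that header
    agent2.sources.avroSrc.selector.type = multiplexing
    agent2.sources.avroSrc.selector.header = route
    agent2.sources.avroSrc.selector.mapping.path2 = ch3
    agent2.sources.avroSrc.selector.mapping.path3 = ch4
    agent2.sources.avroSrc.selector.default = ch3

    agent2.channels.ch3.type = memory
    agent2.channels.ch4.type = memory

    agent2.sinks.sink3.type = hdfs
    agent2.sinks.sink3.channel = ch3
    agent2.sinks.sink3.hdfs.path = hdfs://namenode/flume/path2

    agent2.sinks.sink4.type = hdfs
    agent2.sinks.sink4.channel = ch4
    agent2.sinks.sink4.hdfs.path = hdfs://namenode/flume/path3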