In my test, everything is in the same VM. Later, I'll have another flow that just spools or tails a file and sends the events through Avro to another source on my system.
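That later flow would look something like this, I imagine (just a sketch; the command, file path, hostname, and port below are placeholders, not my real setup):

# tail a file and forward the events over Avro to another agent's source
tailagent.sources = tail
tailagent.channels = mem
tailagent.sinks = fwd

# exec source tailing a file (a spooldir source would plug in the same way)
tailagent.sources.tail.type = exec
tailagent.sources.tail.command = tail -F /var/log/app.log
tailagent.sources.tail.channels = mem

tailagent.channels.mem.type = memory

tailagent.sinks.fwd.type = avro
tailagent.sinks.fwd.hostname = localhost
tailagent.sinks.fwd.port = 4141
tailagent.sinks.fwd.channel = mem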
Do I really need to do that replicating step? I think I have too many channels, and that means too many resources and too much configuration.

2014-08-18 19:51 GMT+02:00 terrey shih <terreys...@gmail.com>:
> Hi,
>
> Are your two sources (the spooling one) and the Avro source (fed by sink 2)
> in two different JVMs/machines?
>
> thx
>
>
> On Mon, Aug 18, 2014 at 9:53 AM, Guillermo Ortiz <konstt2...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have built a flow with Flume and I don't know if this is the right way
>> to do it, or if there is something better. I am spooling a directory and
>> need that data in three different paths in HDFS with different formats,
>> so I have created two interceptors.
>>
>> Source (spooling) + replicating selector + Interceptor1 --> C1 and C2
>> C1 --> Sink1 to HDFS Path1 (a kind of historical archive)
>> C2 --> Sink2 to Avro --> Avro source + multiplexing selector + Interceptor2 --> C3 and C4
>> C3 --> Sink3 to HDFS Path2
>> C4 --> Sink4 to HDFS Path3
>>
>> Interceptor1 doesn't do much with the data; it just stores the events as
>> they are, like keeping a history of the original data.
>>
>> Interceptor2 drives a selector through a header. It processes the data
>> and sets the header so the selector routes each event to Sink3 or Sink4.
>> But this interceptor changes the original data.
>>
>> I tried to do the whole process without replicating the data, but I could
>> not. Now it seems like too many steps just because I want to store the
>> original data in HDFS as a historical archive.
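For reference, here is roughly how the flow above maps to Flume properties (only a sketch: the channel types, HDFS paths, port, routing header, and interceptor builder class names are placeholders, not taken from my real configuration):

# First agent: spooling source, Interceptor1, replicating selector, two channels
agent1.sources = spool
agent1.channels = c1 c2
agent1.sinks = s1 s2

agent1.sources.spool.type = spooldir
agent1.sources.spool.spoolDir = /data/incoming
agent1.sources.spool.channels = c1 c2
agent1.sources.spool.selector.type = replicating
agent1.sources.spool.interceptors = i1
agent1.sources.spool.interceptors.i1.type = com.example.Interceptor1$Builder

agent1.channels.c1.type = memory
agent1.channels.c2.type = memory

# Sink1: the raw/historical copy
agent1.sinks.s1.type = hdfs
agent1.sinks.s1.channel = c1
agent1.sinks.s1.hdfs.path = /flume/path1

# Sink2: forward to the second tier over Avro
agent1.sinks.s2.type = avro
agent1.sinks.s2.channel = c2
agent1.sinks.s2.hostname = localhost
agent1.sinks.s2.port = 4545

# Second agent: Avro source, Interceptor2 sets a header,
# multiplexing selector routes on that header
agent2.sources = avroIn
agent2.channels = c3 c4
agent2.sinks = s3 s4

agent2.sources.avroIn.type = avro
agent2.sources.avroIn.bind = 0.0.0.0
agent2.sources.avroIn.port = 4545
agent2.sources.avroIn.channels = c3 c4
agent2.sources.avroIn.interceptors = i2
agent2.sources.avroIn.interceptors.i2.type = com.example.Interceptor2$Builder
agent2.sources.avroIn.selector.type = multiplexing
agent2.sources.avroIn.selector.header = route
agent2.sources.avroIn.selector.mapping.path2 = c3
agent2.sources.avroIn.selector.mapping.path3 = c4

agent2.channels.c3.type = memory
agent2.channels.c4.type = memory

agent2.sinks.s3.type = hdfs
agent2.sinks.s3.channel = c3
agent2.sinks.s3.hdfs.path = /flume/path2

agent2.sinks.s4.type = hdfs
agent2.sinks.s4.channel = c4
agent2.sinks.s4.hdfs.path = /flume/path3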