Hi,

I have built a flow with Flume and I don't know whether this is the right
way to do it, or whether there is something better. I am spooling a
directory and need that data in three different HDFS paths with different
formats, so I have created two interceptors.

Source (spooldir) + replicating selector + Interceptor1 --> C1 and C2
C1 --> Sink1 --> HDFS Path1 (the historical copy)
C2 --> Sink2 (Avro) --> Avro Source + multiplexing selector + Interceptor2
  --> C3 and C4
C3 --> Sink3 --> HDFS Path2
C4 --> Sink4 --> HDFS Path3
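The first agent above could be wired roughly like this. This is only a sketch: the agent, component, path, host, and interceptor class names (`agent1`, `spoolSrc`, `com.example.Interceptor1`, etc.) are placeholders, not my actual config.

```properties
agent1.sources = spoolSrc
agent1.channels = c1 c2
agent1.sinks = hdfsSink1 avroSink

# Spooling directory source with Interceptor1 attached
agent1.sources.spoolSrc.type = spooldir
agent1.sources.spoolSrc.spoolDir = /data/incoming
agent1.sources.spoolSrc.interceptors = i1
agent1.sources.spoolSrc.interceptors.i1.type = com.example.Interceptor1$Builder
# Replicating selector (the default) copies every event to both channels
agent1.sources.spoolSrc.selector.type = replicating
agent1.sources.spoolSrc.channels = c1 c2

agent1.channels.c1.type = memory
agent1.channels.c2.type = memory

# C1 -> HDFS Path1 (historical copy)
agent1.sinks.hdfsSink1.type = hdfs
agent1.sinks.hdfsSink1.hdfs.path = hdfs://namenode/path1
agent1.sinks.hdfsSink1.channel = c1

# C2 -> Avro sink feeding the second agent
agent1.sinks.avroSink.type = avro
agent1.sinks.avroSink.hostname = localhost
agent1.sinks.avroSink.port = 4141
agent1.sinks.avroSink.channel = c2
```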

Interceptor1 doesn't do much with the data; it just passes events through
so they are stored as they are, keeping a history of the original data.

Interceptor2 sets a header that a multiplexing selector uses. It processes
the data and sets the header so the selector routes each event to Sink3 or
Sink4. But this interceptor changes the original data.
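The second agent would then route on whatever header Interceptor2 sets. Again a sketch under assumptions: the header name `route`, its values `path2`/`path3`, and the class name are made up for illustration.

```properties
agent2.sources = avroSrc
agent2.channels = c3 c4
agent2.sinks = hdfsSink2 hdfsSink3

# Avro source receiving from agent1, with Interceptor2 attached
agent2.sources.avroSrc.type = avro
agent2.sources.avroSrc.bind = 0.0.0.0
agent2.sources.avroSrc.port = 4141
agent2.sources.avroSrc.interceptors = i2
agent2.sources.avroSrc.interceptors.i2.type = com.example.Interceptor2$Builder
# Multiplexing selector reads the header that Interceptor2 set
agent2.sources.avroSrc.selector.type = multiplexing
agent2.sources.avroSrc.selector.header = route
agent2.sources.avroSrc.selector.mapping.path2 = c3
agent2.sources.avroSrc.selector.mapping.path3 = c4
agent2.sources.avroSrc.channels = c3 c4

agent2.channels.c3.type = memory
agent2.channels.c4.type = memory

agent2.sinks.hdfsSink2.type = hdfs
agent2.sinks.hdfsSink2.hdfs.path = hdfs://namenode/path2
agent2.sinks.hdfsSink2.channel = c3

agent2.sinks.hdfsSink3.type = hdfs
agent2.sinks.hdfsSink3.hdfs.path = hdfs://namenode/path3
agent2.sinks.hdfsSink3.channel = c4
```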

I tried to do the whole process without replicating the data, but I could
not. Now it seems like too many steps just because I want to store the
original data in HDFS as a historical copy.
