Hi, I have built a flow with Flume and I don't know if it's the right way to do it, or whether there is something better. I am spooling a directory and need that data in three different HDFS paths with different formats, so I have created two interceptors.
Source (Spooling) + Replicating selector + Interceptor1 --> C1 and C2
C1 --> Sink1 to HDFS Path1 (acts as a historic archive)
C2 --> Sink2 (Avro) --> Avro Source + Multiplexing selector + Interceptor2 --> C3 and C4
C3 --> Sink3 to HDFS Path2
C4 --> Sink4 to HDFS Path3

Interceptor1 doesn't do much with the data; it just stores events as they are, to keep a history of the original data. Interceptor2 processes the data and sets a header, and the multiplexing selector uses that header to route events to Sink3 or Sink4. But this interceptor changes the original data. I tried to do the whole process without replicating the data, but I could not. Now it seems like too many steps just because I want to store the original data in HDFS as a historic archive.
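For reference, this is roughly how the topology above could be sketched as a Flume properties configuration. All agent, channel, sink, interceptor class, host/port, and path names here are hypothetical placeholders, not the actual setup:

```
# --- Agent 1: spooling dir, replicating to historic HDFS + Avro forward ---
agent1.sources = spool-src
agent1.channels = c1 c2
agent1.sinks = sink1-historic sink2-avro

agent1.sources.spool-src.type = spooldir
agent1.sources.spool-src.spoolDir = /data/incoming
agent1.sources.spool-src.selector.type = replicating
agent1.sources.spool-src.interceptors = i1
# Hypothetical custom interceptor class
agent1.sources.spool-src.interceptors.i1.type = com.example.Interceptor1$Builder
agent1.sources.spool-src.channels = c1 c2

agent1.channels.c1.type = memory
agent1.channels.c2.type = memory

agent1.sinks.sink1-historic.type = hdfs
agent1.sinks.sink1-historic.hdfs.path = /flume/historic
agent1.sinks.sink1-historic.channel = c1

agent1.sinks.sink2-avro.type = avro
agent1.sinks.sink2-avro.hostname = localhost
agent1.sinks.sink2-avro.port = 4545
agent1.sinks.sink2-avro.channel = c2

# --- Agent 2: Avro source, multiplexing on a header set by Interceptor2 ---
agent2.sources = avro-src
agent2.channels = c3 c4
agent2.sinks = sink3 sink4

agent2.sources.avro-src.type = avro
agent2.sources.avro-src.bind = 0.0.0.0
agent2.sources.avro-src.port = 4545
agent2.sources.avro-src.interceptors = i2
agent2.sources.avro-src.interceptors.i2.type = com.example.Interceptor2$Builder
# Interceptor2 is assumed to set a "route" header on each event
agent2.sources.avro-src.selector.type = multiplexing
agent2.sources.avro-src.selector.header = route
agent2.sources.avro-src.selector.mapping.path2 = c3
agent2.sources.avro-src.selector.mapping.path3 = c4
agent2.sources.avro-src.channels = c3 c4

agent2.channels.c3.type = memory
agent2.channels.c4.type = memory

agent2.sinks.sink3.type = hdfs
agent2.sinks.sink3.hdfs.path = /flume/path2
agent2.sinks.sink3.channel = c3

agent2.sinks.sink4.type = hdfs
agent2.sinks.sink4.hdfs.path = /flume/path3
agent2.sinks.sink4.channel = c4
```

The key point is that the replicating selector on the spooling source is what preserves an untouched copy for the historic path, while the multiplexing selector on the second agent does the header-based routing.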