What about adding the MySQL data in a small batch job after Flume sinks to S3? You could then delete the raw data that Flume wrote. I would worry that the database connection is relatively slow and unreliable and could drag down Flume's throughput.
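
If it helps, the batch side could be something along these lines (just a sketch, assuming the AWS SDK for Java v1, a MySQL JDBC driver and org.json on the classpath; the bucket, prefix, table, column and JSON field names below are placeholders, not your actual layout):

// Sketch of a post-sink enrichment job: read raw JSON objects that Flume wrote,
// look up extra columns in MySQL, write CSV back to S3, then delete the raw objects.
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3ObjectSummary;
import org.json.JSONObject;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class EnrichRawJson {
    public static void main(String[] args) throws Exception {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        try (Connection db = DriverManager.getConnection(
                 "jdbc:mysql://dbhost/lookupdb", "user", "password");   // placeholder connection
             PreparedStatement lookup = db.prepareStatement(
                 "SELECT name FROM customers WHERE id = ?")) {          // placeholder query

            // Raw JSON from the Flume sink is assumed under raw/; enriched CSV goes to csv/.
            for (S3ObjectSummary obj : s3.listObjectsV2("my-bucket", "raw/").getObjectSummaries()) {
                StringBuilder csv = new StringBuilder();
                // One JSON event per line is assumed here.
                for (String line : s3.getObjectAsString("my-bucket", obj.getKey()).split("\n")) {
                    if (line.isEmpty()) continue;
                    JSONObject event = new JSONObject(line);
                    String customerId = event.getString("customer_id"); // assumed field name
                    lookup.setString(1, customerId);
                    String name;
                    try (ResultSet rs = lookup.executeQuery()) {
                        name = rs.next() ? rs.getString(1) : "";
                    }
                    csv.append(customerId).append(',').append(name).append('\n');
                }
                // Write the enriched CSV, then drop the raw object Flume produced.
                String outKey = obj.getKey().replaceFirst("^raw/", "csv/") + ".csv";
                s3.putObject("my-bucket", outKey, csv.toString());
                s3.deleteObject("my-bucket", obj.getKey());
            }
        }
    }
}

Run on a cron schedule, something like that keeps the slow MySQL round trips entirely out of the Flume agent's path.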
Andrew

On Sep 4, 2014, at 7:53 PM, Kevin Warner <kevinwarner7...@gmail.com> wrote:

> Hello All,
> We have the following configuration:
>
> Source->Channel->Sink
>
> The source is pointing to a folder that has lots of JSON files. The
> channel is file-based so that there is fault tolerance, and the sink is
> putting CSV files on S3.
>
> There is code written in the sink that takes the JSON events, does some
> MySQL database lookups, and generates the CSV files to be put into S3.
>
> The question is: is the sink the right place for that code, or should it
> be running in the channel, since the ACID guarantees are present in the
> channel? Please advise.
>
> -Kev
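
(For reference, the channel itself never executes user code; a custom sink takes events inside a channel transaction, so the file channel's guarantees still cover work done in the sink. A sink like the one described above typically follows the pattern below; this is just a sketch, with the class name, lookup and upload steps left as illustrative comments.)

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

import java.nio.charset.StandardCharsets;

public class JsonToCsvS3Sink extends AbstractSink implements Configurable {

    @Override
    public void configure(Context context) {
        // Read S3 and MySQL settings from the agent configuration here.
    }

    @Override
    public Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
            Event event = channel.take();
            if (event == null) {
                tx.commit();
                return Status.BACKOFF;          // nothing in the channel right now
            }
            String json = new String(event.getBody(), StandardCharsets.UTF_8);
            // 1. Parse the JSON event.
            // 2. Look up the extra columns in MySQL (the per-event call I'd worry about).
            // 3. Buffer the resulting CSV line and periodically upload a file to S3.
            tx.commit();                        // commit only after the event is handled
            return Status.READY;
        } catch (Throwable t) {
            tx.rollback();                      // the event stays in the file channel and is retried
            return Status.BACKOFF;
        } finally {
            tx.close();
        }
    }
}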