What about adding the data from MySQL in a small batch job after Flume sinks 
to S3? You could then delete the raw data that Flume wrote. I would worry that 
the database connection would be relatively slow and unreliable, and might slow 
down the Flume throughput. 
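
Something like the sketch below could serve as that batch step. It is only a rough
sketch, assuming the raw events land in S3 as newline-delimited JSON and using
boto3 and pymysql; the bucket names, the "customers" lookup table, and the
customer_id/value fields are made-up placeholders.

# Rough sketch of the post-sink enrichment batch job. Bucket, table and
# field names are placeholders; adjust to your setup.
import csv
import io
import json

import boto3
import pymysql

RAW_BUCKET = "my-raw-bucket"       # where Flume lands the raw JSON events
OUT_BUCKET = "my-enriched-bucket"  # where the enriched CSV files go

s3 = boto3.client("s3")
db = pymysql.connect(host="mysql-host", user="flume",
                     password="secret", database="lookup_db")

def lookup(cursor, customer_id):
    # Hypothetical lookup; replace with the real query.
    cursor.execute("SELECT name, region FROM customers WHERE id = %s",
                   (customer_id,))
    row = cursor.fetchone()
    return row if row else ("unknown", "unknown")

with db.cursor() as cursor:
    for obj in s3.list_objects_v2(Bucket=RAW_BUCKET).get("Contents", []):
        key = obj["Key"]
        raw = s3.get_object(Bucket=RAW_BUCKET, Key=key)["Body"].read()

        # Turn each JSON event into an enriched CSV row
        out = io.StringIO()
        writer = csv.writer(out)
        for line in raw.decode("utf-8").splitlines():
            event = json.loads(line)
            name, region = lookup(cursor, event["customer_id"])
            writer.writerow([event["customer_id"], name, region, event["value"]])

        s3.put_object(Bucket=OUT_BUCKET, Key=key + ".csv",
                      Body=out.getvalue().encode("utf-8"))
        s3.delete_object(Bucket=RAW_BUCKET, Key=key)  # drop the raw copy

db.close()

That way Flume itself only touches the local disk and S3, and a slow or flaky
MySQL server can only delay the enrichment job, not back up the channel.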

Andrew

On Sep 4, 2014, at 7:53 PM, Kevin Warner <kevinwarner7...@gmail.com> wrote:

> Hello All,
> We have the following configuration:
> Source->Channel->Sink
> 
> Now, the source is pointing to a folder that has lots of JSON files. The 
> channel is file-based so that there is fault tolerance, and the Sink is 
> putting CSV files on S3.
> 
> Now, there is code written in the Sink that takes the JSON events, does some 
> MySQL database lookups, and generates the CSV files to be put into S3. 
> 
> The question is: is the Sink the right place for that code, or should the 
> code be running in the Channel, since the ACID guarantees are present in the 
> Channel? Please advise.
> 
> -Kev
>  
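
For reference, the topology described above (presumably a spooling-directory
source, a file channel, and a sink writing to S3) would, with a stock HDFS sink
just landing the raw JSON for the batch job to pick up, look roughly like the
agent configuration below. The directories, bucket name and keys are placeholders.

agent.sources  = spool
agent.channels = fileCh
agent.sinks    = s3

# Source: watch a directory for incoming JSON files
agent.sources.spool.type     = spooldir
agent.sources.spool.spoolDir = /data/incoming/json
agent.sources.spool.channels = fileCh

# Channel: file-backed for durability
agent.channels.fileCh.type          = file
agent.channels.fileCh.checkpointDir = /var/flume/checkpoint
agent.channels.fileCh.dataDirs      = /var/flume/data

# Sink: write the raw events to S3 via the HDFS sink
agent.sinks.s3.type              = hdfs
agent.sinks.s3.channel           = fileCh
agent.sinks.s3.hdfs.path         = s3n://ACCESS_KEY:SECRET_KEY@my-raw-bucket/events/
agent.sinks.s3.hdfs.fileType     = DataStream
agent.sinks.s3.hdfs.rollInterval = 300
agent.sinks.s3.hdfs.rollSize     = 0
agent.sinks.s3.hdfs.rollCount    = 0

The custom CSV-generating sink would then be replaced by this stock sink plus the
batch job, rather than by moving code into the channel; a Flume channel just
stores and forwards events and is not a place to run transformation logic.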

