Can you guys please let me know whether the following scenario is supported? I have a system with several Tomcat machines, each of which produces small JSON files of about 2 KB each. The goal is to take those files, convert them to CSV format, and upload them to S3. From S3 they are then loaded in parallel into Redshift.
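To make the conversion step concrete, here is a rough Python sketch of what I have in mind for a single file. It is only an illustration, not Flume code: the field names (`user`, `event`, `ts`) and the in-memory `ID_LOOKUP` table are made up, and in the real system the id would come from a database lookup rather than a dict.

```python
import csv
import io
import json

# Hypothetical stand-in for the database lookup: maps a name to its id.
ID_LOOKUP = {"alice": 101, "bob": 102}


def json_to_csv_row(json_text, lookup=ID_LOOKUP):
    """Convert one small JSON document into a single CSV line."""
    record = json.loads(json_text)
    # Resolve the id; leave the field empty when the lookup misses.
    user_id = lookup.get(record["user"], "")
    buf = io.StringIO()
    # csv.writer handles quoting of fields that contain commas or quotes.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([user_id, record["user"], record["event"], record["ts"]])
    return buf.getvalue()


if __name__ == "__main__":
    sample = '{"user": "alice", "event": "login", "ts": "t0"}'
    print(json_to_csv_row(sample), end="")
```

Each JSON file would become one CSV line, and many such lines would be batched into a larger file before the upload to S3.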
My idea of the architecture is:

TomcatServer1 --------------
                            |
TomcatServer2 --------------> Flume ----> S3

Is it possible to do the conversion from JSON files to CSV files in Flume? We need to take the contents of each JSON file, do a database lookup to fetch the id, and then create the CSV file from that. Can this processing be done in Flume?

Also, what would an HA architecture for Flume look like? Any links would be appreciated.

Thanks,
Sid