You should be able to accomplish this with the Morphline interceptor[1]. It lets you write a configuration file that converts from JSON to CSV. There's a similar example in the Kite project[2], though its target is Avro rather than CSV. The full docs for Morphlines will also be helpful[3].
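As a rough sketch of the wiring, the interceptor is attached to a source in the agent's properties file. The agent name (a1), source name (r1), morphline file path, and morphline id below are placeholders I made up for illustration. The property names and the builder class are the ones listed in the Flume User Guide[1]. The conversion logic itself would live in the morphline file, which is not shown here.

```properties
# Hypothetical agent "a1" with a source "r1"; both names are placeholders.
a1.sources.r1.interceptors = morphlineinterceptor

# Builder class for the Morphline interceptor, as documented in the Flume User Guide.
a1.sources.r1.interceptors.morphlineinterceptor.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder

# Path to the morphline configuration file (placeholder path).
a1.sources.r1.interceptors.morphlineinterceptor.morphlineFile = /etc/flume-ng/conf/morphline.conf

# Which morphline inside that file to run (placeholder id).
a1.sources.r1.interceptors.morphlineinterceptor.morphlineId = morphline1
```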

-Joey

[1] http://flume.apache.org/FlumeUserGuide.html#morphline-interceptor
[2] https://github.com/kite-sdk/kite-examples/tree/master/json
[3] http://kitesdk.org/docs/current/kite-morphlines/index.html

On Wed, Sep 3, 2014 at 4:26 PM, Sid Ray <s...@fractalsciences.com> wrote:
> Can you guys please let me know if the following scenario is supported:
> I have a system in which there are Tomcat machines which have small JSON
> files of 2K size each. The goal is to take those files, convert them to CSV
> format and upload them to S3. Then from S3 they are loaded in parallel to
> Redshift.
>
> My idea of the architecture was that:
>
> TomcatServer1 --------------
>                             |
> TomcatServer2 --------------> Flume---->S3
>
>
> Is it possible in Flume we can do the conversion from the JSON file to CSV
> files. The idea is that we need to take the contents of the JSON file, do
> some database lookup, fetch the id and then create the CSV file out of that.
> Is it possible to do this processing in Flume.
>
> Also, what will the HA architecture of Flume look like. Any links etc.
>
> Thanks,
> Sid

--
Joey Echeverria