Sunita,

Depending on your level of comfort, you can do one of the following:

1. Use Python to fetch your data and then send the events via HTTP to the
   Flume HTTP Source [1]
2. Use Java to create a custom source [6] in Flume that handles the data
   fetching and then puts it in a channel [3] so that it can be funneled
   into the sinks [4] and [5]

Option 1 would be easier for you since you can get the data in Python and
just stream it down via HTTP to Flume.

Option 2 will be more involved since you need to write code that
communicates with external endpoints.

References
[1] http://goo.gl/5lHlg
[2] http://goo.gl/GnVbE
[3] http://goo.gl/t31Xh
[4] http://goo.gl/G9xS8
[5] http://goo.gl/Wn4W5
[6] http://goo.gl/Q0yyn

*Author and Instructor for the Upcoming Book and Lecture Series*
*Massive Log Data Aggregation, Processing, Searching and Visualization with Open Source Software*
*http://massivelogdata.com*

On 18 July 2013 13:38, Sunita Arvind <[email protected]> wrote:

> Hello friends,
>
> I am new to flume and have written a python script to fetch some data from
> social media. My response is JSON. I am seeking help on following issues:
> 1. I am finding it hard to make python and flume talk. Is it just my
> ignorance or it is indeed a long route? AFAIK, I need to understand thrift
> API and Avro etc to achieve this. I also read about pipes. Would this be a
> simple implementation
>
> 2. I am equally comfortable (uncomfortable) in java. Hence wondering if
> its better to re-write my application in Java so that I can easily
> integrate it with flume. Are there any advantages of having a java
> application, as all of hadoop is java?
>
> 3. I need to schedule the agent to run on a daily basis. Which of the
> above approaches would help me achieve this easily?
>
> 4. Going by this -
> http://mail-archives.apache.org/mod_mbox/flume-user/201306.mbox/%[email protected]%3E looks
> like we need to manually clean up disk space even with flume. I am
> not clear on the advantages I would have with flume over using a simple
> cron job to do the task. I can manually write statements like "hadoop fs
> -put <location of output file on local> <location on hdfs>" in the cron job
> instead.
>
> Appreciate your help and guidance
>
> regards,
> Sunita
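
For Option 1 in the reply above, the Python side could look roughly like the sketch below. It assumes the agent's HTTP source is using Flume's default JSONHandler, which accepts a JSON array of events, each with a "headers" map and a string "body". The URL, the port, and the function names (build_payload, send_events) are placeholders chosen for illustration, so adjust them to whatever your agent configuration actually binds to.

```python
import json
import urllib.request

# Hypothetical endpoint: host and port must match the HTTP source
# defined in your own Flume agent configuration.
FLUME_URL = "http://localhost:5140"


def build_payload(records, headers=None):
    """Wrap each record as a {"headers", "body"} event for JSONHandler.

    The body is sent as a string, so each record is serialized to JSON text.
    """
    events = [
        {"headers": dict(headers or {}), "body": json.dumps(record)}
        for record in records
    ]
    return json.dumps(events).encode("utf-8")


def send_events(records, url=FLUME_URL, headers=None, timeout=10):
    """POST a batch of records to the Flume HTTP source; return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=build_payload(records, headers),
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Made-up sample record standing in for one social media response.
    sample = [{"user": "example", "text": "hello flume"}]
    print(build_payload(sample, {"source": "social"}).decode("utf-8"))
    # Uncomment once an agent with an HTTP source is listening:
    # send_events(sample, headers={"source": "social"})
```

Your existing script would call send_events with the list of JSON responses it has fetched. Header values should be plain strings, since Flume event headers are string-to-string maps.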
