Re: Repetitive pig scripts...

Daniel Dai Mon, 07 Feb 2011 23:05:09 -0800

Also take a look at http://wiki.apache.org/pig/TuringCompletePig. You can embed Pig in a Python script. This feature is already checked into trunk and will be available in 0.9.
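
For the "a day or a month's worth of data" part of the question, a driver script in Python can work out which input paths to hand to Pig. What follows is only a rough sketch in plain Python: the /logs/YYYY/MM/DD/HH directory layout and the helper names (day_glob, range_globs, month_glob) are assumptions made up for illustration, not something from the wiki page or from Pig itself.

```python
from datetime import date, timedelta

# Hypothetical HDFS layout (an assumption): /logs/YYYY/MM/DD/HH/<files>
BASE = "/logs"

def day_glob(d, base=BASE):
    """Glob matching every hourly directory for one day."""
    return "%s/%04d/%02d/%02d/*" % (base, d.year, d.month, d.day)

def range_globs(start, end, base=BASE):
    """Comma-separated day globs from start to end, inclusive."""
    days = (end - start).days + 1
    return ",".join(day_glob(start + timedelta(days=n), base)
                    for n in range(days))

def month_glob(year, month, base=BASE):
    """Glob matching every day and hour directory in one month."""
    return "%s/%04d/%02d/*/*" % (base, year, month)

if __name__ == "__main__":
    # Example: the input string for a two-day window.
    print(range_globs(date(2011, 2, 6), date(2011, 2, 7)))
```

The resulting string could then be handed to a script as a parameter (for example with pig -param) and used in a LOAD statement, assuming the loader accepts globs and comma-separated paths, or bound into an embedded script once 0.9 is out.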


Daniel


Alex McLintock wrote:

I'm trying to understand the best way of setting up repeated processing of
continuously generated data - like logs.

I can manually copy files from normal FS to HDFS and kick off pig scripts
but ideally I want something automatic - preferably every hour, or possibly
more often. I also want to process a day or a month's worth of data rather
than just the most recent file.

Is there a best practice way of doing this documented anywhere? I believe
that I should be looking at Flume for transferring files into HDFS and Oozie
for some kind of workflow of pig jobs. Is that right? Any example setups?

Cheers

Alex
