Hi Russell,

This might be a bit late, but here's an example of how you can load a file in Python and pass the results back to Pig: https://github.com/mortarcode/python-files
It's a Mortar project, but the Pig script (https://github.com/mortarcode/python-files/blob/master/pigscripts/python-files.pig) and the Python UDF file (https://github.com/mortarcode/python-files/blob/master/udfs/python/python-files.py) should work fine without Mortar, as long as you explicitly set the AWS key parameters in the Pig script and have boto installed.

This example uses a small file. If you want to read a larger file, you'll need to handle boto/S3 issues with downloading large files, or have Python read directly from HDFS. That said, I've found S3 works pretty well for small files like this one. Reading larger files in Python doesn't work very well anyway, because you have to worry about running out of memory when passing everything back from Python to Java.

Jeremy Karn / Lead Developer
MORTAR DATA / 519 277 4391 / www.mortardata.com

On Sun, Jul 20, 2014 at 5:14 PM, Russell Jurney <[email protected]> wrote:

> I need to load a file and loop through it during the execution of a python
> UDF. Is this possible? How?
>
> --
> Russell Jurney twitter.com/rjurney [email protected]
> datasyndrome.com
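For the archives, here is a minimal, self-contained sketch of the pattern Russell asked about (this is not code from the repo above, and the file path, helper names, and tab-separated layout are just placeholders): the UDF loads the file once on its first call and then loops over the cached contents on every subsequent call. In a real job you would first pull the file down from S3 with boto (e.g. `Key.get_contents_to_filename`) or read it from HDFS, and decorate the UDF with Pig's `@outputSchema`.

```python
# Sketch of a Pig Python UDF that loads a lookup file once per process.
# Placeholder names and local-file read; in a real Pig script you would
# fetch the file from S3 with boto first and add an @outputSchema
# decorator so Pig knows the return type.

_LOOKUP = None  # module-level cache: populated on the first UDF call only


def _load_lookup(path):
    # Read a tab-separated key/value file into a dict. Swap this body for
    # a boto S3 download (Key.get_contents_to_filename) for remote files.
    lookup = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.rstrip("\n").partition("\t")
            lookup[key] = value
    return lookup


def lookup_value(key, path="lookup.tsv"):
    """UDF body: return the value mapped to `key`, or None if absent."""
    global _LOOKUP
    if _LOOKUP is None:  # load the file only once, not once per tuple
        _LOOKUP = _load_lookup(path)
    return _LOOKUP.get(key)
```

Because the cache lives at module level, the file is read once per map task rather than once per input tuple, which is the part that matters for performance.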
