There is some work in progress to make this run on YARN at https://github.com/aniket486/pig. (So, compile Pig with ant -Dhadoopversion=23.)
You can look at https://github.com/aniket486/pig/blob/spork/pig-spark to find out which environment variables you need (sorry, I haven't been able to clean this up yet; it is still in progress). There are a few known issues with this, and I will work on fixing them soon.

Known issues:
1. Limit does not work (spork-fix)
2. Foreach requires turning off schema-tuple-backend (should be a pig-jira)
3. Algebraic UDFs don't work (spork-fix in-progress)
4. Group by rework (to avoid OOMs)
5. UDF classloader issue (requires SPARK-1053, then you can put pig-withouthadoop.jar as SPARK_JARS in SparkContext along with udf jars)

~Aniket

On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves <tgraves...@yahoo.com> wrote:

> I had asked a similar question on the dev mailing list a while back (Jan
> 22nd).
>
> See the archives:
> http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser ->
> look for spork.
>
> Basically Matei said:
>
> Yup, that was it, though I believe people at Twitter picked it up again
> recently. I'd suggest
> asking Dmitriy if you know him. I've seen interest in this from several other
> groups, and
> if there's enough of it, maybe we can start another open source repo to track
> it. The work
> in that repo you pointed to was done over one week, and already had most of
> Pig's operators
> working. (I helped out with this prototype over Twitter's hack week.) That
> work also calls
> the Scala API directly, because it was done before we had a Java API; it
> should be easier
> with the Java one.
>
>
> Tom
>
>
>
> On Thursday, March 6, 2014 3:11 PM, Sameer Tilak <ssti...@live.com>
> wrote:
> Hi everyone,
>
> We are using Pig to build our data pipeline. I came across Spork -- Pig
> on Spark at: https://github.com/dvryaboy/pig and not sure if it is still
> active.
>
> Can someone please let me know the status of Spork or any other effort
> that will let us run Pig on Spark? We can significantly benefit by using
> Spark, but we would like to keep using the existing Pig scripts.

--
"...:::Aniket:::... Quetzalco@tl"