Bam!!! http://docs.sigmoidanalytics.com/index.php/Setting_up_spork_with_spark_0.8.1

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Thu, Apr 10, 2014 at 3:07 AM, Konstantin Kudryavtsev <kudryavtsev.konstan...@gmail.com> wrote:

> Hi Mayur,
>
> I wondered if you could share your findings in some way (GitHub, blog
> post, etc.). I guess your experience will be very interesting/useful for
> many people.
>
> On Apr 8, 2014 8:48 PM, "Mayur Rustagi" <mayur.rust...@gmail.com> wrote:
>
>> Hi Ankit,
>> Thanks for all the work on Pig.
>> Finally got it working. A couple of high-level bugs right now:
>>
>> - Getting it working on Spark 0.9.0
>> - Getting UDFs working
>> - Getting generate functionality working
>> - Exhaustive test suite for Pig on Spark
>>
>> Are you maintaining a Jira somewhere?
>>
>> I am currently trying to deploy it on 0.9.0.
>>
>> Regards
>> Mayur
>>
>> On Fri, Mar 14, 2014 at 1:37 PM, Aniket Mokashi <aniket...@gmail.com> wrote:
>>
>>> We will post fixes from our side at https://github.com/twitter/pig.
>>>
>>> Top of our list are:
>>> 1. Make it work with pig-trunk (execution engine interface) (with 0.8 or
>>> 0.9 Spark).
>>> 2. Support for algebraic UDFs (this mitigates the group-by OOM problems).
>>>
>>> Would definitely love more contributions on this.
>>>
>>> Thanks,
>>> Aniket
>>>
>>> On Fri, Mar 14, 2014 at 12:29 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
>>>
>>>> Damn, I am off to NY for Structure Conf. Would it be possible to meet
>>>> anytime after 28th March?
>>>> I am really interested in making it stable & production quality.
>>>>
>>>> Regards
>>>> Mayur Rustagi
>>>>
>>>> On Fri, Mar 14, 2014 at 11:53 AM, Julien Le Dem <jul...@twitter.com> wrote:
>>>>
>>>>> Hi Mayur,
>>>>> Are you going to the Pig meetup this afternoon?
>>>>> http://www.meetup.com/PigUser/events/160604192/
>>>>> Aniket and I will be there.
>>>>> We would be happy to chat about Pig-on-Spark.
>>>>>
>>>>> On Tue, Mar 11, 2014 at 8:56 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
>>>>>
>>>>>> Hi Lin,
>>>>>> We are working on getting Pig on Spark functional with 0.8.0. Have
>>>>>> you got it working on any Spark version?
>>>>>> Also, what functionality works on it?
>>>>>> Regards
>>>>>> Mayur
>>>>>>
>>>>>> On Mon, Mar 10, 2014 at 11:00 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Sameer,
>>>>>>>
>>>>>>> Lin (cc'ed) could also give you some updates about Pig on Spark
>>>>>>> development on her side.
>>>>>>>
>>>>>>> Best,
>>>>>>> Xiangrui
>>>>>>>
>>>>>>> On Mon, Mar 10, 2014 at 12:52 PM, Sameer Tilak <ssti...@live.com> wrote:
>>>>>>> > Hi Mayur,
>>>>>>> > We are planning to upgrade our distribution MR1 -> MR2 (YARN), and the goal is
>>>>>>> > to get Spork set up next month. I will keep you posted. Can you please keep
>>>>>>> > me informed about your progress as well?
>>>>>>> >
>>>>>>> > ________________________________
>>>>>>> > From: mayur.rust...@gmail.com
>>>>>>> > Date: Mon, 10 Mar 2014 11:47:56 -0700
>>>>>>> > Subject: Re: Pig on Spark
>>>>>>> > To: user@spark.apache.org
>>>>>>> >
>>>>>>> > Hi Sameer,
>>>>>>> > Did you make any progress on this? My team is also trying it out and would love
>>>>>>> > to know some details on your progress.
>>>>>>> >
>>>>>>> > Mayur Rustagi
>>>>>>> >
>>>>>>> > On Thu, Mar 6, 2014 at 2:20 PM, Sameer Tilak <ssti...@live.com> wrote:
>>>>>>> >
>>>>>>> > Hi Aniket,
>>>>>>> > Many thanks! I will check this out.
>>>>>>> >
>>>>>>> > ________________________________
>>>>>>> > Date: Thu, 6 Mar 2014 13:46:50 -0800
>>>>>>> > Subject: Re: Pig on Spark
>>>>>>> > From: aniket...@gmail.com
>>>>>>> > To: user@spark.apache.org; tgraves...@yahoo.com
>>>>>>> >
>>>>>>> > There is some work to make this work on YARN at
>>>>>>> > https://github.com/aniket486/pig. (So, compile Pig with ant
>>>>>>> > -Dhadoopversion=23)
>>>>>>> >
>>>>>>> > You can look at https://github.com/aniket486/pig/blob/spork/pig-spark to
>>>>>>> > find out what sort of env variables you need (sorry, I haven't been able to
>>>>>>> > clean this up; it is in progress). There are a few known issues with this; I will
>>>>>>> > work on fixing them soon.
>>>>>>> >
>>>>>>> > Known issues:
>>>>>>> > 1. Limit does not work (spork-fix)
>>>>>>> > 2. Foreach requires turning off schema-tuple-backend (should be a pig-jira)
>>>>>>> > 3. Algebraic UDFs don't work (spork-fix in-progress)
>>>>>>> > 4. Group by rework (to avoid OOMs)
>>>>>>> > 5. UDF Classloader issue (requires SPARK-1053, then you can put
>>>>>>> > pig-withouthadoop.jar as SPARK_JARS in SparkContext along with UDF jars)
>>>>>>> >
>>>>>>> > ~Aniket
>>>>>>> >
>>>>>>> > On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves <tgraves...@yahoo.com> wrote:
>>>>>>> >
>>>>>>> > I had asked a similar question on the dev mailing list a while back (Jan
>>>>>>> > 22nd).
>>>>>>> >
>>>>>>> > See the archives:
>>>>>>> > http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser ->
>>>>>>> > look for spork.
>>>>>>> >
>>>>>>> > Basically Matei said:
>>>>>>> >
>>>>>>> > Yup, that was it, though I believe people at Twitter picked it up again
>>>>>>> > recently. I'd suggest asking Dmitriy if you know him. I've seen interest
>>>>>>> > in this from several other groups, and if there's enough of it, maybe we
>>>>>>> > can start another open source repo to track it. The work in that repo you
>>>>>>> > pointed to was done over one week, and already had most of Pig's operators
>>>>>>> > working. (I helped out with this prototype over Twitter's hack week.) That
>>>>>>> > work also calls the Scala API directly, because it was done before we had
>>>>>>> > a Java API; it should be easier with the Java one.
>>>>>>> >
>>>>>>> > Tom
>>>>>>> >
>>>>>>> > On Thursday, March 6, 2014 3:11 PM, Sameer Tilak <ssti...@live.com> wrote:
>>>>>>> > Hi everyone,
>>>>>>> >
>>>>>>> > We are using Pig to build our data pipeline. I came across Spork -- Pig
>>>>>>> > on Spark at: https://github.com/dvryaboy/pig and am not sure if it is still
>>>>>>> > active.
>>>>>>> >
>>>>>>> > Can someone please let me know the status of Spork or any other effort that
>>>>>>> > will let us run Pig on Spark? We can significantly benefit by using Spark,
>>>>>>> > but we would like to keep using the existing Pig scripts.
>>>>>>> >
>>>>>>> > --
>>>>>>> > "...:::Aniket:::... Quetzalco@tl"