Bam !!!
http://docs.sigmoidanalytics.com/index.php/Setting_up_spork_with_spark_0.8.1


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Thu, Apr 10, 2014 at 3:07 AM, Konstantin Kudryavtsev <
kudryavtsev.konstan...@gmail.com> wrote:

> Hi Mayur,
>
> I wondered if you could share your findings in some way (github, blog
> post, etc). I guess your experience will be very interesting/useful for
> many people
>
> sent from Lenovo YogaTablet
> On Apr 8, 2014 8:48 PM, "Mayur Rustagi" <mayur.rust...@gmail.com> wrote:
>
>> Hi Aniket,
>> Thanks for all the work on Pig.
>> Finally got it working. A couple of high-level issues remain right now:
>>
>>    - Getting it working on Spark 0.9.0
>>    - Getting UDF working
>>    - Getting generate functionality working
>>    - Exhaustive test suite for Pig on Spark
>>
>> Are you maintaining a Jira somewhere?
>>
>> I am currently trying to deploy it on 0.9.0.
>>
>> Regards
>> Mayur
>>
>>
>>
>>
>> On Fri, Mar 14, 2014 at 1:37 PM, Aniket Mokashi <aniket...@gmail.com> wrote:
>>
>>> We will post fixes from our side at https://github.com/twitter/pig.
>>>
>>> Top of our list are:
>>> 1. Make it work with pig-trunk (execution engine interface) (with Spark 0.8
>>> or 0.9).
>>> 2. Support for algebraic UDFs (this mitigates the group-by OOM problems).
>>>
>>> Would definitely love more contributions on this.
>>>
>>> Thanks,
>>> Aniket
>>>
>>>
>>> On Fri, Mar 14, 2014 at 12:29 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
>>>
>>>> Damn, I am off to NY for Structure Conf. Would it be possible to meet
>>>> anytime after 28th March?
>>>> I am really interested in making it stable & production quality.
>>>>
>>>> Regards
>>>> Mayur Rustagi
>>>>
>>>>
>>>>
>>>> On Fri, Mar 14, 2014 at 11:53 AM, Julien Le Dem <jul...@twitter.com> wrote:
>>>>
>>>>> Hi Mayur,
>>>>> Are you going to the Pig meetup this afternoon?
>>>>> http://www.meetup.com/PigUser/events/160604192/
>>>>> Aniket and I will be there.
>>>>> We would be happy to chat about Pig on Spark.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 11, 2014 at 8:56 AM, Mayur Rustagi <
>>>>> mayur.rust...@gmail.com> wrote:
>>>>>
>>>>>> Hi Lin,
>>>>>> We are working on getting Pig on Spark functional with 0.8.0. Have you
>>>>>> got it working on any Spark version?
>>>>>> Also, what functionality works on it?
>>>>>> Regards
>>>>>> Mayur
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 10, 2014 at 11:00 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Sameer,
>>>>>>>
>>>>>>> Lin (cc'ed) could also give you some updates about Pig on Spark
>>>>>>> development on her side.
>>>>>>>
>>>>>>> Best,
>>>>>>> Xiangrui
>>>>>>>
>>>>>>> On Mon, Mar 10, 2014 at 12:52 PM, Sameer Tilak <ssti...@live.com>
>>>>>>> wrote:
>>>>>>> > Hi Mayur,
>>>>>>> > We are planning to upgrade our distribution from MR1 to MR2 (YARN), and
>>>>>>> > the goal is to get Spork set up next month. I will keep you posted. Can
>>>>>>> > you please keep me informed about your progress as well?
>>>>>>> >
>>>>>>> > ________________________________
>>>>>>> > From: mayur.rust...@gmail.com
>>>>>>> > Date: Mon, 10 Mar 2014 11:47:56 -0700
>>>>>>> >
>>>>>>> > Subject: Re: Pig on Spark
>>>>>>> > To: user@spark.apache.org
>>>>>>> >
>>>>>>> >
>>>>>>> > Hi Sameer,
>>>>>>> > Did you make any progress on this? My team is also trying it out and
>>>>>>> > would love to know some details of your progress.
>>>>>>> >
>>>>>>> > Mayur Rustagi
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > On Thu, Mar 6, 2014 at 2:20 PM, Sameer Tilak <ssti...@live.com> wrote:
>>>>>>> >
>>>>>>> > Hi Aniket,
>>>>>>> > Many thanks! I will check this out.
>>>>>>> >
>>>>>>> > ________________________________
>>>>>>> > Date: Thu, 6 Mar 2014 13:46:50 -0800
>>>>>>> > Subject: Re: Pig on Spark
>>>>>>> > From: aniket...@gmail.com
>>>>>>> > To: user@spark.apache.org; tgraves...@yahoo.com
>>>>>>> >
>>>>>>> >
>>>>>>> > There is some work to make this work on YARN at
>>>>>>> > https://github.com/aniket486/pig. (So, compile Pig with
>>>>>>> > ant -Dhadoopversion=23.)
>>>>>>> >
>>>>>>> > You can look at https://github.com/aniket486/pig/blob/spork/pig-spark
>>>>>>> > to find out what sort of env variables you need (sorry, I haven't been
>>>>>>> > able to clean this up; it is in progress). There are a few known issues
>>>>>>> > with this; I will work on fixing them soon.
>>>>>>> >
>>>>>>> > Known issues:
>>>>>>> > 1. Limit does not work (spork-fix)
>>>>>>> > 2. Foreach requires turning off schema-tuple-backend (should be a
>>>>>>> > pig-jira)
>>>>>>> > 3. Algebraic UDFs don't work (spork-fix in progress)
>>>>>>> > 4. Group by rework (to avoid OOMs)
>>>>>>> > 5. UDF classloader issue (requires SPARK-1053, then you can put
>>>>>>> > pig-withouthadoop.jar as SPARK_JARS in SparkContext along with UDF
>>>>>>> > jars)
>>>>>>> >
>>>>>>> > ~Aniket
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves <tgraves...@yahoo.com> wrote:
>>>>>>> >
>>>>>>> > I had asked a similar question on the dev mailing list a while back
>>>>>>> > (Jan 22nd).
>>>>>>> >
>>>>>>> > See the archives:
>>>>>>> > http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser
>>>>>>> > -> look for spork.
>>>>>>> >
>>>>>>> > Basically Matei said:
>>>>>>> >
>>>>>>> > Yup, that was it, though I believe people at Twitter picked it up again
>>>>>>> > recently. I'd suggest asking Dmitriy if you know him. I've seen
>>>>>>> > interest in this from several other groups, and if there's enough of
>>>>>>> > it, maybe we can start another open source repo to track it. The work
>>>>>>> > in that repo you pointed to was done over one week, and already had
>>>>>>> > most of Pig's operators working. (I helped out with this prototype over
>>>>>>> > Twitter's hack week.) That work also calls the Scala API directly,
>>>>>>> > because it was done before we had a Java API; it should be easier with
>>>>>>> > the Java one.
>>>>>>> >
>>>>>>> >
>>>>>>> > Tom
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > On Thursday, March 6, 2014 3:11 PM, Sameer Tilak <ssti...@live.com> wrote:
>>>>>>> > Hi everyone,
>>>>>>> >
>>>>>>> > We are using Pig to build our data pipeline. I came across Spork -- Pig
>>>>>>> > on Spark at https://github.com/dvryaboy/pig and am not sure if it is
>>>>>>> > still active.
>>>>>>> >
>>>>>>> > Can someone please let me know the status of Spork or any other effort
>>>>>>> > that will let us run Pig on Spark? We can benefit significantly by
>>>>>>> > using Spark, but we would like to keep using the existing Pig scripts.
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> > "...:::Aniket:::... Quetzalco@tl"
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> "...:::Aniket:::... Quetzalco@tl"
>>>
>>
>>
