Are all the features available in Pig working in Spork? For example, do UDFs
work?

Thanks.


On Thu, Apr 24, 2014 at 1:54 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

> There are two benefits I get as of now:
> 1. Most of the time, a lot of customers don't want the full power; they
> want something dead simple with which they can work in a DSL. They end up
> using Hive for a lot of ETL just because it's SQL and they understand it.
> Pig is close: it wraps a lot of framework-level semantics away from the
> user and lets them focus on data flow.
> 2. Some have codebases in Pig already and are just looking to run them faster.
> I have yet to benchmark that for Pig on Spark.
>
> I agree that Pig on Spark cannot solve a lot of problems, but it can solve
> some without forcing the end customer to do anything even close to coding.
> I believe there is quite some value in making Spark accessible to a larger
> audience.
> At the end of the day, to each his own :)
>
> Regards
> Mayur
>
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
> On Thu, Apr 24, 2014 at 1:24 AM, Bharath Mundlapudi
> <mundlap...@gmail.com> wrote:
>
>> This seems like an interesting question.
>>
>> I love Apache Pig. It is so natural and the language flows with nice
>> syntax.
>>
>> While I was at Yahoo! in core Hadoop Engineering, I used Pig a lot
>> for analytics and gave the Pig team feedback asking for much more
>> functionality when it was at version 0.7. Lots of new functionality has
>> been added since.
>>
>> At the end of the day, Pig is a DSL for data flows. There will always be
>> gaps and enhancements. I have often wondered: is a DSL the right way to
>> solve data flow problems? Maybe not; we need complete language constructs.
>> We may have found the answer: Scala. With Scala's dynamic compilation, we
>> can write much more powerful constructs than any DSL can provide.
>>
>> If I were a new organization just beginning to choose, I would go with Scala.
>>
>> Here is an example:
>>
>> #!/bin/sh
>> exec scala "$0" "$@"
>> !#
>> // YOUR DSL GOES HERE, BUT IN SCALA!
>>
>> You get DSL-like scripting, functional style, and the power of a complete
>> language! If we can improve the first 3 lines, there you go: you have the
>> most powerful DSL to solve data problems.
>>
>> -Bharath
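
To make the "DSL in Scala" idea above a little more concrete, here is a minimal sketch of what the body of such a script might look like. It is only an illustration under stated assumptions: plain Scala collections stand in for a real dataset, and the names `DataFlowSketch`, `Visit`, `visits`, and `totalMillisPerUser` are hypothetical, not part of Pig, Spork, or Spark.

```scala
object DataFlowSketch {
  // Hypothetical record type; in a real pipeline this would come from a loader.
  case class Visit(user: String, url: String, millis: Long)

  // Hypothetical in-memory sample standing in for a loaded input.
  val visits: Seq[Visit] = Seq(
    Visit("ann", "/home", 120L),
    Visit("bob", "/home", 300L),
    Visit("ann", "/search", 80L),
    Visit("bob", "/search", 40L)
  )

  // Roughly the shape of a Pig flow: FILTER by a threshold, GROUP by user,
  // then FOREACH group GENERATE the sum of millis.
  def totalMillisPerUser(input: Seq[Visit], minMillis: Long): Map[String, Long] =
    input
      .filter(_.millis >= minMillis)
      .groupBy(_.user)
      .map { case (user, rows) => user -> rows.map(_.millis).sum }

  def main(args: Array[String]): Unit =
    // Sort by user so the printed order is stable.
    totalMillisPerUser(visits, 50L).toSeq.sortBy(_._1).foreach {
      case (user, total) => println(s"$user\t$total")
    }
}
```

The point of the sketch is only that filter, group, and aggregate steps read as a short chain of ordinary method calls, so the "DSL" is just the host language.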
>>
>>
>>
>>
>>
>> On Mon, Mar 10, 2014 at 11:00 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>>> Hi Sameer,
>>>
>>> Lin (cc'ed) could also give you some updates about Pig on Spark
>>> development on her side.
>>>
>>> Best,
>>> Xiangrui
>>>
>>> On Mon, Mar 10, 2014 at 12:52 PM, Sameer Tilak <ssti...@live.com> wrote:
>>> > Hi Mayur,
>>> > We are planning to upgrade our distribution from MR1 to MR2 (YARN), and
>>> > the goal is to get Spork set up next month. I will keep you posted. Can
>>> > you please keep me informed about your progress as well?
>>> >
>>> > ________________________________
>>> > From: mayur.rust...@gmail.com
>>> > Date: Mon, 10 Mar 2014 11:47:56 -0700
>>> >
>>> > Subject: Re: Pig on Spark
>>> > To: user@spark.apache.org
>>> >
>>> >
>>> > Hi Sameer,
>>> > Did you make any progress on this? My team is also trying it out and
>>> > would love to know some details of your progress.
>>> >
>>> > Mayur Rustagi
>>> > Ph: +1 (760) 203 3257
>>> > http://www.sigmoidanalytics.com
>>> > @mayur_rustagi
>>> >
>>> >
>>> >
>>> > On Thu, Mar 6, 2014 at 2:20 PM, Sameer Tilak <ssti...@live.com> wrote:
>>> >
>>> > Hi Aniket,
>>> > Many thanks! I will check this out.
>>> >
>>> > ________________________________
>>> > Date: Thu, 6 Mar 2014 13:46:50 -0800
>>> > Subject: Re: Pig on Spark
>>> > From: aniket...@gmail.com
>>> > To: user@spark.apache.org; tgraves...@yahoo.com
>>> >
>>> >
>>> > There is some work to make this work on YARN at
>>> > https://github.com/aniket486/pig. (So, compile Pig with ant
>>> > -Dhadoopversion=23)
>>> >
>>> > You can look at https://github.com/aniket486/pig/blob/spork/pig-spark to
>>> > find out what sort of env variables you need (sorry, I haven't been able
>>> > to clean this up; it is in progress). There are a few known issues with
>>> > this; I will work on fixing them soon.
>>> >
>>> > Known issues:
>>> > 1. Limit does not work (spork-fix)
>>> > 2. Foreach requires turning off schema-tuple-backend (should be a pig-jira)
>>> > 3. Algebraic UDFs don't work (spork-fix in-progress)
>>> > 4. Group by rework (to avoid OOMs)
>>> > 5. UDF Classloader issue (requires SPARK-1053, then you can put
>>> > pig-withouthadoop.jar as SPARK_JARS in SparkContext along with udf jars)
>>> >
>>> > ~Aniket
>>> >
>>> >
>>> >
>>> >
>>> > On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves <tgraves...@yahoo.com> wrote:
>>> >
>>> > I had asked a similar question on the dev mailing list a while back (Jan
>>> > 22nd).
>>> >
>>> > See the archives:
>>> > http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser
>>> > -> look for spork.
>>> >
>>> > Basically Matei said:
>>> >
>>> > Yup, that was it, though I believe people at Twitter picked it up again
>>> > recently. I'd suggest asking Dmitriy if you know him. I've seen interest
>>> > in this from several other groups, and if there's enough of it, maybe we
>>> > can start another open source repo to track it. The work in that repo
>>> > you pointed to was done over one week, and already had most of Pig's
>>> > operators working. (I helped out with this prototype over Twitter's hack
>>> > week.) That work also calls the Scala API directly, because it was done
>>> > before we had a Java API; it should be easier with the Java one.
>>> >
>>> >
>>> > Tom
>>> >
>>> >
>>> >
>>> > On Thursday, March 6, 2014 3:11 PM, Sameer Tilak <ssti...@live.com> wrote:
>>> > Hi everyone,
>>> >
>>> > We are using Pig to build our data pipeline. I came across Spork (Pig on
>>> > Spark) at https://github.com/dvryaboy/pig and am not sure if it is still
>>> > active.
>>> >
>>> > Can someone please let me know the status of Spork or any other effort
>>> > that will let us run Pig on Spark? We could benefit significantly from
>>> > using Spark, but we would like to keep using the existing Pig scripts.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > "...:::Aniket:::... Quetzalco@tl"
>>> >
>>> >
>>>
>>
>>
>
