Are all the features available in Pig working in Spork? For example, UDFs?
Thanks.

On Thu, Apr 24, 2014 at 1:54 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

> There are two benefits I get as of now:
> 1. Most of the time, a lot of customers don't want the full power; they
> want something dead simple with which they can do DSL. They end up using
> Hive for a lot of ETL just because it's SQL and they understand it. Pig is
> close: it wraps a lot of framework-level semantics away from the user and
> lets him focus on data flow.
> 2. Some have codebases in Pig already and are just looking to do it faster.
> I am yet to benchmark that on Pig on Spark.
>
> I agree that Pig on Spark cannot solve a lot of problems, but it can solve
> some without forcing the end customer to do anything even close to coding.
> I believe there is quite some value in making Spark accessible to a larger
> audience.
> End of the day, to each his own :)
>
> Regards
> Mayur
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
> On Thu, Apr 24, 2014 at 1:24 AM, Bharath Mundlapudi
> <mundlap...@gmail.com> wrote:
>
>> This seems like an interesting question.
>>
>> I love Apache Pig. It is so natural, and the language flows with nice
>> syntax.
>>
>> While I was at Yahoo! in core Hadoop Engineering, I used Pig a lot
>> for analytics and gave feedback to the Pig team asking for much more
>> functionality when it was at version 0.7. Lots of new functionality is
>> offered now.
>>
>> End of the day, Pig is a DSL for data flows. There will always be gaps
>> and enhancements. I often wondered: is a DSL the right way to solve data
>> flow problems? Maybe not; we need a complete language construct. We may
>> have found the answer: Scala. With Scala's dynamic compilation, we can
>> write much more powerful constructs than any DSL can provide.
>>
>> If I were a new organization beginning to choose, I would go with Scala.
>>
>> Here is the example:
>>
>> #!/bin/sh
>> exec scala "$0" "$@"
>> !#
>> YOUR DSL GOES HERE BUT IN SCALA!
>>
>> You have DSL-like scripting, functional, and complete language power! If
>> we can improve the first 3 lines, here you go: you have the most powerful
>> DSL to solve data problems.
>>
>> -Bharath
>>
>> On Mon, Mar 10, 2014 at 11:00 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>>> Hi Sameer,
>>>
>>> Lin (cc'ed) could also give you some updates about Pig on Spark
>>> development on her side.
>>>
>>> Best,
>>> Xiangrui
>>>
>>> On Mon, Mar 10, 2014 at 12:52 PM, Sameer Tilak <ssti...@live.com> wrote:
>>> > Hi Mayur,
>>> > We are planning to upgrade our distribution MR1 -> MR2 (YARN), and the
>>> > goal is to get Spork set up next month. I will keep you posted. Can you
>>> > please keep me informed about your progress as well?
>>> >
>>> > ________________________________
>>> > From: mayur.rust...@gmail.com
>>> > Date: Mon, 10 Mar 2014 11:47:56 -0700
>>> > Subject: Re: Pig on Spark
>>> > To: user@spark.apache.org
>>> >
>>> > Hi Sameer,
>>> > Did you make any progress on this? My team is also trying it out and
>>> > would love to know some details of your progress.
>>> >
>>> > Mayur Rustagi
>>> > Ph: +1 (760) 203 3257
>>> > http://www.sigmoidanalytics.com
>>> > @mayur_rustagi
>>> >
>>> > On Thu, Mar 6, 2014 at 2:20 PM, Sameer Tilak <ssti...@live.com> wrote:
>>> >
>>> > Hi Aniket,
>>> > Many thanks! I will check this out.
>>> >
>>> > ________________________________
>>> > Date: Thu, 6 Mar 2014 13:46:50 -0800
>>> > Subject: Re: Pig on Spark
>>> > From: aniket...@gmail.com
>>> > To: user@spark.apache.org; tgraves...@yahoo.com
>>> >
>>> > There is some work to make this work on YARN at
>>> > https://github.com/aniket486/pig.
>>> > (So, compile Pig with ant -Dhadoopversion=23)
>>> >
>>> > You can look at https://github.com/aniket486/pig/blob/spork/pig-spark to
>>> > find out what sort of env variables you need (sorry, I haven't been able
>>> > to clean this up; it is in progress). There are a few known issues with
>>> > this; I will work on fixing them soon.
>>> >
>>> > Known issues:
>>> > 1. Limit does not work (spork-fix)
>>> > 2. Foreach requires turning off schema-tuple-backend (should be a
>>> > pig-jira)
>>> > 3. Algebraic UDFs don't work (spork-fix in-progress)
>>> > 4. Group by rework (to avoid OOMs)
>>> > 5. UDF Classloader issue (requires SPARK-1053, then you can put
>>> > pig-withouthadoop.jar as SPARK_JARS in SparkContext along with udf jars)
>>> >
>>> > ~Aniket
>>> >
>>> > On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves <tgraves...@yahoo.com> wrote:
>>> >
>>> > I had asked a similar question on the dev mailing list a while back (Jan
>>> > 22nd).
>>> >
>>> > See the archives:
>>> > http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser ->
>>> > look for spork.
>>> >
>>> > Basically Matei said:
>>> >
>>> > Yup, that was it, though I believe people at Twitter picked it up again
>>> > recently. I'd suggest asking Dmitriy if you know him. I've seen interest
>>> > in this from several other groups, and if there's enough of it, maybe we
>>> > can start another open source repo to track it. The work in that repo
>>> > you pointed to was done over one week, and already had most of Pig's
>>> > operators working. (I helped out with this prototype over Twitter's hack
>>> > week.) That work also calls the Scala API directly, because it was done
>>> > before we had a Java API; it should be easier with the Java one.
>>> >
>>> > Tom
>>> >
>>> > On Thursday, March 6, 2014 3:11 PM, Sameer Tilak <ssti...@live.com> wrote:
>>> > Hi everyone,
>>> >
>>> > We are using Pig to build our data pipeline. I came across Spork -- Pig
>>> > on Spark at: https://github.com/dvryaboy/pig and am not sure if it is
>>> > still active.
>>> >
>>> > Can someone please let me know the status of Spork or any other effort
>>> > that will let us run Pig on Spark? We can benefit significantly by using
>>> > Spark, but we would like to keep using the existing Pig scripts.
>>> >
>>> > --
>>> > "...:::Aniket:::... Quetzalco@tl"