UDFs, Generate & many more are not working yet :) Several operators do work: joins, filters, group by, etc. I am translating the ones we need and would be happy to get help on the others. I will open a JIRA to track them if you are interested.

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>


On Thu, Apr 24, 2014 at 2:10 AM, suman bharadwaj <suman....@gmail.com> wrote:

> Are all the features available in PIG working in SPORK ?? Like for eg:
> UDFs ?
>
> Thanks.
>
>
> On Thu, Apr 24, 2014 at 1:54 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
>
>> Thr are two benefits I get as of now
>> 1. Most of the time a lot of customers dont want the full power but they
>> want something dead simple with which they can do dsl. They end up using
>> Hive for a lot of ETL just cause its SQL & they understand it. Pig is close
>> & wraps up a lot of framework level semantics away from the user & lets him
>> focus on data flow
>> 2. Some have codebases in Pig already & are just looking to do it faster.
>> I am yet to benchmark that on Pig on spark.
>>
>> I agree that pig on spark cannot solve a lot problems but it can solve
>> some without forcing the end customer to do anything even close to coding,
>> I believe thr is quite some value in making Spark accessible to larger
>> group of audience.
>> End of the day to each his own :)
>>
>> Regards
>> Mayur
>>
>>
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>
>>
>>
>> On Thu, Apr 24, 2014 at 1:24 AM, Bharath Mundlapudi <mundlap...@gmail.com> wrote:
>>
>>> This seems like an interesting question.
>>>
>>> I love Apache Pig. It is so natural and the language flows with nice
>>> syntax.
>>>
>>> While I was at Yahoo! in core Hadoop Engineering, I have used Pig a lot
>>> for analytics and provided feedback to Pig Team to do much more
>>> functionality when it was at version 0.7. Lots of new functionality got
>>> offered now.
>>> End of the day, Pig is a DSL for data flows. There will be always gaps
>>> and enhancements. I was often thought is DSL right way to solve data flow
>>> problems? May be not, we need complete language construct. We may have
>>> found the answer - Scala. With Scala's dynamic compilation, we can write
>>> much power constructs than any DSL can provide.
>>>
>>> If I am a new organization and beginning to choose, I would go with
>>> Scala.
>>>
>>> Here is the example:
>>>
>>> #!/bin/sh
>>> exec scala "$0" "$@"
>>> !#
>>> YOUR DSL GOES HERE BUT IN SCALA!
>>>
>>> You have DSL like scripting, functional and complete language power! If
>>> we can improve first 3 lines, here you go, you have most powerful DSL to
>>> solve data problems.
>>>
>>> -Bharath
>>>
>>>
>>> On Mon, Mar 10, 2014 at 11:00 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>>
>>>> Hi Sameer,
>>>>
>>>> Lin (cc'ed) could also give you some updates about Pig on Spark
>>>> development on her side.
>>>>
>>>> Best,
>>>> Xiangrui
>>>>
>>>> On Mon, Mar 10, 2014 at 12:52 PM, Sameer Tilak <ssti...@live.com> wrote:
>>>> > Hi Mayur,
>>>> > We are planning to upgrade our distribution MR1> MR2 (YARN) and the goal is
>>>> > to get SPROK set up next month. I will keep you posted. Can you please keep
>>>> > me informed about your progress as well.
>>>> >
>>>> > ________________________________
>>>> > From: mayur.rust...@gmail.com
>>>> > Date: Mon, 10 Mar 2014 11:47:56 -0700
>>>> > Subject: Re: Pig on Spark
>>>> > To: user@spark.apache.org
>>>> >
>>>> >
>>>> > Hi Sameer,
>>>> > Did you make any progress on this. My team is also trying it out would love
>>>> > to know some detail so progress.
>>>> >
>>>> > Mayur Rustagi
>>>> > Ph: +1 (760) 203 3257
>>>> > http://www.sigmoidanalytics.com
>>>> > @mayur_rustagi
>>>> >
>>>> >
>>>> >
>>>> > On Thu, Mar 6, 2014 at 2:20 PM, Sameer Tilak <ssti...@live.com> wrote:
>>>> >
>>>> > Hi Aniket,
>>>> > Many thanks! I will check this out.
>>>> >
>>>> > ________________________________
>>>> > Date: Thu, 6 Mar 2014 13:46:50 -0800
>>>> > Subject: Re: Pig on Spark
>>>> > From: aniket...@gmail.com
>>>> > To: user@spark.apache.org; tgraves...@yahoo.com
>>>> >
>>>> >
>>>> > There is some work to make this work on yarn at
>>>> > https://github.com/aniket486/pig. (So, compile pig with ant
>>>> > -Dhadoopversion=23)
>>>> >
>>>> > You can look at https://github.com/aniket486/pig/blob/spork/pig-spark to
>>>> > find out what sort of env variables you need (sorry, I haven't been able to
>>>> > clean this up- in-progress). There are few known issues with this, I will
>>>> > work on fixing them soon.
>>>> >
>>>> > Known issues-
>>>> > 1. Limit does not work (spork-fix)
>>>> > 2. Foreach requires to turn off schema-tuple-backend (should be a pig-jira)
>>>> > 3. Algebraic udfs dont work (spork-fix in-progress)
>>>> > 4. Group by rework (to avoid OOMs)
>>>> > 5. UDF Classloader issue (requires SPARK-1053, then you can put
>>>> > pig-withouthadoop.jar as SPARK_JARS in SparkContext along with udf jars)
>>>> >
>>>> > ~Aniket
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves <tgraves...@yahoo.com> wrote:
>>>> >
>>>> > I had asked a similar question on the dev mailing list a while back (Jan
>>>> > 22nd).
>>>> >
>>>> > See the archives:
>>>> > http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser ->
>>>> > look for spork.
>>>> >
>>>> > Basically Matei said:
>>>> >
>>>> > Yup, that was it, though I believe people at Twitter picked it up again
>>>> > recently. I'd suggest asking Dmitriy if you know him. I've seen interest
>>>> > in this from several other groups, and if there's enough of it, maybe we
>>>> > can start another open source repo to track it. The work in that repo you
>>>> > pointed to was done over one week, and already had most of Pig's operators
>>>> > working. (I helped out with this prototype over Twitter's hack week.) That
>>>> > work also calls the Scala API directly, because it was done before we had
>>>> > a Java API; it should be easier with the Java one.
>>>> >
>>>> >
>>>> > Tom
>>>> >
>>>> >
>>>> >
>>>> > On Thursday, March 6, 2014 3:11 PM, Sameer Tilak <ssti...@live.com> wrote:
>>>> > Hi everyone,
>>>> >
>>>> > We are using to Pig to build our data pipeline. I came across Spork -- Pig
>>>> > on Spark at: https://github.com/dvryaboy/pig and not sure if it is still
>>>> > active.
>>>> >
>>>> > Can someone please let me know the status of Spork or any other effort that
>>>> > will let us run Pig on Spark? We can significantly benefit by using Spark,
>>>> > but we would like to keep using the existing Pig scripts.
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > "...:::Aniket:::... Quetzalco@tl"