In addition to Scalding and Scrunch, there is Scoobi. Unlike the others, it is pure Scala (it doesn't wrap a Java framework). All three have fairly similar APIs, and they aren't too different from Spark's. For example, instead of an RDD you have a DList (distributed list) in Scoobi or a PCollection (parallel collection) in Scrunch, or in Scalding's case a Pipe, because Cascading had to get cute with its names.
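To illustrate what "similar APIs" means in practice, here is a small sketch of the collection-style pipeline shape these libraries share with Spark (flatMap, map, then a per-key reduce). It deliberately does not use any of the real libraries: `LocalDList`, `PairOps` and `WordCount` are hypothetical names, and the "distributed" list is just a local `Seq`, so it runs with plain Scala and no Hadoop or Spark dependency. The exact method names differ per library (e.g. Spark's `reduceByKey` versus grouping-then-combining elsewhere), so treat this as a model of the shape, not any library's actual API.

```scala
// Hypothetical stand-in for a distributed collection (DList / PCollection / RDD).
// Backed by a local Seq; this is NOT the API of Scoobi, Scrunch, Scalding or Spark.
final case class LocalDList[A](elems: Seq[A]) {
  def map[B](f: A => B): LocalDList[B] = LocalDList(elems.map(f))
  def flatMap[B](f: A => Iterable[B]): LocalDList[B] = LocalDList(elems.flatMap(f))
  def filter(p: A => Boolean): LocalDList[A] = LocalDList(elems.filter(p))
  // In a real framework this would trigger the job; here it just returns the Seq.
  def materialize: Seq[A] = elems
}

object LocalDList {
  // Operations that only make sense on key/value collections.
  implicit class PairOps[K, V](d: LocalDList[(K, V)]) {
    def reduceByKey(f: (V, V) => V): LocalDList[(K, V)] =
      LocalDList(
        d.elems
          .groupBy(_._1)
          .map { case (k, kvs) => (k, kvs.map(_._2).reduce(f)) }
          .toSeq
      )
  }
}

object WordCount {
  // The classic word count, written in the style shared by these APIs.
  def wordCount(lines: LocalDList[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+").toSeq.filter(_.nonEmpty))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .materialize
      .toMap

  def main(args: Array[String]): Unit = {
    val counts = wordCount(LocalDList(Seq("to be or", "not to be")))
    println(counts("to")) // prints 2
  }
}
```

Swapping `LocalDList` for one of the real distributed types would mostly change how the input is read and how results are written, which is the point about the APIs being close to one another.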
On Mon, Jul 7, 2014 at 8:12 PM, Sean Owen <so...@cloudera.com> wrote:

> On Tue, Jul 8, 2014 at 1:05 AM, Nabeel Memon <nm3...@gmail.com> wrote:
>
>> For Scala API on map/reduce (hadoop engine) there's a library called
>> "Scalding". It's built on top of Cascading. If you have a huge dataset or
>> if you consider using map/reduce engine for your job, for any reason, you
>> can try Scalding.
>
> PS Crunch also has a Scala API called Scrunch. And Crunch can run its jobs
> on Spark too, not just M/R.

--
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegm...@velos.io W: www.velos.io