Hey JB,

I think that the underlying data model is the main difference. Pig, like Hive and Cascading, has a relational data model-- the fundamental data type is a Tuple of values. Crunch is closer to bare-metal MapReduce; it doesn't impose a data model on the developer, and I think that it ends up being easier to use Crunch when you're working with data types that would otherwise require you to write lots of UDFs in Pig-- for example, time series, matrices, or HDF5 files [1].
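As a rough illustration of the data-model point, here is a sketch in plain Java. It does not use the Crunch API; `TimeSeries`, `TimeSeriesSketch`, and `movingAverage` are made-up names for this example only. The idea is that when the framework does not impose a tuple model, per-record logic over a domain type can stay an ordinary Java method with an ordinary loop, which you can step through in a debugger or call from a unit test.

```java
import java.util.Arrays;

/** Hypothetical domain type: readings at increasing timestamps. */
final class TimeSeries {
    final long[] timestamps;
    final double[] values;

    TimeSeries(long[] timestamps, double[] values) {
        if (timestamps.length != values.length) {
            throw new IllegalArgumentException("timestamps and values must have equal length");
        }
        this.timestamps = timestamps.clone();
        this.values = values.clone();
    }
}

final class TimeSeriesSketch {
    /** Trailing moving average over the last `window` points (fewer at the start). */
    static double[] movingAverage(TimeSeries ts, int window) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be >= 1");
        }
        double[] out = new double[ts.values.length];
        double sum = 0.0;
        for (int i = 0; i < ts.values.length; i++) {
            sum += ts.values[i];
            if (i >= window) {
                sum -= ts.values[i - window]; // drop the point that left the window
            }
            out[i] = sum / Math.min(i + 1, window);
        }
        return out;
    }

    public static void main(String[] args) {
        TimeSeries ts = new TimeSeries(new long[] {1, 2, 3, 4},
                                       new double[] {2.0, 4.0, 6.0, 8.0});
        System.out.println(Arrays.toString(movingAverage(ts, 2)));
    }
}
```

In a tuple-oriented tool, the same logic would typically mean flattening the series into rows or wrapping a method like this in a UDF; the sketch only shows what the logic looks like when it can operate on the domain type directly.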
The other major difference is, as you alluded to, the programming environment-- Crunch is a Java library that also has a Scala wrapper, while Pig is, like Hive, a domain-specific language. Much like the data model, there is a tradeoff here as well-- Crunch requires more skilled developers, but it offers those developers the benefits of a real programming language, like for loops, debugging tools, and a rich ecosystem of testing frameworks.

I am a Pig fan (see, for instance, [2] and [3]), and I see the tools as complements, not competitors. Crunch is used by developers who are building ETL pipelines in which performance and thorough testing are critical, and Pig is used by analysts and data scientists in order to run thousands of queries over the results of those ETL pipelines.

Best,
Josh

[1] http://www.hdfgroup.org/HDF5/
[2] http://www.cloudera.com/blog/2011/11/using-hadoop-to-analyze-adverse-drug-events/
[3] http://engineering.linkedin.com/open-source/introducing-datafu-open-source-collection-useful-apache-pig-udfs

On Fri, May 18, 2012 at 1:49 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Hi Josh,
>
> Could you compare with Pig ? Is Scala support the main difference ?
>
> Thanks,
> Regards
> JB
>
>
> On 05/16/2012 02:23 AM, Josh Wills wrote:
>>
>> Hi all,
>>
>> I would like to propose Crunch, a library for writing MapReduce
>> pipelines in Java and Scala, as an Apache Incubator project. The
>> proposal is here:
>>
>> http://wiki.apache.org/incubator/CrunchProposal
>>
>> We would gladly welcome additional volunteers to act as mentors on the
>> project, so if this sounds like your cup of tea, please feel free to
>> sign up or let us know.
>>
>> Thanks!
>> Josh
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>

--
Director of Data Science
Cloudera
Twitter: @josh_wills