The difference is noticeable but small. I ran a test that just read data from Cassandra on our cluster and dumped it to a CSV file. Pure MapReduce ran at ~17k records/sec versus ~15k records/sec for Pig. Pig does add overhead, but it will reduce your development time and make for more readable code if it suits your needs.
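
To put those rates in perspective, here is a rough back-of-the-envelope sketch. The two throughput figures are the approximate ones from my test; the 100 million record job size is a made-up number purely for illustration, and your cluster will almost certainly give different results.

```python
# Approximate throughput figures from the test described above (records/sec).
MAPREDUCE_RATE = 17_000
PIG_RATE = 15_000


def relative_slowdown(baseline_rate, other_rate):
    """Fraction by which other_rate falls short of baseline_rate."""
    return (baseline_rate - other_rate) / baseline_rate


def runtime_hours(records, rate):
    """Hours needed to process `records` at `rate` records per second."""
    return records / rate / 3600


if __name__ == "__main__":
    # Hypothetical job size, chosen only to make the difference concrete.
    hypothetical_records = 100_000_000

    slowdown = relative_slowdown(MAPREDUCE_RATE, PIG_RATE)
    print(f"Pig throughput is about {slowdown:.0%} lower than pure MapReduce")
    print(f"MapReduce: {runtime_hours(hypothetical_records, MAPREDUCE_RATE):.2f} h")
    print(f"Pig:       {runtime_hours(hypothetical_records, PIG_RATE):.2f} h")
```

So on these numbers Pig costs roughly an eighth of the throughput, which is why I'd call it noticeable but small.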

On Sun, Jun 27, 2010 at 9:53 AM, Atul Gosain <atul.gos...@gmail.com> wrote:
> Thanks for the information Drew and Jonathan.
> Is there any difference in performance while using Pig compared to MapReduce
> directly on data store ?
> I will do the experiments with both of them though in some time.
>
> On Fri, Jun 25, 2010 at 5:46 PM, Drew Dahlke <drew.dah...@bronto.com> wrote:
>>
>> The cassandra column family input format will go over a an entire
>> column family sending a slice of a row into a mapper at a time. From
>> there there's a lot you can do. As far as how you aggregate data
>> together, I'd suggest experimenting with the latest version of Pig
>> which thankfully supports the new input format. It gives you a
>> SQL'esque syntax for manipulating the data and is probably the easiest
>> way to experiment.
>>
>> On Thu, Jun 24, 2010 at 11:01 AM, Atul Gosain <atul.gos...@gmail.com>
>> wrote:
>> > Hi
>> > What kind of Map Reduce support is provided for Cassandra ?
>> > Can i get some columns from different rows and then aggregate them up
>> > together. Its basically aggregation of statistics for various devices
>> > connected to a network manager. Is it a right kind of use case to be
>> > supported by MR ?
>> > Thanks
>> > Atul
>
>