The difference is noticeable but small. I ran a test that just read data
from Cassandra on our cluster and dumped it to a CSV file. Pure MapReduce
ran at ~17k records/sec versus ~15k with Pig. There is some overhead to
using Pig, but it will reduce your development time and make for more
readable code if it suits your needs.
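
To put those two figures in perspective, here is a rough back-of-the-envelope
sketch in Python. It uses only the approximate rates quoted above, which work
out to Pig being roughly 12% slower on this one test. The 100 million record
count and the function names are made up for illustration and are not from
our cluster.

```python
# Approximate throughputs from the test described above (records per second).
MAPREDUCE_RATE = 17_000
PIG_RATE = 15_000


def relative_slowdown(baseline_rate, other_rate):
    """Fraction by which other_rate falls short of baseline_rate."""
    return (baseline_rate - other_rate) / baseline_rate


def export_seconds(record_count, rate):
    """Seconds needed to process record_count records at a steady rate."""
    return record_count / rate


if __name__ == "__main__":
    # Hypothetical data set size, chosen only to make the gap concrete.
    records = 100_000_000

    slowdown = relative_slowdown(MAPREDUCE_RATE, PIG_RATE)
    print(f"Pig slowdown relative to MapReduce: {slowdown:.1%}")

    for name, rate in (("MapReduce", MAPREDUCE_RATE), ("Pig", PIG_RATE)):
        minutes = export_seconds(records, rate) / 60
        print(f"{name}: about {minutes:.0f} minutes for {records:,} records")
```

This assumes the rates stay constant as the data set grows, which a single
test on one cluster can't really tell you, so treat it as a rough estimate.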

On Sun, Jun 27, 2010 at 9:53 AM, Atul Gosain <atul.gos...@gmail.com> wrote:
> Thanks for the information Drew and Jonathan.
> Is there any difference in performance when using Pig compared to running
> MapReduce directly on the data store?
> I will experiment with both of them in a while, though.
>
> On Fri, Jun 25, 2010 at 5:46 PM, Drew Dahlke <drew.dah...@bronto.com> wrote:
>>
>> The Cassandra column family input format will go over an entire
>> column family, sending a slice of a row into a mapper at a time. From
>> there, there's a lot you can do. As for how you aggregate the data,
>> I'd suggest experimenting with the latest version of Pig,
>> which thankfully supports the new input format. It gives you a
>> SQL-esque syntax for manipulating the data and is probably the easiest
>> way to experiment.
>>
>> On Thu, Jun 24, 2010 at 11:01 AM, Atul Gosain <atul.gos...@gmail.com>
>> wrote:
>> > Hi
>> >   What kind of MapReduce support is provided for Cassandra?
>> > Can I get some columns from different rows and then aggregate them
>> > together? It's basically aggregation of statistics for various devices
>> > connected to a network manager. Is this the right kind of use case to be
>> > supported by MR?
>> > Thanks
>> > Atul
>
>
