Does anyone have a feel for how well m/r operations perform when backed by Cassandra as opposed to HDFS, in terms of network utilization and the volume of data being pushed around?
jesse

--
jesse mcconnell
jesse.mcconn...@gmail.com

On Fri, May 7, 2010 at 08:54, Ian Kallen <spidaman.l...@gmail.com> wrote:
> On 5/6/10 3:26 PM, Stu Hood wrote:
>>
>> Ian: I think that as get_range_slice gets faster, the approach that Mark
>> was heading toward may be considerably more efficient than reading the old
>> value in the OutputFormat.
>>
>
> Interesting, I'm trying to understand the performance impact of the
> different ways to do this. Under Mark's approach, the prior values are
> pulled out of Cassandra in the mapper in bulk, then merged and written back
> to Cassandra in the reducer; the get_range_slice is faster than the
> individual row fetches that my approach does in the reducer. Is that what
> you mean or are you referring to something else?
> thanks!
> -Ian
>
> --
> Ian Kallen
> blog: http://www.arachna.com/roller/spidaman
> tweetz: http://twitter.com/spidaman
> vox: 925.385.8426
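
For anyone trying to picture the two approaches Ian describes, here is a rough toy sketch in Python. It is only a model of the request pattern, not a measurement: a dict stands in for Cassandra, and `FakeStore`, `merge_with_row_fetches` and `merge_with_range_scan` are made-up names for illustration, not part of any Cassandra or Hadoop API. The only thing it counts is read round trips.

```python
# Toy model: a dict stands in for a Cassandra column family.
# FakeStore and its methods are hypothetical; they only count read round trips.

class FakeStore:
    def __init__(self, rows):
        self.rows = dict(rows)
        self.round_trips = 0

    def get(self, key):
        """One request per row, like an individual row fetch."""
        self.round_trips += 1
        return self.rows.get(key, 0)

    def range_slice(self, start, count):
        """One request per page of rows, like a range scan."""
        self.round_trips += 1
        keys = sorted(k for k in self.rows if k >= start)[:count]
        return [(k, self.rows[k]) for k in keys]

    def put(self, key, value):
        # Writes are identical in both approaches, so they are not counted.
        self.rows[key] = value


def merge_with_row_fetches(store, new_counts):
    """Reducer-side approach: fetch each prior value individually, then write."""
    for key, delta in new_counts.items():
        store.put(key, store.get(key) + delta)


def merge_with_range_scan(store, new_counts, page_size=100):
    """Bulk approach: pull prior values in pages, merge, then write."""
    prior = {}
    start = ""
    while True:
        page = store.range_slice(start, page_size)
        prior.update(page)
        if len(page) < page_size:
            break
        start = page[-1][0] + "\0"  # resume just past the last key seen
    for key, delta in new_counts.items():
        store.put(key, prior.get(key, 0) + delta)


if __name__ == "__main__":
    rows = {f"k{i:04d}": i for i in range(250)}
    updates = {key: 1 for key in rows}

    per_row = FakeStore(rows)
    merge_with_row_fetches(per_row, updates)

    bulk = FakeStore(rows)
    merge_with_range_scan(bulk, updates, page_size=100)

    print("row fetches:", per_row.round_trips, "range scans:", bulk.round_trips)
```

In this toy the scan reads every row in the store, including any the job never updates, so it trades fewer requests for potentially more data moved. Which side wins on a real cluster depends on how much of the data the job actually touches.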