Thanks Jonathan.

Yes, I did notice the RF issue; I thought, for example, that to get a total
salary you'd need to divide the sum by RF, something like that.
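A minimal sketch of that idea in plain Python (toy data, not Cassandra code): if a reduce job runs on every node, a row stored on RF replicas is seen RF times, so a distributive aggregate like a sum comes out RF times too large and must be divided by RF.

```python
# Sketch: why a sum computed across all replicas must be divided by RF.
# Toy data; RF = 3 means every row is stored on 3 nodes.
RF = 3

# Each (user_id, salary) row appears once per replica that holds it.
rows_seen_across_all_nodes = [("alice", 100), ("bob", 200)] * RF

raw_total = sum(salary for _, salary in rows_seen_across_all_nodes)
true_total = raw_total // RF

print(raw_total, true_total)  # 900 300
```

This only works for aggregates where replica copies are exact duplicates; it breaks down if replicas are inconsistent at the time the job runs.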

I'll take a look at CASSANDRA-1608.

Yang

On Sun, Jun 19, 2011 at 12:12 AM, Jonathan Ellis <jbel...@gmail.com> wrote:

> I'm skeptical that this is the right place to do M/R jobs (multiple
> replicas mean you'll do the work multiple times, if you have the same
> code on all nodes... and different code on the nodes could get messy
> fast.)
>
> But, the work-in-progress patches on CASSANDRA-1608 include a
> compaction pub/sub component.  So you could create a subclass of your
> desired compaction strategy that adds a "notify reduce job" hook and
> try it out that way.
>
> On Sun, Jun 19, 2011 at 1:59 AM, Yang <teddyyyy...@gmail.com> wrote:
> > I realize that the SSTable flush/compaction process is essentially
> > equivalent to the reduce stage of Map-Reduce, since entries with the
> > same key are grouped together. We have already felt the need to run
> > MR-style jobs on the data stored in Cassandra, so it would be very
> > useful to provide a hook into the compaction process where the reduce
> > job can run. For example, jobs as simple as dumping out all the keys
> > in a system, or, for a CF with userId as the key and salary as a
> > column, calculating the total salary.
> > This is different from what BRISK does, since BRISK only uses a CF as
> > physical block storage and does not utilize the data already stored
> > in Cassandra, which has already been grouped by key.
> > It would be possible to come up with some framework that scrapes
> > SSTables to carry out the MR jobs, but the compaction hook seems an
> > easier and faster way to get this done, given the existing systems.
> > Yang
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
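To make the compaction-as-reduce analogy from the thread concrete, here is a toy sketch in plain Python. This is not Cassandra's actual compaction API (which the CASSANDRA-1608 pub/sub work was still defining at the time); the point is only that merging key-sorted SSTables delivers same-key rows adjacently, so a "notify reduce job" hook can be handed each (key, values) group exactly as a reducer would be.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def compact_with_hook(sstables, reduce_hook):
    """Merge several key-sorted SSTables (lists of (key, value) pairs),
    invoking reduce_hook(key, values) for each run of identical keys --
    the same grouping a Map-Reduce reducer sees."""
    merged = heapq.merge(*sstables, key=itemgetter(0))
    compacted = []
    for key, group in groupby(merged, key=itemgetter(0)):
        values = [v for _, v in group]
        reduce_hook(key, values)             # the "notify reduce job" hook
        compacted.append((key, values[-1]))  # toy stand-in for "keep newest"
    return compacted

# Example from the thread: total salary per userId, computed as a side
# effect of the merge via the hook (toy data).
totals = {}
sstable_a = [("alice", 100), ("bob", 200)]
sstable_b = [("alice", 150), ("carol", 50)]
result = compact_with_hook(
    [sstable_a, sstable_b],
    lambda key, values: totals.__setitem__(key, sum(values)),
)
print(totals)  # {'alice': 250, 'bob': 200, 'carol': 50}
```

Note this toy version still has Jonathan's objection baked in: run it independently on RF replicas and every group is reduced RF times, so the hook's output needs de-duplication or an RF correction downstream.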
