I'm skeptical that this is the right place to do M/R jobs (multiple
replicas mean you'll do the work multiple times if you have the same
code on all nodes... and different code on different nodes could get
messy fast).

But, the work-in-progress patches on CASSANDRA-1608 include a
compaction pub/sub component.  So you could create a subclass of your
desired compaction strategy that adds a "notify reduce job" hook and
try it out that way.
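To make the shape of that hook concrete, here is a toy sketch of the idea in plain Java: a merge of several key-ordered tables (newest wins per key, standing in for compaction's reconciliation) that notifies a listener once per merged row, which a "reduce job" can subscribe to. The names (`CompactionListener`, `compact`) are hypothetical illustrations, not the actual CASSANDRA-1608 API.

```java
import java.util.*;

public class CompactionHookSketch {
    // Hypothetical hook, analogous to the pub/sub idea in CASSANDRA-1608:
    // invoked once per merged row as "compaction" writes it out.
    interface CompactionListener {
        void onRowCompacted(String key, long value);
    }

    // Toy "compaction": merge tables (oldest first) so the newest value
    // for each key wins, then notify the listener for every merged row.
    static NavigableMap<String, Long> compact(List<Map<String, Long>> sstables,
                                              CompactionListener listener) {
        NavigableMap<String, Long> merged = new TreeMap<>();
        for (Map<String, Long> sstable : sstables) {
            merged.putAll(sstable); // later sstables override earlier ones
        }
        merged.forEach(listener::onRowCompacted);
        return merged;
    }

    public static void main(String[] args) {
        // Two generations of a userId -> salary CF; alice's row was updated.
        List<Map<String, Long>> sstables = List.of(
            Map.of("alice", 50_000L, "bob", 60_000L),
            Map.of("alice", 52_000L, "carol", 70_000L));
        long[] total = {0}; // the subscribed reduce job: total salary
        Map<String, Long> merged =
            compact(sstables, (key, salary) -> total[0] += salary);
        // prints "rows = 3, total salary = 182000"
        System.out.println("rows = " + merged.size()
            + ", total salary = " + total[0]);
    }
}
```

The point is only the wiring: the reduce logic never touches SSTable internals, it just subscribes to per-row notifications emitted by the compaction pass it is riding along with.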

On Sun, Jun 19, 2011 at 1:59 AM, Yang <teddyyyy...@gmail.com> wrote:
> I realize that the SSTable flush/compaction process is essentially
> equivalent to the reduce stage of Map-Reduce, since entries with the
> same key are grouped together.
> We have felt the need to do MR-style jobs on the data already stored in
> Cassandra, so it would be very useful to provide a hook into the
> compaction process so that the reduce job can be done there. For
> example, jobs as simple as dumping out all the keys in the system, or,
> for a CF with userId as the key and salary as a column, calculating the
> total salary.
> This is different from what BRISK does, since BRISK only uses a CF as
> physical block storage and does not utilize the data already stored in
> Cassandra, which has been grouped by keys.
> It is possible to come up with some sort of framework that scrapes
> SSTables to carry out the MR jobs, but the compaction hook seems an
> easier and faster way to get this done, given the existing systems.
> Yang



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
