Re: Hadoop over Cassandra

Mark Schnitzius Tue, 18 May 2010 21:41:17 -0700

>
> If anyone has "war stories" on the topic of Cassandra & Hadoop (or
> even just Hadoop in general) let me know.

Don't know if it counts as a war story, but I recently succeeded in
implementing something I got advice on in an earlier thread: feeding both
a Cassandra table and a Hadoop sequence file into the same map/reduce job,
and updating the same Cassandra table with the results.  I used the
approach I mentioned before: an InputFormat that returns splits from both
sources, plus a RecordReader that massages the Cassandra data into the
same format as the sequence file data.  I'll write something up about it
for the wiki when I can find some time.
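
For readers who want a picture of that approach, here is a rough sketch of
its shape. It is a dependency-free model, not real Hadoop code: Split,
AdaptedSplit and CombinedInput are hypothetical stand-ins for Hadoop's
InputSplit, RecordReader and InputFormat, and the common record type is
assumed to be a plain String.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a Hadoop InputSplit; NOT the Hadoop API.
interface Split<R> {
    // Plays the role of creating a RecordReader for this split.
    Iterator<R> openReader();
}

// A split whose native records are converted to the common type on read.
final class AdaptedSplit<S, R> implements Split<R> {
    private final List<S> rows;
    private final Function<S, R> adapter;

    AdaptedSplit(List<S> rows, Function<S, R> adapter) {
        this.rows = rows;
        this.adapter = adapter;
    }

    @Override
    public Iterator<R> openReader() {
        final Iterator<S> it = rows.iterator();
        return new Iterator<R>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            // Each native record is massaged into the shared format here.
            @Override public R next() { return adapter.apply(it.next()); }
        };
    }
}

// Plays the role of the combined InputFormat: its split list is the
// concatenation of the splits contributed by each source.
final class CombinedInput<R> {
    private final List<Split<R>> splits = new ArrayList<>();

    CombinedInput<R> addSource(List<? extends Split<R>> sourceSplits) {
        splits.addAll(sourceSplits);
        return this;
    }

    List<Split<R>> getSplits() {
        return splits;
    }

    // Drains every split in order, as the mappers collectively would.
    List<R> readAll() {
        List<R> out = new ArrayList<>();
        for (Split<R> split : splits) {
            split.openReader().forEachRemaining(out::add);
        }
        return out;
    }
}
```

In a real job the Cassandra-side adapter would sit inside a RecordReader
and the splits would come from the two underlying input formats; the
point of the sketch is only that the mapper sees one record type no
matter which source a split came from.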

My chief concern with it, though, is gracefully handling a map/reduce
failure.  As Cassandra isn't transactional, a failed job may leave the
table partially updated, which is a problem, at least in the domain I'm
working in.  So now I'm trying to come up with a way to emulate
transactions in Cassandra via column naming conventions or indexes or
something like that.  I'd be curious to hear if anyone here has ever
implemented a solution for something similar before...
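
One possible reading of the "column naming convention" idea, sketched
under stated assumptions: the job stages its output under batch-qualified
column names, and a single marker write makes that batch visible to
readers. BatchedTable, the "@batchId" suffix and the marker field are all
hypothetical, and the in-memory maps only stand in for Cassandra; this is
an illustration of the idea, not a tested Cassandra design.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// In-memory stand-in for a Cassandra table (row key -> column -> value).
final class BatchedTable {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Stands in for one marker column in a control row (an assumption).
    private String committedBatch;

    // Stage a value as "<column>@<batchId>"; readers cannot see it yet.
    void stage(String rowKey, String column, String batchId, String value) {
        rows.computeIfAbsent(rowKey, k -> new HashMap<>())
            .put(column + "@" + batchId, value);
    }

    // A single write switches every reader over to the new batch.
    void commit(String batchId) {
        committedBatch = batchId;
    }

    // Readers resolve the committed batch first, then read that column.
    Optional<String> read(String rowKey, String column) {
        if (committedBatch == null) {
            return Optional.empty();
        }
        Map<String, String> row = rows.get(rowKey);
        if (row == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(row.get(column + "@" + committedBatch));
    }

    // Cleanup after a failed job: drop everything staged under its id.
    void discard(String batchId) {
        String suffix = "@" + batchId;
        for (Map<String, String> row : rows.values()) {
            row.keySet().removeIf(name -> name.endsWith(suffix));
        }
    }
}
```

A job that dies midway never calls commit, so readers keep seeing the
previous batch and the staged columns can be discarded later. The sketch
assumes each job rewrites every column it owns (otherwise readers would
need a fallback to older batches) and that the marker write itself is
all-or-nothing.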


Thanks
Mark
