>
> If anyone has "war stories" on the topic of Cassandra & Hadoop (or
> even just Hadoop in general) let me know.



Don't know if it counts as a war story, but I recently succeeded in
implementing something I got advice on in an earlier thread: feeding
both a Cassandra table and a Hadoop sequence file into the same map/reduce
job and updating that same Cassandra table with the results.  I used the
approach I mentioned before: an InputFormat that returns splits from both
sources, plus a RecordReader that massages the Cassandra data into the
same format as the sequence file data.  I'll write something up about it
for the wiki when I can find some time.
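
In the meantime, here is a rough sketch of the shape of that idea, in case
it helps anyone picture it.  It is a simplified model only, not my actual
code: it deliberately uses none of the real Hadoop or Cassandra classes, and
every name in it (Source, FileSource, TableSource, CombinedSource, Rec, the
"payload" column) is made up for illustration.  The combined source lists
the splits of both inputs, tags each one with the input it came from, and
routes each read back to the right reader; the table reader converts rows
to the same record shape the file reader produces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the idea only. None of these types are Hadoop or
// Cassandra classes; all names here are hypothetical.
final class CombinedInputSketch {

    /** Common record shape handed to the map step, whatever the source. */
    static final class Rec {
        final String key;
        final String value;
        Rec(String key, String value) { this.key = key; this.value = value; }
    }

    /** Stand-in for an input format: lists its splits and reads one split. */
    interface Source {
        List<String> splits();
        List<Rec> read(String split);
    }

    /** Stand-in for the sequence file: records already have the common shape. */
    static final class FileSource implements Source {
        private final Map<String, List<Rec>> blocks;
        FileSource(Map<String, List<Rec>> blocks) { this.blocks = blocks; }
        public List<String> splits() { return new ArrayList<>(blocks.keySet()); }
        public List<Rec> read(String split) { return blocks.get(split); }
    }

    /** Stand-in for the table: each row is a column map, converted on read. */
    static final class TableSource implements Source {
        private final Map<String, Map<String, Map<String, String>>> ranges;
        private final String column;
        TableSource(Map<String, Map<String, Map<String, String>>> ranges, String column) {
            this.ranges = ranges;
            this.column = column;
        }
        public List<String> splits() { return new ArrayList<>(ranges.keySet()); }
        public List<Rec> read(String split) {
            List<Rec> out = new ArrayList<>();
            for (Map.Entry<String, Map<String, String>> row : ranges.get(split).entrySet()) {
                // The "massaging" step: pick one column and emit the common shape.
                out.add(new Rec(row.getKey(), row.getValue().get(column)));
            }
            return out;
        }
    }

    /** Returns splits from every source, tagged so reads route back correctly. */
    static final class CombinedSource implements Source {
        private final List<Source> sources;
        CombinedSource(List<Source> sources) { this.sources = sources; }
        public List<String> splits() {
            List<String> out = new ArrayList<>();
            for (int i = 0; i < sources.size(); i++) {
                for (String s : sources.get(i).splits()) {
                    out.add(i + ":" + s); // prefix with the index of the owning source
                }
            }
            return out;
        }
        public List<Rec> read(String split) {
            int sep = split.indexOf(':');
            Source source = sources.get(Integer.parseInt(split.substring(0, sep)));
            return source.read(split.substring(sep + 1));
        }
    }

    /** Builds a tiny combined source with one file block and one table range. */
    static CombinedSource demo() {
        Map<String, List<Rec>> blocks = new LinkedHashMap<>();
        List<Rec> block = new ArrayList<>();
        block.add(new Rec("k1", "from-file"));
        blocks.put("block-0", block);

        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("payload", "from-table");
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("k2", columns);
        Map<String, Map<String, Map<String, String>>> ranges = new LinkedHashMap<>();
        ranges.put("range-0", rows);

        List<Source> sources = new ArrayList<>();
        sources.add(new FileSource(blocks));
        sources.add(new TableSource(ranges, "payload"));
        return new CombinedSource(sources);
    }

    public static void main(String[] args) {
        CombinedSource input = demo();
        for (String split : input.splits()) {
            for (Rec r : input.read(split)) {
                System.out.println(split + " -> " + r.key + "=" + r.value);
            }
        }
    }
}
```

In a real job the split tag would presumably live inside a custom
InputSplit rather than a string prefix, but the routing idea is the same:
the mapper never needs to know which input a record came from.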

My chief concern with it, though, is gracefully handling a map/reduce
failure.  As Cassandra isn't transactional, the table may end up partially
updated, which is a problem, at least in the domain I'm working in.  So now
I'm trying to come up with a way to effect Cassandra transactions via column
naming conventions or indexes or something like that.  I'd be curious to
hear if anyone here has ever implemented a solution for something similar
before...


Thanks
Mark
