On 06/04/2010 21:40, Benjamin Black wrote:

> I suggest the reasons you list (which are certainly great reasons!)
> are also the reasons there is no referential integrity or transaction
> support.

Quite. I'm not trying to make recommendations for how Cassandra should be changed to be more like a traditional RDBMS... I just have a requirement, at the logical level, that would be trivial with traditional technology - so the analogy seemed an ideal way to illustrate the issue.

> It seems the common practice of using a system like
> Zookeeper for the synchronization parts alongside Cassandra would be
> applicable here. Have you investigated that?

I started looking at Zookeeper when it was mentioned in an earlier reply. I've discovered it supports something called "Ledgers" - but I'm still unclear whether they'd be useful to me - I've only uncovered a very high-level overview so far. I'm concerned that Zookeeper looks as if it might become a problematic bottleneck if all the updates must be routed through it.

I don't see Zookeeper mutexes as being especially helpful... because my problem isn't really about two incompatible requests in quick succession - but, rather, about needing to ensure that "referential integrity" is eventually established between two otherwise independent keysets. I need to eliminate the possibility that I end up with 'dangling' inaccessible data should a hash value become recorded in the range of the first map but not the domain of the second (or vice versa).
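To make the invariant concrete, here's a minimal sketch using plain dicts in place of the two Cassandra keysets (the names `map_a`/`map_b` and the helper are hypothetical, just for illustration): `map_a` takes object keys to hash values, `map_b` takes hash values back to keys, and integrity means every hash in the range of `map_a` appears in the domain of `map_b`, and vice versa.

```python
def dangling_hashes(map_a, map_b):
    """Return hash values that would leave 'dangling' inaccessible data.

    A hash is dangling if it appears in the range of map_a but not the
    domain of map_b, or vice versa.
    """
    range_a = set(map_a.values())
    domain_b = set(map_b.keys())
    return (range_a - domain_b) | (domain_b - range_a)

map_a = {"obj1": "h1", "obj2": "h2"}
map_b = {"h1": "obj1"}              # "h2" never made it into map_b

print(dangling_hashes(map_a, map_b))  # -> {'h2'}
```

An empty result means the two keysets are mutually consistent; anything else is exactly the inaccessible data I'm trying to rule out.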
Should I assume that it's common practice not to write updates atomically in real time, but instead to batch-process them 'off-line' to increase the atomic granularity? It seems an obvious strategy... possibly one for which an implementation might use "MapReduce" or something similar? I don't want to re-invent the wheel, of course.
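For what it's worth, here is a rough sketch of the kind of off-line repair pass I have in mind (again using plain dicts as stand-ins for the keysets; the `repair` helper is hypothetical): updates land optimistically in real time, and a periodic batch job restores the cross-map invariant afterwards. In practice the scan could be a MapReduce job over both keysets.

```python
def repair(map_a, map_b):
    """Re-establish the reverse mapping for every hash reachable from map_a."""
    # Fill in any reverse entry that the real-time writes missed.
    for key, h in map_a.items():
        map_b.setdefault(h, key)
    # Drop reverse entries whose hash no longer appears in map_a.
    live = set(map_a.values())
    for h in list(map_b):
        if h not in live:
            del map_b[h]

map_a = {"obj1": "h1", "obj2": "h2"}
map_b = {"h1": "obj1", "h3": "obj9"}   # "h2" missing, "h3" dangling
repair(map_a, map_b)
print(sorted(map_b))  # -> ['h1', 'h2']
```

The point isn't the code itself, but whether this "write loosely now, reconcile in bulk later" pattern is an established practice or a wheel I'd be re-inventing.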