Hi,

In the "atomically updating multiple keys" thread I mentioned the issue of dirtying the database with orphan keys, and thought that topic was worth starting a new thread.
Say you have an operation that requires creating two keys, A and B, and you succeed in creating A but fail in creating B. How do you delete A after the fact? I have two ideas:

1) Run periodic MapReduce jobs that do full database scans looking for garbage keys and delete them (this seems really horrible, but I'll admit I'm new to distributed DBs and MapReduce).

2) Maintain cleanup logs that explicitly identify the possibly offending keys, so cleanup only has to look at a handful of candidates.

For case 2), here is a pattern I have considered:

    cleanup_id = new_uuid() + "_" + timestamp_string
    A_value_id = new_uuid()
    B_value_id = new_uuid()

    set /cleanup/{cleanup_id} key with data [A, A_value_id], [B, B_value_id]

    set A key with metadata:
        value_id   = A_value_id
        cleanup_id = cleanup_id

    set B key with metadata:
        value_id   = B_value_id
        cleanup_id = cleanup_id

    delete /cleanup/{cleanup_id} key    <--- commit succeeds here

    set A key to remove cleanup_id metadata
    set B key to remove cleanup_id metadata

The rule is that when getting key A or B, the read is only considered successful if either:

1) There is no cleanup_id metadata on the key, or
2) There *is* cleanup_id metadata, but no cleanup key exists with that name.

It is not strictly necessary to remove the cleanup_id metadata from the keys, but doing so optimizes reads (since there won't need to be a follow-up check for the cleanup key on every read). So it's okay if we fail to remove this metadata; it can be removed by some other client eventually.

So far so good. Now for handling cleanup. Periodically, we scan the "cleanup" bucket for keys to process. Since keys only exist in this bucket around the moment of a write (they are deleted immediately afterwards), in practice there should hardly be any keys in it at any single point in time. We're talking single digits here. Much better than a full db scan to find garbage keys. Also, the keys to process can be narrowed down by age (e.g. older than 5 minutes) based on the timestamp in the key name.

For each old cleanup log, the processor would delete the keys listed in the cleanup information, but *only if* the value_id recorded there matches the value_id metadata on the actual key. This way, if somebody else writes to the key in the meantime, we don't delete it. Provided all deletes succeed (or are skipped), the cleanup key itself is then deleted.

I wonder if anyone else is doing something like this. I also wonder if there is a flaw in this design. Since the lack of the cleanup key can indicate a successful commit, there might be a problem with eventual consistency: some other node could see A and B but not the cleanup key, simply because it hasn't propagated yet, and therefore think the keys are valid when in reality they aren't. Maybe causal consistency ensures this isn't a problem, since the cleanup key is written before A and B?

Justin
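P.S. To make the pattern concrete, here is a rough Python sketch of the write path, the read check, and the cleanup sweep. The "kv" client used here is purely hypothetical (put/get/get_meta/get_with_meta/delete/exists/list with per-key metadata); the real Riak client API looks quite different, so treat this as runnable pseudocode rather than something you could point at a cluster.

    import time
    import uuid

    # "kv" is an assumed key/value client with put/get/get_meta/get_with_meta/
    # delete/exists/list and per-key metadata; the real Riak API differs.

    def write_pair(kv, key_a, value_a, key_b, value_b):
        cleanup_id = "%s_%d" % (uuid.uuid4(), int(time.time()))
        a_value_id = str(uuid.uuid4())
        b_value_id = str(uuid.uuid4())

        # Record intent first, so a crash anywhere below leaves evidence behind.
        kv.put("/cleanup/" + cleanup_id, {key_a: a_value_id, key_b: b_value_id})

        # Write the real keys, each tagged with its value_id and the cleanup_id.
        kv.put(key_a, value_a, meta={"value_id": a_value_id, "cleanup_id": cleanup_id})
        kv.put(key_b, value_b, meta={"value_id": b_value_id, "cleanup_id": cleanup_id})

        # Deleting the cleanup log is the commit point.
        kv.delete("/cleanup/" + cleanup_id)

        # Best effort: strip cleanup_id so later reads can skip the extra check.
        # Failing here is harmless; any client can remove it later.
        kv.put(key_a, value_a, meta={"value_id": a_value_id})
        kv.put(key_b, value_b, meta={"value_id": b_value_id})

    def read_checked(kv, key):
        value, meta = kv.get_with_meta(key)
        cleanup_id = meta.get("cleanup_id")
        # Valid if there is no cleanup_id, or the cleanup log it points at is
        # gone (meaning the write committed).
        if cleanup_id is None or not kv.exists("/cleanup/" + cleanup_id):
            return value
        raise KeyError("%s is an orphan from an uncommitted write" % key)

    def sweep(kv, min_age_seconds=300):
        now = int(time.time())
        for cleanup_key in kv.list("/cleanup/"):
            # Skip logs that are too fresh; the writer may still be working.
            written_at = int(cleanup_key.rsplit("_", 1)[-1])
            if now - written_at < min_age_seconds:
                continue
            entries = kv.get(cleanup_key)  # {key: value_id} recorded at write time
            for key, value_id in entries.items():
                meta = kv.get_meta(key)
                # Only delete if the key still carries the value_id we wrote;
                # if somebody rewrote it in the meantime, leave it alone.
                if meta is not None and meta.get("value_id") == value_id:
                    kv.delete(key)
            # All listed keys deleted or skipped: drop the log itself.
            kv.delete(cleanup_key)

The important bits are that the cleanup log is written before the data keys, and that the sweep compares value_id before deleting, so a concurrent rewrite of A or B is never clobbered.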