On 8/19/14 12:07 AM, Branko Čibej wrote:
> I think it's not that simple.
>
> Consider the case where an administrator decides to not use 'svnadmin hotcopy'
> to back up a repository, but instead creates a (LVM) snapshot of the volume and
> uses 'tar' (or 'cp -a') to create the backup.
>
> When such a backup is restored and made active, everything will just work ...
> except that stale caches in svnserve or mod_dav_svn will not be automatically
> invalidated. In other words, the mere introduction of the instance ID does not
> solve "all" problems. There are several possible resolutions to this particular
> problem:
>
> * Tell the users "don't do that". That won't help; they'll do it anyway.
> * Require a restart of all servers when restoring such backups; been there,
>   people forget.
> * Require that the users run 'svnadmin recover' before bringing the
>   repository online; this might work if 'svnadmin recover' tweaks the
>   instance ID, since presumably they're already using it per our existing
>   recommendation.
> * Invent 'svnadmin restore' or 'svnadmin activate' or whatnot to make such
>   backups viable; see above, people forget.
> * Require 'svnadmin setuuid' on the restored backups; this breaks existing
>   working copies.
>
> So, even though the existence of the instance ID is an implementation detail,
> it does cause visible change in the behaviour of the repository: server
> restarts due to fiddling with the repository instance are needed far less
> often; but we still have to document when and why they are needed.
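To make the stale-cache scenario concrete, here is a minimal Python model, not Subversion's actual C code. It assumes the instance ID is stored inside the repository directory, so a file-level copy ('tar' or 'cp -a') carries it along unchanged. `Repo`, `ServerCache`, `snapshot_copy` and `recover_with_new_instance_id` are hypothetical names; the last one models the *proposed* behaviour of 'svnadmin recover' tweaking the instance ID, not anything it is known to do.

```python
import copy
import uuid
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical on-disk repository state (not Subversion's real layout)."""
    repos_uuid: str
    instance_id: str
    revisions: dict = field(default_factory=dict)


class ServerCache:
    """Hypothetical long-lived server cache (think svnserve or mod_dav_svn)."""

    def __init__(self, use_instance_id: bool):
        self.use_instance_id = use_instance_id
        self.entries = {}

    def _key(self, repo: Repo, rev: int):
        if self.use_instance_id:
            return (repo.repos_uuid, repo.instance_id, rev)
        return (repo.repos_uuid, rev)

    def get_rev(self, repo: Repo, rev: int) -> str:
        key = self._key(repo, rev)
        if key not in self.entries:
            # Cache miss: read the revision from the repository on disk.
            self.entries[key] = repo.revisions[rev]
        return self.entries[key]


def snapshot_copy(repo: Repo) -> Repo:
    """Models 'tar' / 'cp -a': every file, including the instance ID, is copied."""
    return copy.deepcopy(repo)


def recover_with_new_instance_id(repo: Repo) -> None:
    """Hypothetical: what 'svnadmin recover' could do if it tweaked the instance ID."""
    repo.instance_id = str(uuid.uuid4())


# Scenario: back up at r1, keep committing, then restore the old backup in place
# while the server (and its cache) keeps running.
cache = ServerCache(use_instance_id=True)
live = Repo("repos-uuid-1", "instance-1", {1: "add trunk"})
backup = snapshot_copy(live)             # snapshot taken at r1
live.revisions[2] = "commit B"
cache.get_rev(live, 2)                   # server caches r2 = "commit B"

restored = snapshot_copy(backup)         # old backup put back in place
restored.revisions[2] = "commit C"       # a new commit reuses r2
stale = cache.get_rev(restored, 2)       # same UUID + same instance ID: cache hit

recover_with_new_instance_id(restored)
fresh = cache.get_rev(restored, 2)       # new instance ID: cache miss, disk read
```

Because the instance ID was copied along with everything else, the restored repository hits the old cache entry for r2 and is served "commit B" from the abandoned history; only a changed instance ID forces a miss and returns "commit C".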
I think part of the problem here is that we (as in the WANdisco folks) have discussed the idea of a repository instance ID in the past as a way to solve the range of issues caused by replacing a repository without clearing the caches. But this change is being added for a very different reason.

Evgeny implemented the instance ID to solve the problem of two different repositories that can't both be locked if they happen to have the same UUID. This happens because we use a mutex to handle locking between threads, and that mutex can't distinguish between different repositories with identical UUIDs.

The code currently on trunk adds the instance ID to the cache keys. I'm not sure we should be doing that (though both brane and stefan2 requested it).

As per today's discussion at the SHF hackathon, the instance ID can't resolve the failure-to-clear-the-cache issues. The best it can do is narrow the window in which these issues can occur. That would seem like a good thing, but I think it creates a huge false sense of security. Eventually someone will come along with a corrupted repository, we're going to say "you replaced the repo while the server was running", and the user is going to say "But I've been doing this for years without any problem."

Without the instance ID in the cache keys, users are unlikely to actually corrupt their repository (no more likely than they would be with it; it's a pretty hard race to hit). But they are likely to get errors related to the cache being stale. This gives them a giant hint that what they're doing is wrong and gives us an opportunity to educate them.
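The locking problem the instance ID was introduced for can be sketched the same way. This is again a hypothetical Python model, not Subversion's C implementation: `LockRegistry`, `mutex_for` and `can_lock_both` are illustrative names. A process-wide table hands out one non-reentrant mutex per key, so when the key is only the UUID, two distinct repositories that share a UUID map to the same mutex.

```python
import threading


class LockRegistry:
    """Hypothetical process-wide table of per-repository mutexes."""

    def __init__(self, include_instance_id: bool):
        self.include_instance_id = include_instance_id
        self._guard = threading.Lock()
        self._mutexes = {}

    def mutex_for(self, repos_uuid: str, instance_id: str) -> threading.Lock:
        key = (repos_uuid, instance_id) if self.include_instance_id else repos_uuid
        with self._guard:
            # setdefault returns the existing mutex if this key was seen before.
            return self._mutexes.setdefault(key, threading.Lock())


def can_lock_both(registry: LockRegistry, repo_a, repo_b) -> bool:
    """Try to hold the locks of two repositories at once, without blocking."""
    m_a = registry.mutex_for(*repo_a)
    m_b = registry.mutex_for(*repo_b)
    m_a.acquire()
    try:
        got_b = m_b.acquire(blocking=False)
        if got_b:
            m_b.release()
        return got_b
    finally:
        m_a.release()


# Two different repositories that share a UUID (e.g. one was copied from the other).
repo_a = ("shared-uuid", "instance-A")
repo_b = ("shared-uuid", "instance-B")

uuid_only = can_lock_both(LockRegistry(include_instance_id=False), repo_a, repo_b)
with_instance = can_lock_both(LockRegistry(include_instance_id=True), repo_a, repo_b)
```

With the UUID-only key, `uuid_only` is `False`: both repositories share one mutex, so a blocking acquire of the second lock would deadlock the thread. Adding the instance ID to the key gives each repository its own mutex and `with_instance` is `True`.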