WANdisco's MultiSite product is essentially a multiplexing HTTP proxy that sits between the client and Apache. Testing has revealed an issue that probably affects other proxies as well, such as squid.
The problem occurs during commit. When the client has a direct connection to Apache, it holds that connection open for the duration of the commit and sends multiple requests over it. If Apache dies, or is forcibly restarted, the client sees the connection break and aborts the commit.

When a proxy is present, the client's connection is to the proxy, and the proxy maintains its own connection to Apache. If the proxy loses a connection to Apache, it may just switch to another connection. If this happens during a request, the client will still see an error, but if it happens between requests, the client may not be aware that the proxy has switched connections. This is indeed how the squid proxy behaves.

So if the timing is just right, it's possible for one Apache process to start writing the transaction, for that process to stop, and for another process to take over the commit. WANdisco observed problems on FSFS, where the transaction is synced at the end of the commit, not after each HTTP request. What ends up in the transaction probably depends on the details of the kernel memory and disk caching, the system load, the underlying OS filesystem, etc. In my testing with squid I have not managed to produce a corrupt commit, but I suspect that under the right conditions it would happen.

I think that getting mod_dav_svn to sync before acknowledging each HTTP request is a non-starter, for performance reasons. Can mod_dav_svn detect that the connection has changed? It's too late to get the old process to sync, but perhaps we could abort the commit? Some valid commits would fail, but it would avoid the small risk of a corrupt commit.

-- Philip
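
To make the detection idea above concrete, here is a minimal sketch of the check in Python. It is not mod_dav_svn code, and every name in it (Transaction, handle_request, ConnectionChanged) is hypothetical. It assumes the server can identify which process and connection a request arrived on, and that this "owner" can be recorded somewhere every server process can read it (a plain in-memory object stands in for that here).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


class ConnectionChanged(Exception):
    """Raised when a commit request arrives from a different owner."""


@dataclass
class Transaction:
    # Hypothetical stand-in for an in-progress commit transaction.
    name: str
    # (process id, connection id) of the first request seen, if any.
    owner: Optional[Tuple[int, int]] = None
    aborted: bool = False
    requests: List[str] = field(default_factory=list)


def handle_request(txn: Transaction, process_id: int,
                   connection_id: int, payload: str) -> None:
    """Accept a request only if it comes from the transaction's owner."""
    if txn.aborted:
        raise ConnectionChanged(f"{txn.name}: transaction already aborted")

    caller = (process_id, connection_id)
    if txn.owner is None:
        # The first request of the commit claims the transaction.
        txn.owner = caller
    elif txn.owner != caller:
        # A different process or connection is continuing the commit.
        # The old process can no longer sync, so abort instead.
        txn.aborted = True
        raise ConnectionChanged(
            f"{txn.name}: owner {txn.owner} replaced by {caller}")

    txn.requests.append(payload)
```

With a direct connection every request carries the same owner and the commit proceeds. If a proxy silently switches to another connection between requests, the next request is rejected and the commit is aborted. This is the trade-off already mentioned: some commits that would have been fine are refused.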