On Sun, Feb 20, 2011 at 6:35 PM, Mark Mielke <m...@mark.mielke.cc> wrote:
> On 02/20/2011 03:50 AM, Ivan Zhakov wrote:
>>
>> On Wed, Dec 29, 2010 at 22:37, Stefan Fuhrmann<eq...@web.de> wrote:
>>>
>>> The fopen() calls should be eliminated by the
>>> file handle cache. IOW, they should already be
>>> addressed on the performance branch. Please
>>> let me know if that is not the case.
>>
>> My belief that file handles cache should be implemented at OS level
>> and I pretty sure that it's implemented. And right way to eliminate
>> number of duplicate fopen()/reads() is improving our FS API.
>>
>> I didn't reviewed how file handles cache is implemented in
>> fs-performance branch, but I'm nearly to -1 against implementing cache
>> of open file handles in Subversion.
>
> What OS implements file handle caching? The OS file system layer for most
> operating systems does implement caching - but open()/close() can easily
> invalidate some or all of this cache due to required POSIX behaviour,
> especially if the backend storage is remote and shared between multiple
> clients such as would be the case over NFS. This is required to implement
> consistency across clients. The local operating system cannot arbitrarily
> cache everything, and every bit of data it does decide to cache could be
> wrong at any point in time without other aspects in use such as file
> locking.
>
> Of particular concern to me is how slow Subversion gets over NFS, and this
> thread grabbed my attention as a result. When using NFS Subversion
> operations can take many times longer (20 seconds -> 20 minutes). I think
> people may be testing and making assumptions that a "local file system" will
> be in use. Do people working on the fs-performance branch check with NFS?
>
> I don't know... just dropping in... feel free to set me straight. :-)

Hi Mark,

You're absolutely right, some Subversion operations perform horribly
with FSFS over NFS (we have such a setup @work). In fact, the poor
performance of e.g. "svn log somefile" on NFS was one of the problems I
was first interested in when looking at svn (and one of the reasons I
got involved with svn development, a positive side-effect :-)).

On our setup at work, "svn log" is about 10 times slower when done over
NFS than on local disk. As I described in this thread (but also some
threads before), "svn log somefile" opens and closes each rev file about
20 times (and the situation is not better with a packed repository,
because the packed file is opened/closed just as many times), and it
seems that is very expensive when working over NFS.

I haven't been able to test the performance branch (with the file handle
caching) on our NFS setup at work. I have only measured the number of
fopen() calls for an "svn log" operation, compared to trunk, assuming
that is *the* most critical performance differentiator for NFS setups.

If someone could do some real measurements/benchmarks of "svn log" (and
other operations of course) of the performance branch on an NFS setup,
compared with trunk (and maybe also compare them with a similar setup
with FSFS on local disk), that could be very interesting...

> That said, I'm also (in principle) against implementing cache of open file
> handles. I prefer architectures that cache intermediate data in a processed
> form that the application has made a determined choice to make use of such
> that the cache is the most useful to the application, rather than a
> transparent caching layer that guesses at what is safe. The OS file system
> layer is exactly this - any caching it does is transparent to the
> application and a guess. Guesses are dangerous, which is exactly why the OS
> file system layer cannot do as much caching unless it has 100% control of
> the file system (= local file system).

I agree that it would be best if the architecture were such that svn
could organize its work, for most use cases, in a way that's efficient
for the lower levels of the system. For instance, for "svn log", svn
should in theory be able to do its work with exactly 1 open/close per
rev file (or, in a packed repository, maybe even only 1 open/close per
packed file).

But right now this isn't the case, and I think getting there would take
a huge amount of work (changes in architecture, layering, ...). Until
that happens, I think such a generic file-handle caching layer could
prove very helpful :-).

Note though that, if I understood correctly, the file-handle caching of
the performance branch will not be reintegrated into 1.7, but maybe into
1.8 ... But maybe stefan2 can comment more on that :-).

Cheers,
--
Johan
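
For illustration only, the open/close arithmetic discussed above can be
sketched in a few lines of Python. This is not the performance branch's
implementation and not Subversion code: `HandleCache`, `read_at` and
`demo` are hypothetical names, the "rev files" are throwaway temp files,
and the figure of 20 accesses per file is simply the number quoted above
for "svn log somefile". The sketch only counts open() calls; it
deliberately ignores the consistency question raised in the quoted text
(whether a long-lived handle stays valid on a shared NFS mount).

```python
import os
import tempfile
from collections import OrderedDict


class HandleCache:
    """Hypothetical LRU cache of open file objects, keyed by path."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.handles = OrderedDict()
        self.open_calls = 0  # number of real open() calls performed

    def get(self, path):
        if path in self.handles:
            self.handles.move_to_end(path)  # mark as most recently used
            return self.handles[path]
        self.open_calls += 1
        handle = open(path, "rb")
        self.handles[path] = handle
        if len(self.handles) > self.capacity:
            # Evict and close the least recently used handle.
            _, oldest = self.handles.popitem(last=False)
            oldest.close()
        return handle

    def close_all(self):
        for handle in self.handles.values():
            handle.close()
        self.handles.clear()


def read_at(cache, path, offset, length):
    """Read `length` bytes at `offset`, reusing a cached handle if present."""
    handle = cache.get(path)
    handle.seek(offset)
    return handle.read(length)


def demo(reads_per_file=20, files=3):
    """Return (opens without cache, opens with cache) for the same workload."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(files):
            path = os.path.join(tmp, f"rev{i}")
            with open(path, "wb") as f:
                f.write(b"x" * 64)
            paths.append(path)

        # Pattern described above: one open/close per access.
        uncached_opens = 0
        for path in paths:
            for n in range(reads_per_file):
                with open(path, "rb") as f:
                    uncached_opens += 1
                    f.seek(n)
                    f.read(1)

        # Same accesses routed through the handle cache.
        cache = HandleCache()
        for path in paths:
            for n in range(reads_per_file):
                read_at(cache, path, n, 1)
        cached_opens = cache.open_calls
        cache.close_all()
        return uncached_opens, cached_opens
```

Calling `demo()` compares the two access patterns on the same workload.
How much wall-clock time the saved opens are worth on NFS is exactly
what would still need to be measured on a real setup.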