Hello,
I've had a look through the issue tracker and mailing list archives and didn't find any references to this issue. I also assume that this is a more appropriate mailing list than 'users'.

We've noticed recently that we have terrible performance when updating a particular directory in our repository. We've realised that the poor performance is related to the fact that we have 5,800 or so files in a single directory. (I know! This is far from ideal, but we're a long way into development and reorganising the directory structure at this stage is very difficult.)

To give some concrete numbers, we recently re-exported about 10,000 texture files (averaging about 40KB each, or 390MB total) and 5,800 shaders (averaging about 4KB each, or 22MB total). Both sets of files are gzip-compressed. Here are some approximate times for 'svn up':

Textures: 10,000 files, 390MB, ~4 minutes
Shaders:   5,800 files,  22MB, ~10 minutes

The key point here is that the textures are nicely distributed in a well-organised directory structure, but the shaders are dumped into a single directory. The problem we face now is that we're iterating a lot on the engine, which is causing us to rebuild the shaders every day.

To cut a long story short, I ran SysInternals procmon.exe while svn was updating, and saw two alarming behaviours:

1) .svn\entries is being read in its entirety (in 4KB chunks) for *every* file that's updated in the directory. As the shaders dir contains so many files, it's approximately 1MB in size. That's 5,800 reads of a 1MB file (5.8GB in total) for a single update! I know this file is likely to be cached by the OS, but that's still a lot of unnecessary system calls and memory being copied around. Please excuse my ignorance if there's a compelling reason to re-read this file multiple times, but can't Subversion cache the contents of this file while it's updating the directory? Presumably it has locked the directory at this point, so it can be confident that the contents of this file won't be changed externally? (I've put a toy sketch of the arithmetic below.)

2) Subversion appears to generate a temporary file in .svn\prop-base\ for every file that's being updated. It generates the filenames sequentially, which means that when 5,800 files are being updated it ends up doing this:

file_open tempfile.tmp? Already exists!
file_open tempfile.2.tmp? Already exists!
file_open tempfile.3.tmp? Already exists!
...some time later
file_open tempfile.5800.tmp? Yes!

For N files in a directory, that means Subversion ends up doing (N^2 + N)/2 calls to file_open. In our case that means it's testing for file existence 16,822,900 times (!) in order to do a full update. Even with just 100 files in a directory that's 5,050 tests. Is there any inherent reason these files need to be generated sequentially? From reading the comments in 'svn_io_open_uniquely_named' it sounds like these files are named sequentially for the benefit of people looking at conflicts in their working directory. As these files are being generated within the 'magic' .svn folder, is there any reason to number them sequentially? Just calling rand() until there were no collisions would probably give a huge increase in performance. (The second sketch below compares the probe counts.)

I appreciate that we're probably an edge case with ~6,000 files, but issue 2) seems like a relatively straightforward change that would yield clear benefits even for more sane repositories (and across all platforms too).

In case it's relevant, I'm using the CollabNet build of Subversion on Windows 7 64-bit.
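To make the arithmetic in point 1 concrete, here is a tiny back-of-the-envelope program. It is not Subversion code; the 1MB entries size, 4KB read size and 5,800-file count are just the numbers observed above, and it only counts reads, it doesn't do any I/O:

/* Toy back-of-the-envelope sketch -- NOT Subversion code.  It counts
 * the 4KB reads implied by re-reading a ~1MB .svn\entries file once
 * per updated file, versus parsing it once and keeping it in memory
 * for the whole directory update. */
#include <stdio.h>

#define ENTRIES_SIZE  (1024 * 1024)  /* ~1MB entries file (assumed)    */
#define CHUNK         4096           /* 4KB reads seen in procmon      */
#define FILES_UPDATED 5800           /* files updated in the directory */

int main(void)
{
    long long reads_per_pass = ENTRIES_SIZE / CHUNK;           /* 256 */
    long long uncached_reads = reads_per_pass * FILES_UPDATED;
    long long uncached_mb = (long long)ENTRIES_SIZE * FILES_UPDATED
                            / (1024 * 1024);

    printf("4KB reads, re-reading per file:     %lld (~%lld MB)\n",
           uncached_reads, uncached_mb);   /* 1,484,800 reads, ~5,800 MB */
    printf("4KB reads, parsed once and cached:  %lld (~1 MB)\n",
           reads_per_pass);
    return 0;
}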
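And to illustrate point 2, here is a similar toy simulation comparing sequential temp-file naming with a random-suffix scheme. Again, this is not Subversion code: an in-memory flag array stands in for the .svn\prop-base directory, and the suffix-space size is an arbitrary assumption.

/* Toy simulation -- NOT Subversion code.  It compares how many
 * "does this name exist yet?" probes are needed to create N temp
 * files in one directory when names are tried sequentially
 * (tempfile.tmp, tempfile.2.tmp, ...) versus when a random suffix
 * is tried first. */
#include <stdio.h>
#include <stdlib.h>

#define N_FILES      5800
#define SUFFIX_SPACE 30000   /* assumed; kept below RAND_MAX for portability */

int main(void)
{
    /* Sequential naming: creating the k-th file probes names 1..k,
       so the total is 1 + 2 + ... + N = (N^2 + N) / 2. */
    long long seq_probes = 0;
    for (long long k = 1; k <= N_FILES; k++)
        seq_probes += k;

    /* Random suffix: keep picking suffixes until an unused one turns up. */
    char *used = calloc(SUFFIX_SPACE, 1);
    long long rand_probes = 0;
    srand(1);
    for (int i = 0; i < N_FILES; i++) {
        int suffix;
        do {
            suffix = rand() % SUFFIX_SPACE;
            rand_probes++;
        } while (used[suffix]);
        used[suffix] = 1;
    }
    free(used);

    printf("sequential probes: %lld\n", seq_probes);   /* 16,822,900 */
    printf("random probes:     %lld\n", rand_probes);  /* a little over 5,800 */
    return 0;
}

Even ignoring the cost of each individual probe, the sequential scheme is quadratic in the number of files per directory, while a random-suffix scheme stays roughly linear.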
Here's 'svn --version':

C:\dev\CW_br2>svn --version
svn, version 1.6.6 (r40053)
   compiled Oct 19 2009, 09:36:48

Thanks,
Paul