Daniel Shahaf wrote:
> So we loop over the remaining sha1's and remove each of them...
> I wonder if there is room for further optimization here?  e.g., does
> this prepare/reset the statement just once, or once per iteration?

Each iteration of this loop prepares, uses and resets an SQL statement,
and also removes a pristine file from disk.  So yes, there is room for
further optimization of the SQL part of that.
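
As a rough sketch of what "prepare once, reuse per iteration" could look
like, here is a small Python example using the standard sqlite3 module.
It is not the Subversion C code: the function name, the single-column
"pristine" table and the in-memory database are assumptions made up for
illustration, and the removal of the pristine files from disk is left
out.

```python
import sqlite3


def remove_unreferenced(conn, sha1s):
    """Delete the given pristine rows using one reusable statement.

    Hypothetical helper; the table and column names are assumptions.
    """
    # executemany() takes a single SQL string and rebinds its parameter
    # for each checksum, rather than issuing a fresh statement each time.
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "DELETE FROM pristine WHERE checksum = ?",
            ((sha1,) for sha1 in sha1s),
        )


# Toy stand-in for the working copy database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pristine (checksum TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO pristine VALUES (?)", [("a",), ("b",), ("c",)])

remove_unreferenced(conn, ["a", "c"])
remaining = [row[0] for row in conn.execute("SELECT checksum FROM pristine")]
```

In C the same idea would mean preparing the statement before the loop
and only binding, stepping and resetting it inside the loop.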

The main concern I was addressing was that the previous method was
*quadratic* in the total number of pristines in the store, because for
each one in the store it would scan the NODES and ACTUAL_NODE tables
looking for a reference to it.  I had noticed that even a no-op cleanup
took a very long time on a large WC.  It will help if I show some real
timings.

Wall-clock times for "svn cleanup" on a clean checkout of
^/subversion/branc...@1040943 on my Linux system:

  r1040662 build: first time = 15 minutes, second = 14.8 minutes.

  r1040663 build: first time = 4.4s, best of many repetitions = 0.7s.

Now the algorithm runs in linear time, which is a *huge* win.  A
'cleanup' operation doesn't need to be blisteringly fast, so I don't
think it needs more optimisation.
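
To make the quadratic-versus-linear difference concrete, here is a
minimal Python sketch of the two shapes.  It is an illustration only,
not the code from either revision: the function names and the plain
lists standing in for the pristine store and for the checksum
references in NODES and ACTUAL_NODE are hypothetical.

```python
def unreferenced_quadratic(pristines, references):
    """Old shape: for every pristine, scan all reference rows."""
    result = []
    for sha1 in pristines:
        # One full scan of the references per pristine, so the total
        # work grows as len(pristines) * len(references).
        if not any(ref == sha1 for ref in references):
            result.append(sha1)
    return result


def unreferenced_linear(pristines, references):
    """Linear shape: gather the referenced checksums once, then filter."""
    referenced = set(references)  # single pass over the reference rows
    # Each membership test is a constant-time lookup on average.
    return [sha1 for sha1 in pristines if sha1 not in referenced]


# Toy data: four pristines, two of which are still referenced.
pristines = ["p1", "p2", "p3", "p4"]
references = ["p1", "p3", "p3"]

slow = unreferenced_quadratic(pristines, references)
fast = unreferenced_linear(pristines, references)
```

Both functions return the same list of unreferenced checksums; only the
amount of work differs as the working copy grows.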

I've edited the log message to clarify the main point, and to point out
the big-WC timing improvement.

- Julian


# r1040662 build
$ time ~/build/subversion-c/subversion/svn/svn cleanup branches/
real    15m4.962s
user    9m0.306s
sys     6m3.967s

# r1040663 build
$ time ~/build/subversion-c/subversion/svn/svn cleanup branches/
real    0m0.708s
user    0m0.436s
sys     0m0.212s


