Daniel Shahaf wrote:
> So we loop over the remaining sha1's and remove each of them...
> I wonder if there is room for further optimization here? e.g., does
> this prepare/reset the statement just once, or once per iteration?

Each iteration of this loop prepares, uses and resets a SQL statement, and also removes a pristine file from disk. So yes, there is room for further optimization of the SQL part of that.

The main concern I was addressing was that the previous method was *quadratic* in the total number of pristines in the store, because for each one in the store it would scan the NODES and ACTUAL_NODE tables looking for a reference to it. I had noticed that even a no-op cleanup took a very long time on a large WC.

It will help if I show some real timings. Wall clock times for "svn cleanup" on a clean checkout of ^/subversion/branc...@1040943 on my Linux system:

  r1040662 build: first time = 15 minutes, second = 14.8 minutes.
  r1040663 build: first time = 4.4s, best of many repetitions = 0.7s.

Now the algorithm is only linear time, which is a *huge* win. A 'cleanup' operation doesn't need to be blisteringly fast, so I don't think it needs more optimisation.

I've edited the log message to clarify the main point, and to point out the big-WC timing improvement.

- Julian


# r1040662 build
$ time ~/build/subversion-c/subversion/svn/svn cleanup branches/
real    15m4.962s
user    9m0.306s
sys     6m3.967s

# r1040663 build
$ time ~/build/subversion-c/subversion/svn/svn cleanup branches/
real    0m0.708s
user    0m0.436s
sys     0m0.212s
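
To illustrate the quadratic-versus-linear point above, here is a minimal sketch in Python using the standard sqlite3 module. It is not the code from r1040662 or r1040663. The schema is a simplified, hypothetical stand-in (one `checksum` column per table), and the function names `make_store`, `unreferenced_per_pristine` and `unreferenced_single_pass` are invented for this example. The first function does one lookup in both tables for every pristine, which is the shape that goes quadratic when each lookup is a table scan. The second reads each table once and takes a set difference.

```python
import sqlite3


def make_store(n_pristines, n_referenced):
    """Create an in-memory DB using a simplified, hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE pristine (checksum TEXT PRIMARY KEY);
        CREATE TABLE nodes (local_relpath TEXT, checksum TEXT);
        CREATE TABLE actual_node (local_relpath TEXT, checksum TEXT);
        """
    )
    sums = ["sha1-%06d" % i for i in range(n_pristines)]
    db.executemany("INSERT INTO pristine VALUES (?)", [(s,) for s in sums])
    # Only the first n_referenced pristines are still in use by a node.
    db.executemany(
        "INSERT INTO nodes VALUES (?, ?)",
        [("file%d" % i, s) for i, s in enumerate(sums[:n_referenced])],
    )
    return db


def unreferenced_per_pristine(db):
    """Old shape: one lookup in both tables for every pristine."""
    found = []
    for (checksum,) in db.execute("SELECT checksum FROM pristine").fetchall():
        # No index on checksum here, so each lookup scans the tables.
        (in_use,) = db.execute(
            "SELECT EXISTS (SELECT 1 FROM nodes WHERE checksum = ?)"
            " OR EXISTS (SELECT 1 FROM actual_node WHERE checksum = ?)",
            (checksum, checksum),
        ).fetchone()
        if not in_use:
            found.append(checksum)
    return sorted(found)


def unreferenced_single_pass(db):
    """New shape: read each table once, then take a set difference."""
    referenced = set()
    for table in ("nodes", "actual_node"):
        query = "SELECT checksum FROM %s WHERE checksum IS NOT NULL" % table
        referenced.update(row[0] for row in db.execute(query))
    stored = {row[0] for row in db.execute("SELECT checksum FROM pristine")}
    return sorted(stored - referenced)


if __name__ == "__main__":
    db = make_store(n_pristines=2000, n_referenced=1990)
    # Both approaches must agree on which pristines are unreferenced.
    assert unreferenced_per_pristine(db) == unreferenced_single_pass(db)
    print(len(unreferenced_single_pass(db)), "unreferenced pristines")
```

Both functions return the same list. Only the amount of work differs: the first grows with (pristines x table rows), the second with (pristines + table rows). Removing each unreferenced pristine afterwards is still one statement and one file deletion per item, as described above.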