I've been looking at which bugs can be closed by the pending
completion of wc-ng.  In particular, I've been looking at issue #1284:
http://subversion.tigris.org/issues/show_bug.cgi?id=1284

The issue is essentially a complaint about the performance of 'svn mv
somedir targetdir' when the number of files in somedir is very large,
in the thousands.  It's a performance bug with no clear definition of
done, but given the performance improvements of wc-ng, I'm marking it
as done.

What I thought people would be interested in are the actual numbers I
used to justify the bug closure.  Although it's a narrow use case, I think
this provides an interesting window into wc-ng.  The script used to
generate these results is attached to the issue:
http://subversion.tigris.org/nonav/issues/showattachment.cgi/1156/test.sh

The script runs the experiment for a given number of trials for
several directory sizes, on both trunk and 1.6.x.  I also ran it on a
couple of different machines: my OS X laptop, which has a relatively
slow disk, and a Linux box with a relatively speedy disk.  The
following numbers are all arithmetic means over the various trials for
a given scenario.  Times are in seconds.  (Standard deviations within
each scenario were all relatively small.)
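
For anyone who wants to reproduce this kind of measurement without the
attached script, here is a minimal sketch of the timing-and-averaging
step.  It is not the script attached to the issue; the function names
and the placeholder command are assumptions made for illustration.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, trials):
    """Run cmd `trials` times and return the wall-clock duration of each run."""
    durations = []
    for _ in range(trials):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (arithmetic mean, sample standard deviation) of the durations."""
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, stdev


if __name__ == "__main__":
    # Placeholder command (an assumption); a real run would time
    # 'svn mv somedir targetdir' in a freshly prepared working copy
    # before each trial.
    runs = time_command([sys.executable, "-c", "pass"], trials=3)
    print("mean %.2fs, stdev %.2fs" % summarize(runs))
```

Resetting the working copy between trials matters here, since a second
'svn mv' on an already-moved directory would not measure the same thing.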

                 Slow Disk
                 =========

 Files  |  1.6.x       Trunk   |  Improvement
--------+----------------------+---------------
    10  |    0.98        0.97  |        -
   100  |    0.97        0.98  |        -
  1000  |   17.92        9.16  |    48.9%
  5000  |  440.92       75.71  |    82.9%


                 Fast Disk
                 =========

 Files  |  1.6.x       Trunk   |  Improvement
--------+----------------------+---------------
    10  |    0.07        0.10  |        -
   100  |    0.33        0.19  |    42.4%
  1000  |   18.31        2.23  |    87.8%
  5000  |  322.10       34.08  |    89.4%
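
The Improvement column is the relative reduction in run time,
(old - new) / old.  A small sketch of that arithmetic, using the
fast-disk means from the table; the function name is made up for
illustration, and recomputing from the rounded means can differ from
the posted column in the last digit.

```python
def improvement(old, new):
    """Percentage reduction in run time going from old to new."""
    return (old - new) / old * 100.0


# (files, 1.6.x mean, trunk mean) copied from the fast-disk table.
fast_disk = [(100, 0.33, 0.19), (1000, 18.31, 2.23), (5000, 322.10, 34.08)]

for files, old, new in fast_disk:
    # Print the percentage improvement and the equivalent speedup factor.
    print("%5d  %5.1f%%  %5.1fx" % (files, improvement(old, new), old / new))
```

The speedup factor (old / new) is the same information expressed as a
ratio, which can be easier to compare across directory sizes.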

I haven't run any higher statistical analysis on these, but the raw
numbers are compelling.  For both fast and slow disks, wc-ng speeds up
this particular operation by *a lot*, and the difference increases with
the number of files.  I suspect the same may be true for other
disk-intensive activities in a single directory.

Another observation is that the trunk times for the fast disk are much
different from those for the slow disk, but I'm not really sure what to
read into that (the boxes also have different CPU speeds, cache
sizes, etc.).

Still interesting stuff, though.

-Hyrum
