Hi Bert,

In short: excellent work! With your latest changes,
all algorithmic issues have been resolved as already
confirmed on IRC. Two places in the server could
also be improved (r1340874-5).

Note to package builders: Sqlite 3.7.7 will not use
all available indexes while 3.7.12 does (an possibly
earlier ones). Deletion will still be slow with those
old versions.

To illustrate how much performance improved since
April here the original results:

Processing 16384 files in the same folder
    Creating files ...      real   user    sys
    Adding files ...       10.174  8.317  1.780
    Running status ...     0.235  0.196  0.036
    Commit files ...       19.434  8.997  6.116
    Listing files ...      0.157  0.064  0.012
    Updating files ...     0.312  0.292  0.012
    Local copy ...         5228.840  4998.244  215.085
    Commit copy ...        3.398  2.304  1.012
    Delete 1 file ...      0.275  0.260  0.012
    Deleting files ...     5025.387  4521.755  455.236
    Commit deletions ...    1681.892  1508.402  94.458
    Export all ...      1.758  1.012  0.720
    Check out all ...      17.582  9.533  7.828

Now look at HEAD results:

Processing 16384 files in the same folder
    Creating files ...      real   user    sys
    Adding files ...       9.590  7.896  1.620
    Running status ...     0.135  0.084  0.048
    Commit files ...       14.130  5.676  4.908
    Listing files ...      0.105  0.036  0.012
    Updating files ...     0.267  0.244  0.016
    Local copy ...         10.023  5.344  4.592
    Commit copy ...        0.675  0.568  0.064
    Delete 1 file ...      0.009  0.004  0.004
    Deleting files ...     6.686  4.892  1.744
    Commit deletions ..    17.530  4.016  2.320
    Export all ...      1.259  0.604  0.632
    Check out all ...      14.026  6.640  7.236

(some ra_svn improvements can be seen in list,
export and such).

-- Stefan^2.

Bert Huijben wrote:

-----Original Message-----
From: Stefan Fuhrmann [mailto:stefanfuhrm...@alice-dsl.de]
Sent: zaterdag 19 mei 2012 20:34
To: Subversion Development
Subject: Working copy operations that are worse than O(change size)

Hi Bert&  interested parties,

as promised, I ran the create_bigdir.sh benchmark with the
latest commit harvester improvements (r1340484).  While
I do see some speedup, the following operations have runtime
that is not proportional to the size of the change,  i.e. they
exhibit O(n^2) behavior.

All statements are run in directories that are directly below
the WC root and contain(ed) no sub-directories.

* svn cp WC/dir1 WC/dir2
    because STMT_INSERT_WORKING_NODE_COPY_FROM_BASE
    is O(entries in dir1)
This query was O(entries-in-wc). This query is O(n) now.
(It is evaluated for every node that is copied)

* svn del WC/dir/file
    because STMT_HAS_SERVER_EXCLUDED_NODES
This query is no longer evaluated for files that are deleted; only for BASE
directories that are deleted.
And this query properly uses an index now.

    STMT_INSERT_DELETE_LIST
I have an alternative implementation that should be more efficient if I look
at the query plan, but my measurements don't show a difference.

/* This matches the selection in STMT_INSERT_DELETE_FROM_NODE_RECURSIVE.
    A subquery is used instead of nodes_current to avoid a table scan */
-- STMT_INSERT_DELETE_LIST
INSERT INTO delete_list(local_relpath)
SELECT local_relpath FROM nodes AS n
WHERE wc_id = ?1
   AND (local_relpath = ?2
        OR IS_STRICT_DESCENDANT_OF(local_relpath, ?2))
   AND op_depth>= ?3
   AND op_depth = (SELECT MAX(s.op_depth) FROM nodes AS s
                   WHERE s.wc_id = ?1
                     AND s.local_relpath = n.local_relpath)
   AND presence NOT IN ('base-deleted', 'not-present', 'excluded', 'absent')

This one uses the right index, while the current one doesn't.

It appears that for me most time is spend in the INSERT part; not in the
SELECT.
(Even if I change it to a simple INSERT, the profiled result is the same)

    STMT_DELETE_NODES_RECURSIVE
Split in two queries. Uses the right index now

    STMT_INSERT_DELETE_FROM_NODE_RECURSIVE
    are O(entries in dir).
I can't find an O(nodes) behavior on this query. It does use the right
indexes.
(And the query plan contains the pristine refcount triggers)
* svn ci WC/dir
    after deleting every single file in dir
    is O(files in dir ^ 2) because
    STMT_SELECT_BASE_NODE_LOCK_TOKENS_RECURSIVE
This query uses an index now (was table scan)

    is O(files in dir) in both calls of from
    svn_wc__db_base_get_lock_tokens_recursive()
    and STMT_DELETE_NODES_RECURSIVE is
Split in two queries. The delete and relpath revert code paths use an index
now.
(That leaves the wcroot revert case, which really needs a table scan)

    O(files in dir) as well.

It would be nice if these queries could be fixed for 1.8.
I think most are fixed now :)


Most of these show huge performance improvements on large working copies,
but on the small working copies from our test suite a table scan is often
slightly faster than an index lookup, followed by looking up the actual
record.

        Bert



Reply via email to