On 02.08.2010 12:32, Bert Huijben wrote:
I don't think there is a specific per folder check like this, but retrieving
specific data about just one node (instead of its folder) will be *much*
faster than in the old entries store. With the entries files we had to read
the entire file in all cases, but a real database doesn't have that
limitation.
For all metadata except for pristine files we only have to open one file and
sqlite just seeks to the right locations to fetch the data using its
indexes.
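As a rough illustration of the per-node lookup described above, here is a minimal Python sketch using the standard sqlite3 module. The table `nodes` and its columns are hypothetical and heavily simplified; this is not Subversion's actual wc.db schema. It only shows that fetching one node's metadata is an indexed primary-key lookup, not a parse of a whole per-folder entries file.

```python
import sqlite3

# Hypothetical, simplified stand-in for a single-db metadata store.
# NOT Subversion's real wc.db schema; for illustration only.
def open_store(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes ("
        " local_relpath TEXT PRIMARY KEY,"  # SQLite builds an index for this
        " kind TEXT NOT NULL,"
        " recorded_size INTEGER,"
        " recorded_mtime INTEGER)"
    )
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", rows)
    return conn

def read_node(conn, relpath):
    # One indexed lookup: SQLite seeks to the row via the primary-key index
    # instead of reading every row belonging to the folder.
    return conn.execute(
        "SELECT kind, recorded_size, recorded_mtime FROM nodes"
        " WHERE local_relpath = ?",
        (relpath,),
    ).fetchone()
```

For example, after `conn = open_store([("", "dir", None, None), ("trunk/a.c", "file", 120, 1280745120)])`, `read_node(conn, "trunk/a.c")` returns that file's row, and an unknown path returns `None`.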
For AnkhSVN I'm thinking about splitting the status cache into two layers,
instead of doing an 'svn status' per folder like we do in 1.6. (I think
TortoiseSVN might do the same thing, but maybe it calls status with depth
infinity.)
Yes, TSVN does the same: one 'svn st' per folder with depth immediates.
Getting information from the working copy per individual file will be so
much cheaper than before that I will look for metadata changes first (and
cache only a fraction of the informational details I used to cache before),
and only when I really need to will I perform the pristine file comparison.
(I don't know yet whether I will use svn_(client|wc)_status for this or just
call svn_wc_text_modified_p2() myself.)
I would imagine that TortoiseSVN's folder glyph status could be calculated
much faster using a similar strategy: first check whether there is a metadata
change or conflict somewhere in the tree (keeping track of translated
filesize + filedate, as these will be useful in the next step).
(This would be roughly svn_client_infoX(). It should also inform you of any
property changes; I don't know if it already does that, but the information
is now available in our internal APIs.)
If there is such a status: just set the right glyph (early out; no need to
check any pristine files)
So basically use svn_client_info() instead of svn_client_status(), then
check the status only for files whose status that info doesn't define yet.
That seems like a good idea. It's a lot of work to rewrite the existing
code, but it should be worth it.
And only if there isn't a status perform the svn_wc_text_modified_p2() calls
where needed.
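A minimal Python sketch of the two-layer folder-glyph computation discussed above, assuming a hypothetical `Status` enum and a caller-supplied `text_modified` callback standing in for the pristine comparison (the role svn_wc_text_modified_p2() would play). None of these names are Subversion APIs. The first pass uses only the cheap metadata results and exits early on a conflict; the expensive comparisons run only for files the metadata could not decide, and stop at the first modification.

```python
from enum import Enum

class Status(Enum):
    # Hypothetical overlay states; CONFLICTED outranks MODIFIED.
    NORMAL = 0
    MODIFIED = 1
    CONFLICTED = 2

def folder_status(nodes, text_modified):
    """Two-layer status for one folder's overlay glyph.

    nodes: iterable of (path, meta) pairs, where meta is the Status the cheap
        metadata query could determine, or None if it could not decide
        (e.g. size/timestamp changed, so only a pristine check can tell).
    text_modified: hypothetical callback doing the expensive pristine check.
    """
    result = Status.NORMAL
    undecided = []
    # Layer 1: metadata only, no pristine files touched.
    for path, meta in nodes:
        if meta is Status.CONFLICTED:
            return Status.CONFLICTED  # early out: nothing outranks a conflict
        if meta is Status.MODIFIED:
            result = Status.MODIFIED
        elif meta is None:
            undecided.append(path)
    if result is Status.MODIFIED:
        return result  # glyph already known; skip all pristine comparisons
    # Layer 2: compare against pristine only where metadata was inconclusive,
    # stopping at the first modified file.
    for path in undecided:
        if text_modified(path):
            return Status.MODIFIED
    return Status.NORMAL
```

Ranking conflicts above modifications is an assumption about glyph priority; a real overlay handler would use its own ordering.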
Would this API get renamed to svn_client_*? Or should I risk calling an
svn_wc_ API? It's still not clear whether the svn_wc_ APIs will be made
private, as was discussed before.
Your disk cache (via its hook) knows which on-disk files changed since the
last scan, so it can handle this much smarter than the simple algorithm in
svn_(client|wc)_status, which is mostly optimized for running in a cold
cache situation.
Instead of just one timestamp to compare against, you have more information:
the current on-disk time and the knowledge that a file just changed. You only
have to perform the check if the file was modified since the last run, or
when its time differs from both the stored time and your previous on-disk
time.
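The timestamp rule above could be sketched as a small predicate. The function and parameter names are hypothetical, and the sketch assumes the cache keeps the result of its previous check next to `last_seen_mtime`, so an unchanged on-disk time means that earlier result can simply be reused.

```python
def needs_pristine_check(changed_by_hook, disk_mtime, recorded_mtime,
                         last_seen_mtime):
    """Decide whether a file needs the expensive pristine comparison.

    changed_by_hook: the disk-cache hook reported a change since the last scan.
    disk_mtime: current on-disk timestamp.
    recorded_mtime: timestamp stored in the working-copy metadata.
    last_seen_mtime: on-disk timestamp seen on the previous scan
        (None if the file was never scanned).
    """
    if changed_by_hook:
        return True
    # Matching the recorded time means "unmodified"; matching the last seen
    # time means the cached result from the previous scan is still valid.
    return disk_mtime != recorded_mtime and disk_mtime != last_seen_mtime
```

Only a file that the hook flagged, or whose time matches neither reference, reaches the pristine comparison.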
I think this would require some redesign of your current cache strategy (it
certainly does for AnkhSVN), but the fact that you can now perform status
updates per file instead of per directory should by itself open room for
performance improvements. (I hope to solve some of the worst-case scenarios
in AnkhSVN, on directories containing a lot of files, with this.)
I'll start on the design soon. It will take quite a while until it
works properly...
Something else I use quite a lot in TSVN, especially in the cache, is a
quick check whether a folder is versioned or not, simply by checking
whether a .svn folder exists. Again, here I only need to know whether
it's *maybe* versioned. If there's no .svn folder, I *know* it's not
versioned; if there is, I call the svn APIs and get an error in return
if e.g. the .svn folder is empty or corrupted.
But with the single-db design there won't be .svn folders anymore,
except in the root of the wc, right?
So is there an (almost as) fast way to check whether a folder is
versioned or not?
I think the fastest way in the current code would be to call
svn_wc_read_kind() on the directory, maybe after first checking that there
is a .svn in at least one of the parent directories.
I thought about implementing a small cache for that, so that I don't
have to walk up the tree every time to find a .svn dir.
But I thought I read something about such a small cache getting
implemented in the svn library itself, so I wanted to ask first; maybe
there's already an API to use that cache. Or maybe I just remember it wrong.
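A minimal sketch of such a cache in Python, assuming a hypothetical `AdminDirCache` class and a pluggable `has_admin_dir` predicate (by default a plain check for a `.svn` subdirectory). It remembers the answer for every folder passed during a walk, so later lookups under the same working copy are dictionary hits. As with the plain `.svn` check, a positive answer only means "maybe versioned"; the svn APIs still have the final say.

```python
import os

class AdminDirCache:
    """Hypothetical cache for "might this folder be versioned?" lookups."""

    def __init__(self, has_admin_dir=None):
        # Injectable predicate so the walk can be tested without a real disk.
        self._has_admin_dir = has_admin_dir or (
            lambda d: os.path.isdir(os.path.join(d, ".svn")))
        self._root_for = {}  # folder -> nearest folder holding .svn, or None

    def find_admin_root(self, folder):
        current = os.path.abspath(folder)
        visited = []
        while True:
            if current in self._root_for:  # cache hit ends the walk
                root = self._root_for[current]
                break
            visited.append(current)
            if self._has_admin_dir(current):
                root = current
                break
            parent = os.path.dirname(current)
            if parent == current:  # reached the filesystem root
                root = None
                break
            current = parent
        # Remember the answer for every folder we walked through.
        for d in visited:
            self._root_for[d] = root
        return root

    def maybe_versioned(self, folder):
        return self.find_admin_root(folder) is not None

    def invalidate(self):
        # Must be called when folders are created, removed or checked out.
        self._root_for.clear()
```

The cache is only as good as its invalidation; the hook the disk cache already has would be a natural trigger for `invalidate()`.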
The effect on single-db would be: open the sqlite file (if not cached) and
query two rows by their primary key, via an index.
(I think that function currently does the same queries twice, but that is on
my TODO list.)
Did you try compiling Subversion with SVN_WC__SINGLE_DB and SINGLE_DB
defined in wc.h yet? (This enables the experimental single-db mode.)
It should give some impression of what you can expect with single-db. (I
think the current status is about 40 test failures (9 in the upgrade tests),
but it reduces the test suite run time by almost 50% compared to multi-db.)
I don't like to build the TSVN nightlies with such experimental features
yet. Once the features get into trunk without compile switches, I will
of course start using them. But as long as they're not activated, I
think I'll stay away from them. Not just because they might be too
unstable, but mostly because that means the APIs still change a lot, and
it's just too much work for me to adjust TSVN every time. There's
enough work to be done in TSVN itself :)
Stefan
--
___
oo // \\ "De Chelonian Mobile"
(_,\/ \_/ \ TortoiseSVN
\ \_/_\_/> The coolest Interface to (Sub)Version Control
/_/ \_\ http://tortoisesvn.net