On 02.08.2010 12:32, Bert Huijben wrote:

I don't think there is a specific per folder check like this, but retrieving
specific data about just one node (instead of its folder) will be *much*
faster than in the old entries store. With the entries files we had to read
the entire file in all cases, but a real database doesn't have that
limitation.

For all metadata except for pristine files we only have to open one file and
sqlite just seeks to the right locations to fetch the data using its
indexes.

For AnkhSVN I'm thinking about splitting the status cache in two layers,
instead of doing a 'svn status' per folder like we do in 1.6. (I think
TortoiseSVN might do the same thing, but maybe it calls status with depth
infinity)

Yes, TSVN does the same: one 'svn st' per folder with depth immediates.

Getting information from the working copy per individual file will be so
much cheaper than before that I will look for metadata changes first (and
cache only a fraction of the informational details I used to cache before),
and only when I really need to will I perform the pristine file comparison.
(I don't know yet whether I will use svn_(client|wc)_status for this or
just call svn_wc_text_modified_p2() myself.)

I would imagine that TortoiseSVN's folder glyph status would be calculated
much faster by using a similar strategy: First check if there is a metadata
change or conflict somewhere in the tree (keeping track of translated
filesize + filedate as these will be useful in the next step).
(This would be roughly svn_client_infoX(). It should also inform you of any
property changes. I don't know if it already does that, but the information
is available in our internal APIs now.)
If there is such a status: just set the right glyph (early out; no need to
check any pristine files)

So basically use svn_client_info() instead of svn_client_status(), then only check the status for files that don't have a defined status yet from that info. That seems like a good idea - a lot of work to rewrite the existing code, but it should be worth it.

And only if there isn't a status perform the svn_wc_text_modified_p2() calls
where needed.

Would this API get renamed to svn_client_*? Or should I risk calling an svn_wc_ API? It's still not clear whether the svn_wc_ APIs will get made private as was discussed before.
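
A minimal sketch of the two-phase check discussed above, written in Python for brevity rather than against the real C API. NodeInfo, folder_glyph and the text_modified callback are hypothetical names: the callback stands in for whatever ends up doing the pristine comparison (svn_wc_text_modified_p2() or a successor), and metadata_status stands in for what an info-style call would report.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class NodeInfo:
    """Cheap, metadata-only view of one node (hypothetical, not a Subversion type)."""
    path: str
    # e.g. "added", "deleted", "conflicted", "props-modified"; None if nothing
    # can be decided from the metadata alone.
    metadata_status: Optional[str]


def folder_glyph(nodes: Iterable[NodeInfo],
                 text_modified: Callable[[str], bool]) -> str:
    """Return a glyph status for a folder, touching pristine files only if needed."""
    undecided = []

    # Phase 1: metadata only. The first change or conflict decides the glyph,
    # so no pristine file is ever compared in that case (early out).
    for node in nodes:
        if node.metadata_status is not None:
            return node.metadata_status
        undecided.append(node.path)

    # Phase 2: only reached when the metadata shows no change anywhere.
    # text_modified is the expensive per-file content comparison.
    for path in undecided:
        if text_modified(path):
            return "modified"
    return "normal"
```

The point of the split is that phase 2 runs only for trees whose metadata is completely clean, which is exactly the case where the content comparison cannot be avoided.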

Your disk cache (via its hook) knows which on-disk files changed since the
last scan, so it can handle this much smarter than the simple algorithm in
svn_(client|wc)_status, which is mostly optimized for running in a cold
cache situation.

Instead of just one timestamp to compare to, you have more information: the
current on-disk time and the information that a file just changed. You only
have to perform the check if the file was modified in the last run, or when
its time differs from both the stored time and your previous on-disk time.
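
The rule above can be written down as a small predicate. This is only a sketch of how I read it: the function and parameter names are made up for illustration, and the timestamps are whatever the cache already tracks.

```python
def needs_content_check(modified_in_last_run: bool,
                        current_mtime: float,
                        stored_mtime: float,
                        previous_mtime: float) -> bool:
    """Decide whether the expensive pristine comparison is needed for one file.

    modified_in_last_run: the file was found modified during the previous scan.
    current_mtime:  the file's time on disk right now.
    stored_mtime:   the time recorded in the working copy metadata.
    previous_mtime: the on-disk time the cache saw during its previous scan.
    """
    if modified_in_last_run:
        # It may have been reverted since, so it has to be looked at again.
        return True
    # A time matching either known-good timestamp means nothing has touched
    # the file since it was last verified.
    return current_mtime != stored_mtime and current_mtime != previous_mtime
```

A file whose time matches either the stored time or the previously seen on-disk time is skipped, so after the first scan only files that were actually touched should hit the pristine store.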


I think this would require some redesign of your current cache strategy (it
certainly does for AnkhSVN), but the fact that you can now perform status
updates per file instead of per directory should by itself open room for
performance improvement. (I hope to solve some of the worse scenarios in
AnkhSVN on directories containing a lot of files with this.)

I'll start with the design soon. This will take quite a while until it works properly...

Something else I use quite a lot in TSVN and especially the cache is a
quick check whether a folder is versioned or not, simply by checking
whether an .svn folder exists or not. Again here I only need to know
whether it's *maybe* versioned. If there's no .svn folder, I *know* it's
not versioned but if there is, I call the svn APIs and would get an
error in return if e.g. the .svn folder is empty or corrupted.
But with the single db design, there won't be .svn folders anymore
except for the root of the wc?
So is there an (almost as) fast way to check whether a folder is
versioned or not?

I think the fastest way in the current code would be to call
svn_wc_read_kind() on the directory, maybe after first checking that there
is some .svn in at least one of the parent directories.

I thought about implementing a small cache for that, so that I don't have to walk up the tree every time to find an .svn dir. But I thought I read something about such a small cache getting implemented in the svn library itself so I wanted to ask first - maybe there's already an API to use that cache. Or maybe I just remember it wrong.
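
Something like the following is what I have in mind for that small cache, sketched in Python for brevity. find_admin_root and the dict-based cache are hypothetical names, not anything in the svn library. It still only answers *maybe* versioned: a hit means there is an .svn folder at or above the path, and the svn APIs have the final word.

```python
import os
from typing import Dict, Optional


def find_admin_root(path: str, cache: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the nearest directory at or above path that contains a .svn
    folder, or None if there is none. Every directory visited on the way up
    is remembered in cache, so later lookups below the same root stop early.
    """
    visited = []
    current = os.path.abspath(path)
    while True:
        if current in cache:
            result = cache[current]  # answer already known for this directory
            break
        visited.append(current)
        if os.path.isdir(os.path.join(current, ".svn")):
            result = current
            break
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            result = None
            break
        current = parent

    for directory in visited:
        cache[directory] = result
    return result
```

The cache would of course have to be invalidated whenever a working copy is created, moved or deleted, which is one more reason I'd rather use a cache inside the library if one exists.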


The effect on single-db would be: open sqlite file (if not cached) and query
two rows by using its primary key, via an index.
(I think that function currently does the same queries twice; but that is on
my TODO list).
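
To make the cost of such a lookup concrete, here is a toy model using Python's sqlite3 module. The table and column names are invented for illustration and are NOT the real wc.db schema, and modelling the "two rows" as a base and a working layer is my assumption. The only point is that a lookup by primary key is an index seek, not a scan of the whole table.

```python
import sqlite3
from typing import Optional


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a made-up, wc.db-like table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE nodes (
            wc_id         INTEGER NOT NULL,
            local_relpath TEXT    NOT NULL,
            layer         TEXT    NOT NULL,
            kind          TEXT    NOT NULL,
            PRIMARY KEY (wc_id, local_relpath, layer)
        )
        """
    )
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?, ?)",
        [
            (1, "trunk", "base", "dir"),
            (1, "trunk/a.c", "base", "file"),
            (1, "trunk/a.c", "working", "file"),
            (1, "trunk/new", "working", "dir"),
        ],
    )
    return conn


def read_kind(conn: sqlite3.Connection, wc_id: int, relpath: str) -> Optional[str]:
    """Return the node kind for relpath, or None if the path is not versioned.

    The WHERE clause is a prefix of the primary key, so sqlite can seek
    straight to the (at most two) matching rows.
    """
    rows = dict(conn.execute(
        "SELECT layer, kind FROM nodes WHERE wc_id = ? AND local_relpath = ?",
        (wc_id, relpath),
    ).fetchall())
    # Prefer the working layer when both rows exist.
    return rows.get("working", rows.get("base"))
```

With this, read_kind(open_demo_db(), 1, "trunk/a.c") reads just the two rows for that path, however many other nodes the table holds.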


Did you try compiling Subversion with SVN_WC__SINGLE_DB and SINGLE_DB
defined in wc.h yet? (This enables the experimental single-db mode.)

It should give some impression of what you can expect with single-db. (I
think the current status is about 40 test failures (9 in the upgrade tests),
but it reduces the test suite time by almost 50% compared to multi-db.)

I don't like to build the TSVN nightlies with such experimental features yet. Once the features get into trunk without compile switches, I will of course start using them. But as long as they're not activated, I think I'll stay away from those. Not just because they might be too unstable, but mostly because that means the APIs still change a lot and that's just too much work for me to adjust TSVN every time. There's enough work to be done in TSVN itself :)

Stefan


--
       ___
  oo  // \\      "De Chelonian Mobile"
 (_,\/ \_/ \     TortoiseSVN
   \ \_/_\_/>    The coolest Interface to (Sub)Version Control
   /_/   \_\     http://tortoisesvn.net
