Re: A two-part vision for Subversion and large binary objects.

Julian Foad Fri, 21 Jan 2022 03:15:12 -0800

I have been studying when this implementation fetches pristines, and I
have two concerns about its performance:


1. scanning the whole subtree, calling 'stat' on every file

2. premature hydrating


Scanning with 'stat'

I'm concerned about the implementation scanning the whole subtree,
calling 'stat' on every file to determine whether the file is "changed"
(locally modified). This is done in svn_wc__textbase_sync(), via its
callback textbase_walk_cb().

It does this scan on every sync, and a syncing operation such as diff
syncs twice (before and after), so each such operation pays for the
scan twice.

Don't we already have an optimised scan for local modifications
implemented in the "status" code? Could we re-use this?
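For reference, the per-file work in such a scan is roughly one stat()
call plus a comparison against metadata recorded when the file was last
known to be unmodified. The following is a minimal sketch under that
assumption. The names struct recorded_info, enum quick_status and
quick_modified_check() are hypothetical, and it uses plain POSIX stat()
rather than the APR/svn_io wrappers Subversion itself would use.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical record of what the working copy last saw for a file.
   Illustrative only; not Subversion's actual data structure. */
struct recorded_info {
  off_t size;   /* size recorded when the file was last known unmodified */
  time_t mtime; /* modification time recorded at the same moment */
};

enum quick_status {
  QUICK_UNCHANGED,     /* size and mtime match: treat as unmodified */
  QUICK_MAYBE_CHANGED, /* metadata differs: a content compare is needed */
  QUICK_MISSING        /* stat() failed, e.g. the file was deleted */
};

/* One stat() per file: compare current metadata with the recorded values. */
enum quick_status
quick_modified_check(const char *path, const struct recorded_info *rec)
{
  struct stat st;

  assert(path != NULL && rec != NULL);
  if (stat(path, &st) != 0)
    return QUICK_MISSING;
  if (st.st_size == rec->size && st.st_mtime == rec->mtime)
    return QUICK_UNCHANGED;
  return QUICK_MAYBE_CHANGED;
}
```

A result of QUICK_MAYBE_CHANGED would still need a content comparison to
confirm the modification. The point is that even the cheap path costs
one stat() per file, for every file in the subtree, on every sync.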


Premature Hydrating

The present implementation "hydrates" (fetches missing pristines) every
file within the whole subtree the operation targets. This happens
because every major client operation calls svn_client__textbase_sync()
before and after it runs.

That is pessimistic: the operation may not actually touch all these
files if it is limited in any way, such as by

  - depth filtering
  - other filtering (changelist, properties-only, ...)
  - terminating early (e.g. output piped to 'head')

That puts all the fetching overhead for the given subtree up front, as
latency before the operation shows any results. For a small request at
the root of the tree, such as "svn diff --depth=empty --properties-only
./", that could have a significant impact on usability.

Presumably we could add the depth and some other kinds of filtering to
the tree walk. But that would still leave early termination, and
possibly other cases, sub-optimal.
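To make the depth-filtering idea concrete, here is a rough sketch of a
hydration walk that honours a depth limit. The node structure, the
walk_depth enum (modelled loosely on the semantics of svn_depth_t) and
fetch_pristine() are all hypothetical stand-ins, not Subversion APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical depth values, loosely modelled on svn_depth_t semantics. */
enum walk_depth { DEPTH_EMPTY, DEPTH_FILES, DEPTH_IMMEDIATES, DEPTH_INFINITY };

/* Hypothetical in-memory working-copy node. */
struct node {
  bool is_dir;
  bool hydrated;           /* is the pristine present locally? */
  struct node **children;  /* NULL-terminated array; NULL for files */
};

/* Stand-in for fetching one missing pristine from the server. */
static int fetch_count = 0;
static void fetch_pristine(struct node *n) { n->hydrated = true; fetch_count++; }

static void hydrate_file(struct node *n)
{
  if (!n->is_dir && !n->hydrated)
    fetch_pristine(n);
}

/* Hydrate only the files an operation limited to DEPTH could visit. */
void hydrate_within_depth(struct node *target, enum walk_depth depth)
{
  assert(target != NULL);
  if (!target->is_dir) {
    hydrate_file(target);
    return;
  }
  if (depth == DEPTH_EMPTY || target->children == NULL)
    return;
  for (struct node **c = target->children; *c != NULL; c++) {
    if (!(*c)->is_dir)
      hydrate_file(*c);
    else if (depth == DEPTH_INFINITY)
      hydrate_within_depth(*c, DEPTH_INFINITY);
    /* DEPTH_FILES skips subdirectories; DEPTH_IMMEDIATES visits them but
       not their contents, and a directory has no pristine of its own. */
  }
}
```

Even with this filter, the walk still decides up front which files to
fetch, so it cannot know that the consumer will stop after the first
screenful of output.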

I would prefer a solution that defers the hydrating until closer to the
moment of demand.
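A minimal sketch of what deferring the hydration could look like,
assuming (as with the scan described above) that only locally modified
files need their pristine. The consumer calls a hypothetical
ensure_pristine() at the point where it needs the pristine text, so
files it filters out or never reaches are not fetched. All names here
are illustrative, not existing Subversion functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical file record; not Subversion's data structures. */
struct wc_file {
  const char *relpath;
  bool modified;  /* result of the local-modification check */
  bool hydrated;  /* is the pristine available locally? */
};

static int fetches = 0;

/* Stand-in for a network fetch of one pristine. */
static void fetch_pristine(struct wc_file *f) { f->hydrated = true; fetches++; }

/* Hydrate at the moment of demand: only when needed, and only once. */
static void ensure_pristine(struct wc_file *f)
{
  if (!f->hydrated)
    fetch_pristine(f);
}

/* A diff-like consumer that needs pristines only for modified files and
   may stop early (e.g. its reader went away). Returns how many files it
   produced output for. */
size_t diff_files(struct wc_file *files, size_t n, size_t max_output)
{
  size_t shown = 0;

  assert(files != NULL || n == 0);
  for (size_t i = 0; i < n && shown < max_output; i++) {
    if (!files[i].modified)
      continue;                 /* unmodified: no pristine needed */
    ensure_pristine(&files[i]); /* fetch only now, if still missing */
    /* ... compare the working text against the pristine here ... */
    shown++;
  }
  return shown;
}
```

With max_output set to 2 and three modified files, for example, only the
first two modified files are fetched; an eager sync beforehand would
have fetched all three.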


Evgeny, have you looked into these possibilities at all? What are your
thoughts on them?

- Julian
