Re: Repeated SQL queries when doing 'svn st'

Stefan Fuhrmann Wed, 08 Sep 2010 03:25:47 -0700

Philip Martin wrote:

Branko Čibej <br...@xbc.nu> writes:

 On 06.09.2010 12:16, Philip Martin wrote:

To use a per-directory query strategy we would probably have to cache
data in memory, although not to the same extent as in 1.6.  We should
probably avoid having Subversion make status callbacks into the
application while a query is in progress, so we would accumulate all
the row data and complete the query before making any callbacks.  Some
sort of private svn_wc__db_node_t to hold the results of the select
would probably be sufficient.

I wonder if per-directory is really necessary; I guess I'm worrying
about the case where the WC tree has lots of directories with few files.
Do we not have the whole tree in a single SQLite DB now? Depending on
the schema, it might be possible to load the status information from the
database in one single query.


Yes, per-tree would probably work but I expect most WCs have more
files than directories so the gains over per-dir would be small.  One
big advantage of doing status per-tree is that it gives a proper
snapshot: the tree cannot be modified during the status walk.  I'm not
pushing per-dir as the final solution; my point is that per-node
SQLite queries are not going to be fast enough.

There are actually two or three reasons why status should
run queries on directory granularity:

* directories usually resemble files in that opening them is
 expensive relative to reading their content
* operation can be canceled in a timely manner (may or may
 not be an issue with huge SQL query results)
* maybe: queries for a specific folder may be simpler / faster
 than for sub-trees (depends on schema)
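
To make the granularity argument concrete, here is a rough sketch of
one query per directory with a cancellation check between directories.
It is only an illustration: the "node" table, its columns and the
is_cancelled callback are made up for this example and are not the real
wc.db schema or any Subversion API.

```python
import sqlite3


def status_by_directory(conn, dirs, is_cancelled):
    """Yield (parent, name, kind) tuples using one query per directory.

    Assumption: a hypothetical table node(parent_relpath, name, kind).
    """
    for parent in dirs:
        # Cancellation is checked once per directory, so the wait is
        # bounded by the cost of a single directory query.
        if is_cancelled():
            return
        rows = conn.execute(
            "SELECT name, kind FROM node "
            "WHERE parent_relpath = ? ORDER BY name",
            (parent,),
        ).fetchall()  # the query is complete before anything is reported
        for name, kind in rows:
            yield parent, name, kind
```

Because fetchall() finishes the query first, no callback into the
application runs while a statement is still open.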

Also, I don't think there is a need to cache query results.
Instead, the algorithm should be modified to look like this:

dir_status:

   // get all relevant info; each array sorted by name
   stat_recorded = sql_query("BASE + recorded change info of dir entries")
   stat_actual = read_dir()
   prop_changes = sql_query("find prop changes in dir")

   // "align" / "merge" arrays and send results to client
   foreach name do
      recorded = has(stat_recorded,name) ? stat_recorded[name] : NULL;
      actual = has(stat_actual,name) ? stat_actual[name] : NULL;
      changed_props = has(prop_changes,name) ? prop_changes[name] : NULL;

      // compare file content if necessary
      if (recorded && actual && needs_content_check(recorded, actual))
         actual = check_content(name)

      send_node_status(recorded, actual, changed_props)

Only two SQL queries (give or take) per directory.
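
For illustration, the align / merge step might look roughly like this
in Python. This is a sketch under assumptions, not Subversion code: the
three inputs are taken to be dicts keyed by entry name (instead of
sorted arrays), needs_content_check and check_content are caller-supplied
placeholders, and results are returned as a list rather than sent
through a status callback.

```python
def dir_status(stat_recorded, stat_actual, prop_changes,
               needs_content_check, check_content):
    """Align three name-keyed dicts; return one status tuple per name."""
    results = []
    # Union of all names seen in any source, visited in sorted order.
    names = sorted(set(stat_recorded) | set(stat_actual) | set(prop_changes))
    for name in names:
        recorded = stat_recorded.get(name)      # None if not versioned
        actual = stat_actual.get(name)          # None if missing on disk
        changed_props = prop_changes.get(name)  # None if props unchanged

        # Compare file content only when both sides exist and the cheap
        # stat comparison is inconclusive.
        if (recorded is not None and actual is not None
                and needs_content_check(recorded, actual)):
            actual = check_content(name)

        results.append((name, recorded, actual, changed_props))
    return results
```

Nothing here is cached across directories; each call works only on the
data of the directory being reported.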

-- Stefan^2.
