On 11.03.2011 20:13, Greg Stein wrote:
> I also don't like to see structures like svn_wc__db_info_t. We had a
> big problem with the entry_t, and things like info_t will continue to
> propagate that broken model. By definition, to use that structure a
> query must be done against both NODES and ACTUAL_NODE.
This comment is somewhat orthogonal to the API discussions, but as I've
noted before: after my relatively brief sojourn in wc-db, I came to the
conclusion that having separate NODES and ACTUAL_NODE tables is going to
be a perpetual impediment to really speeding up the working copy. I
believe this split is a very premature space-vs-speed optimization, and
it doesn't even save all that much space, relatively speaking.

It wouldn't be so bad if outer joins were reasonably fast in Sqlite, but
my measurements at the time showed that they can be several orders of
magnitude slower than inner joins. (Merging NODES and ACTUAL_NODE would
effectively create a materialized view of a left-joined query over both
tables, without the cost of evaluating that outer join on every query;
this ignores, of course, the fact that Sqlite doesn't support
materialized views anyway.)

When thinking about the API, I suggest the main things to keep in mind
should be:

  * Use the power of SQL. Complex queries and filtering should be done
    in SQL, not in C code.

  * Whenever possible, perform a single large query and store the
    results in temporary tables for processing, instead of issuing many
    small queries and combining the results in code. A single query with
    file-backed cooked results will almost always be faster than a bunch
    of smaller queries (the speedup can range from several times to
    several orders of magnitude, depending on working copy size), /and/
    preparing the dataset in a single Sqlite transaction will guarantee
    that the results returned by the API are a consistent snapshot of WC
    state.

-- 
Brane
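The two API recommendations above can be illustrated with a small sketch using Python's standard sqlite3 module. It is a sketch under stated assumptions, not Subversion code: the table and column names (nodes, actual_node, local_relpath, changelist, conflict_data) and the helpers open_db() and changed_nodes() are hypothetical simplifications, not the real wc.db schema or the libsvn_wc API. The sketch evaluates one LEFT OUTER JOIN over both tables, stores the cooked rows in a temporary table, does the filtering in SQL, and wraps everything in a single explicit transaction so the caller gets one consistent snapshot.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for the wc.db NODES and
# ACTUAL_NODE tables; the real schema has many more columns.
SCHEMA = """
CREATE TABLE nodes (
    local_relpath TEXT PRIMARY KEY,
    kind          TEXT NOT NULL,
    revision      INTEGER
);
CREATE TABLE actual_node (
    local_relpath TEXT PRIMARY KEY,
    changelist    TEXT,
    conflict_data TEXT
);
"""


def open_db(path=":memory:"):
    # isolation_level=None turns off the sqlite3 module's implicit
    # transaction handling, so the BEGIN/COMMIT below are explicit.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.executescript(SCHEMA)
    return conn


def changed_nodes(conn):
    """Return (relpath, kind, revision, changelist) for every node that has
    a changelist or conflict data, computed inside one transaction."""
    conn.execute("BEGIN")
    try:
        # One large query: the outer join is evaluated once and its
        # result is stored in a temporary table for further processing.
        conn.execute("DROP TABLE IF EXISTS temp.node_info")
        conn.execute("""
            CREATE TEMP TABLE node_info AS
            SELECT n.local_relpath, n.kind, n.revision,
                   a.changelist, a.conflict_data
            FROM nodes AS n
            LEFT OUTER JOIN actual_node AS a
                ON a.local_relpath = n.local_relpath
        """)
        # Filtering and ordering happen in SQL, not in the calling code.
        rows = conn.execute("""
            SELECT local_relpath, kind, revision, changelist
            FROM temp.node_info
            WHERE changelist IS NOT NULL OR conflict_data IS NOT NULL
            ORDER BY local_relpath
        """).fetchall()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return rows


if __name__ == "__main__":
    conn = open_db()
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)",
                     [("", "dir", 5), ("a.c", "file", 5), ("b.c", "file", 4)])
    conn.execute("INSERT INTO actual_node VALUES ('b.c', 'fixes', NULL)")
    print(changed_nodes(conn))  # [('b.c', 'file', 4, 'fixes')]
```

If NODES and ACTUAL_NODE were merged into one table, as suggested above, the CREATE TEMP TABLE step would become a plain SELECT from that table with no outer join at all; the temporary table and single transaction would still provide the consistent snapshot.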