On Mon, Nov 29, 2010 at 07:43:40PM -0600, Hyrum K. Wright wrote: > We use callbacks extensively throughout our code as a means of > providing streamy feedback to callers. It's a pretty good paradigm, > and one that has served us well. We don't put many restrictions on > what the callbacks can do in terms of fetching more information or > calling other functions. > > Enter wc-ng. > > Stefan's patch to make a recursive proplist much more performant > highlights the great benefit that our sqlite-backed storage can have. > However, he reverted it due to concerns about the potential for > database contention. The theory was that the callback might try and > call additional wc functions to get more information, and such nested > statements weren't healthy for sqlite. We talked about it for a bit > in IRC this morning, and the picture raised by this issue was quite > dire. > > In an attempt to find out what the consequences of these nested > queries are, I wrote a test program to attempt to demonstrate the > failure, only now I can't seem to do so. Attached is the test > program, but when I run it, I'm able to successfully execute multiple > prepared statements on the same set of rows simultaneously, which was > the concern we had about our callback mechanism in sqlite. > > So is this a valid problem? If so, could somebody use the attached > test program to illustrate it for those of us who may not fully > understand the situation?
Some more thoughts on this: We could forbid callbacks from calling any other libsvn_client function, and provide all information they will ever need as arguments. Careful consideration of possible use cases for an API would allow us to do this. E.g. for proplist, it's clear that the callback will want a path and a list of properties. It wouldn't be allowed to retrieve additional information. This approach keeps memory usage during a crawl as low as possible and could get away with a single query per operation. However, this approach might break existing third party code using callbacks which call into libsvn_client. It won't be trivial to support 1.6.x semantics ("you're free to call whatever you want") on top of a 1.7.x API placing restrictions on the callbacks. Another possibility is partitioning the problem by running queries per directory. The walk_children crawler would crawl per directory instead of per path. All queries work in units of directories and no callbacks are invoked while an sqlite transaction is active. Callbacks are free to do whatever they want, but the crawler will need to deal with resulting edge cases such as the current directory disappearing from the DB or from disk (our current node walker seems to account for missing paths only before the first invocation of the callback -- it doesn't reckon with the callback deleting paths or removing relevant data from the DB.) This approach will cause memory usage to be a function of the number of directory entries. It will also be slower than the first approach because one or more queries are issued per directory. Given our backwards-compatibility policy I'd say we'd have to go with the second option. Can anyone think of any other options? Stefan