I didn't miss any part. You're optimizing writes, when you should worry about reads.
The DB is always open, so reading and writing to it is "cheap". I don't care about a scheme to seek to the end of a 16GB chunk. You're making stuff up again.

On Mar 3, 2010 9:45 AM, "Stefan Sperling" <s...@elego.de> wrote:

On Wed, Mar 03, 2010 at 12:24:29PM -0500, Greg Stein wrote:
> You're talking about schemes to verify...

I didn't say that. I'd very much like svn to verify data it reads from the pristine store, on the fly, and point out corrupted pristines to the user.

> You're talking about splitting files for certain filesystems to help with
> size limitations, yet...

Those are side-issues. What Neels and I are trying to get rid of is the need for locking when writing to the pristine store. You missed the part about not storing data in an sqlite DB which will never change once written.

We need to store the MD5 of every pristine somewhere, for instance. If we do store this data in a DB, writing to the pristine store requires synchronising access to the DB to keep the DB in a consistent state, on top of writing the pristine itself. Writing the pristine itself is already lockless, and also writing the MD5 while at it means we wouldn't need any locking.

> Putting data in the file means you have to *open* it to read the data.

We're opening and reading pristines anyway. Reading pristines is disk I/O we cannot avoid. The proposed scheme even minimises I/O in case we need only a chunk near the end of a file: Seek across a few SHA1 checksums, read a SHA1 checksum, then open the pristine with that checksum, instead of seeking through an entire 16GB pristine until the right block has been found. Granted, reading an entire huge pristine involves opening a number of other pristines. Not sure which is better.

> Again: we are centralizing in order to aggregate data and reduce I/O. Your
> idea defeats that go...

Is writing another few bytes to the file slower than writing to the file and then opening the DB and modifying the DB, possibly waiting for another process to unlock the DB, so we can store the MD5?

Stefan
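
For illustration only, the lockless write described above might be sketched as follows. This is a minimal Python sketch, not Subversion code (which is C): the 16-byte MD5 trailer layout, the flat store directory, and the names write_pristine and read_pristine are all assumptions made for the example. It relies on writing to a private temporary file and then renaming it into place, so no writer ever takes a lock.

```python
import hashlib
import os
import tempfile

MD5_LEN = 16  # size of a raw MD5 digest in bytes


def write_pristine(store_dir, data):
    """Store data under its SHA-1 name with a trailing MD5; return the SHA-1 hex.

    Hypothetical layout: file contents followed by a 16-byte MD5 trailer.
    """
    sha1 = hashlib.sha1(data).hexdigest()
    final_path = os.path.join(store_dir, sha1)
    # Write to a private temp file in the same directory so the rename
    # below stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=store_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.write(hashlib.md5(data).digest())  # assumed trailer format
        # Publish the finished file in one step; no lock is taken.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return sha1


def read_pristine(store_dir, sha1):
    """Read a pristine and verify both checksums on the fly."""
    with open(os.path.join(store_dir, sha1), "rb") as f:
        blob = f.read()
    data, md5 = blob[:-MD5_LEN], blob[-MD5_LEN:]
    if hashlib.md5(data).digest() != md5 or hashlib.sha1(data).hexdigest() != sha1:
        raise ValueError("corrupted pristine: " + sha1)
    return data
```

Two processes writing the same content simply race to rename identical files, so either outcome leaves a valid pristine. The cost Greg points at is visible too: the MD5 is only available after opening the file, whereas a DB row could be read without touching the pristine.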