Re: pristine store design

Neels J Hofmeyr Tue, 02 Mar 2010 16:51:13 -0800

Julian Foad wrote:
> On Thu, 2010-02-18, Neels J Hofmeyr wrote:
>> Great, moving forward fast on pristine design questions!
> 
> Hi Neels.
> 
> Did you start working the new knowledge into a document?  Lots of stuff
> was said in this thread and it would be useful to see where we are at.


thanks for poking :)

> I have a couple of comments.
> 
> 
> THE PRISTINE-WRITE API
> 
> I was thinking about the "write" API and how an API designed around a
> stream is surely better than one designed around "tell me the path where
> I can put a file".
> 
> The objection to "give me a writable stream and I'll write to it" was
> that the stream close handler wouldn't know whether the stream was
> closed after a successful write or because of an error.  We can rectify
> this by adding a second step: require the caller to call a special
> "commit" API to close the stream on success, and any other way of
> closing the stream would abandon the new content.
> 
>   // Caller passes the checksum if it knows it, or passes NULL and
>   // the store in that case will calculate the checksum.
>   stream = pristine_start_new_file(expected_checksum);
> 
>   [caller writes to stream]
> 
>   // Now the store commits the new text, verifies the checksum
>   // (if it was given an expected one) and returns it.
>   new_checksum = pristine_close_new_file(stream);

Let's also look at the advantages of the current design of
svn_wc__db_pristine_get_tempdir() and _install():

pseudocode:
[[[
 pristine_get_tempdir(&tempdir)

 /* Make tempfile */
 svn_stream_open_unique(&tempstream, &tempfile_path, tempdir)

 /* Write and checksum */
 tempstream = svn_stream_checksummed2(tempstream, &write_checksum)
 write to and close tempstream

 pristine_install(tempfile_path, write_checksum)
]]]

When there is only a write-stream available, some options get lost, while
_get_tempdir() and _install() offer the whole palette of possibilities. For
convenience, we can offer a wrapper for the first three instructions above.
But additionally, we can offer the possibility to determine a file location
*on the same file system device* as the pristine store. This is useful when
we can provide some external tool or specialized function X with a
destination path in advance. We create a tempfile in pristine_temp_dir() and
pass that location to X.  X writes or OS-copies or <whatever>s to the final
device, possibly saving one copy-across-fs-devices operation per pristine.

Also, X may already have a checksum tee in its implementation, obsoleting
the need to calculate again. In that case it's nice that pristine_install()
doesn't care where the checksum came from. (Also, if we are going to
validate the checksum on "every" read, there is no need to be paranoid about
correctness of the checksum when writing -- we anyway have to "expect"
corruptions to occur after any successful write.)

Note that your proposed usage is a subset of the usage provided by
_get_temp_dir() and _install(). Your way does KISS some API complexity
goodbye when thinking in human terms, but the OS will find the
_get_temp_dir() and _install() simpler in some cases :P

The simpler usage of above pseudocode would be:

pseudocode:
[[[
 /* convenience function provided by API or whomever. */
 svn_error_t *
 pristine_get_write_stream(stream_t **write_stream,
                           svn_checksum_t *write_checksum,
                           const char **temp_path)
 {
   pristine_get_tempdir(&tempdir)

   /* Make tempfile */
   svn_stream_open_unique(&tempstream, &tempfile_path, tempdir)

   /* Write checksum to the checksum location provided by caller */
   *write_stream = svn_stream_checksummed2(tempstream, write_checksum)
 }

 ...

 /* calling code */
 {
   pristine_get_write_stream(&stream, &write_checksum, &stream_temp_path)
   write to and close stream
   if (had priori checksum and does not match write_checksum):
     bail
   pristine_install(stream_temp_path, write_checksum)
 }
]]]

This is similar to what you propose, when considering that your
implementation would also need a baton to pass the stream temp path along
(right?). So IMHO it boils down to: Do you think it is better to opaquify
such baton for the sacrifice of not being able to get a write destination on
the target device in advance?

I vote for removing _pristine_write() and for keeping _get_temp_dir() and
_install(), plus adding a convenience function in the spirit of above
pseudocode to make the stream-only case sit comfortably.

Yay?

> 
> Now let's examine the ways in which a caller might want to give new
> content to the store:
> 
>   1. Caller asks for a writable stream and pushes content to that, then
> calls a "commit" function.
> 
>   2. Caller has a readable stream and wants the store to pull from that.
> 
>   3. Caller has a (non-temporary) file and wants the store to read from
> that file.
> 
>   4. Caller has to create a temporary file for reasons beyond its
> control (output of an external tool perhaps) and wants the store to take
> the entire file by an atomic move if possible. This is the case where it
> would be more efficient if it know where to put the file in the first
> place.
> 
> The caller can easily implement 2 and 3 in terms of an API that provides
> 1, so that just leaves 1 and 4 that are worthwhile as an API.
> 
> I feel that (1) is by far the more important one to have, and (4) is a
> specialist optimisation.

...which may nevertheless come up and is not that different in its
implementation, IMHO. As above. Internal API.

> 
> 
> VERIFYING CHECKSUMS
> 
> I didn't read everything you were discussing but I got worried by
> hearing about providing options for the caller to request checksums to
> be verified or not per call.  That sounds like too much complexity.  I'm
> sure we should start with a global compile-time verification enable
> switch, and if we really find we need more fine-grained control then we
> should consider how to provide it then.  It might not need an API flag:
> for example we might decide it should automatically verify on the first
> read and once in every hundred reads, or all sorts of internal
> possibilities like that.

Yes, during pristine_read(), we will want to switch on checksum validation
by default, until we find the need to do otherwise. We argued so far that
the API could checksum along internally, perform the checksum validation on
stream close, and just ignore the calculated checksum for partial reads,
which does not happen a lot.

The check mode, which I presume you are worried about, comes in when we want
to determine whether we have a valid pristine or not in various situations.
Previously, I ended up having these checkmodes:

_usable: "The pristine exists in both DB and file system and
          size and mtime match up"

_known:  "Take any one indication that a pristine exists at
          face value, do that in any way, but fast please."

_valid:  "In addition to checking DB and filesystem, read
          the whole pristine file and checksum it."

Thinking about it now, we can lose _valid. That can be done by checking
_usable and pumping the stream provided by pristine_read(). Or *only* a
pristine_read() and stream pump, even.

We were also ambiguous about _known. Some argue that it's too special, while
it makes sense to me (in case of many checks done for heuristics) and Greg
says he remembers to have known such a use case to exist, heh.

But I now think we should drop the enum svn_wc__db_checkmode_t entirely.
Let's provide the _usable check as a single function in 1.7. If we find the
need, we can add a faster check in a later release, or we can even make this
single check function faster by never checking mtime and len (except when
_read()ing).

Having gone that far, one could argue that we need no _check function at
all, we can just have pristine_read(). If it returns a stream, it indicates
existence and usability, just close it and be merry. Since _usable will
anyway cause disk I/O when we decide to check mtime and size via fstat, the
argument that we want a check function that causes no disk I/O is half-hearted.
On the other hand, if we were only having a _read() API function, we would
want to have a convenience function that closes the stream for those callers
that just want _usable. In this case, the convenience function would provide
practically the same API as providing a separate _check() function in the
first place, just slightly less optimal (causing file preread instead of
just fstat traffic). It would also blur the option of easily making all the
check() use cases faster by a central handle. Some more pseudocode to
illustrate:

No check() function:

  pristine_read(stream_t **stream):
    if (static_check_mtime_and_len() == match):
      *stream = <open checksumming stream>

  convenience_pristine_check(svn_boolean_t *exists):
    pristine_read(&stream)
    *exists = (stream != NULL)
    if (stream):
      <close stream>            <-- no need to have opened it


With check() function in the API:

  pristine_read(stream_t **stream):
    /* same as above */

  pristine_check(svn_boolean_t *exists):
    *exists = (static_check_mtime_and_len() == match)


I still favor the explicit check() function included in the API and keep
with my above suggestion for 1.7. We can provide a second, faster check
function later or even make the single check function faster.

> 
> 
>> The one thing left now is:
>>> Can someone explain a motivation for even creating a database row before
>>> the pristine file is moved into place in the pristine store? I currently
>>> don't see why it can't be way simpler.
> [...]
> 
> I would just write it down the way you think it should be in the main
> flow of your document, and mention outstanding questions like this in
> notes.
> 
> "Simultaneous or multi-threaded clients" would be my first reaction to
> that particular question.

You guys already agreed on what I think in the other two posts, i.e. that we
just let people write concurrently. With pristine_install(), there isn't
anything that can go wrong concerning concurrency -- assuming a 'mv' is
atomic that's obvious, but even if not, the concurrent processes should
anyway write the same values to the same file offsets. (not given when we
have compressed pristines! see below)

Q: Is there any deadlock / error with concurrent OS 'mv's to the same
location? I am assuming not, but am not 100% sure. If there turns out to be
one, we can still create a lock-file write lock thang (which could also
serve to prevent two separate processes from downloading the same pristine
twice, if we choose to cover that).


However, I am now also thinking about database consistency. If one
concurrent process wins the database race, while another wins the file
system race, there may be a mismatching mtime in the database (questionable
if the mtime resolution is no less than a whole eternal second, but it still
remains possible; also, mtime currently does not exist in the schema but
possibly should). Would this problem go away if we mv the pristine file into
place while the DB transaction is open? Would SQLite mutex the two processes
apart for free? (talking just about the final 'mv' during _install())

The same problem as with mtime is more aggravating with compression. If one
process decides to compress the pristine while the other decides to write it
out plainly, we have a concurrency problem in *both* DB and on disk. We
could avoid this by having absolute per-pristine-store rules in place about
when a pristine is compressed, e.g. compress all those being larger than a
certain size. Alternatively, we could append a .gz/.bz2/.zip suffix to the
checksum filename for compressed files, doing away with concurrency problems
in the file system. It would then not matter which process wins the DB race,
but the pristine would exist twice (once compressed, once uncompressed).

We have not put any thought in implementing compression yet. I think we
should drop compression from 1.7, and plan to provide an opaque compression
mechanism behind pristine_install() based on absolute rules.

Or maybe not, because disk space is too cheap to bother and no-one has
called for a compressed pristine store because we already put some kind of
high-water mark in place. Who knows.

(I don't like pristine compression. If we're optimizing for disk space, why
store it in the first place. It makes a lot of things way more complex.)

Either way, it would be nice to not have a "locked" state in the pristine
database. E.g. if we allow the database row to be able to say "I'm busy
writing this file", then we need to decide what the other process should do
in that case. Wait indefinitely? I'd prefer not to have that, but rather
have the option for lock-file based optimization to prevent the same content
from being downloaded twice at the same time, emphasis on *option*. Lock
files are cleaned up when a process dies. Not so the DB row.



<Reminding myself that the need for MD5 sums forces us to have some storage
over and above the pristine itself. Still dreaming of a DB-less pristine
store like
  echo "content" > SHA1SUMSHA1SUMSHA1SUM
  touch MD5SUMMD5SUM
  ln -s MD5SUMMD5SUM SHA1SUMSHA1SUMSHA1SUM.md5
but that's probably nonsense. Having a DB is also better for implementing
future features.>


Comments welcome!

As you can see, reading this thread again now, a little while later, has
brought some more things to the surface. I shall cast a newer version of the
pristine API design doc I started, but my time for today is *so* over :)

*neels feels the early morning breath of a two-year-old down his sleepy neck

~Neels

signature.asc
Description: OpenPGP digital signature

Re: pristine store design

Reply via email to