Re: Why do we check the base checksum so often?

Julian Foad Sat, 04 Feb 2012 09:00:14 -0800

Hyrum K Wright wrote:

> Julian Foad wrote:
>>  Hyrum K Wright wrote:
>>>  The Ev2 shims get in the way of how text deltas are transmitted, by
>>>  reconstituting the full text, and then just streaming that to the
>>>  receiver via svn_txdelta_send_stream().  I've got a patch which
>>>  actually starts reporting the base checksum---which with the shims
>>>  will always be the "empty" checksum---and it turns out that 
>>> such a  patch breaks the World.
>>> 
>>>  The reason for this breakage is that there are several places in both
>>>  the FS and the WC that we check the delta editor's reported base
>>>  checksum against some other value we have on hand which we *think*
>>>  should be the base.  Until now, these checks have always passed, since
>>>  there was an implicit understanding about what the delta editor would
>>>  use as its base.
>>> 
>>>  However, I think that these checks are wrong.  They rely upon an
>>>  implementation detail ("is the delta editor sending a text delta
>>>  against the base we think it ought to?") rather than the result ("did
>>>  we end up with the content we expected to end up with?")
>> 
>>  When we (the WC update code for example) receive a text delta, we apply it
>> to a text base that we already have, in order to create a new text.  We
>> need to be applying it against the correct base [...]
> 
> I understand this principle, but I don't think that's what the API
> is/should be doing.  The apply_textdelta callback is essentially
> saying "apply this delta against the base with this checksum".  In the
> current regime, we know a priori what that base "should" be, so we
> make sure that apply_textdelta spits that information back to us.
> 
> But I don't think that's always a valid assumption.  If the delta
> editor chose some other base to use (in this case, the empty stream),
> and indicated that through the apply_textdelta() base checksum
> parameter, a receiver should be happy to accomodate that request.
> "Why should I use the base you told me to use, when I can use this one
> more efficiently?"


We're talking here about the delta editor (Ev1).  The driver shouldn't have
free rein to choose any base, because the receiver does not have every
possible base at hand, ready to apply the delta to.  At least in the
server-to-client direction (update etc.), the client probably has only one
suitable base text per file.  Either the server would have to be told which
base texts it could choose from, or the client might be unable to apply the
delta until it had first asked the server to send the relevant base text,
which would pretty much negate the point of having deltified in the first
place.  In the other direction, of course, we can now start to design
protocols in which the client picks any base text that it knows exists in the
repository and that the server is able to access, now that we have the
rep-cache and the idea of looking up texts by their checksum.  But ... that
can't be what you're thinking of, I'm sure.
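
To make that constraint concrete, here is a minimal Python sketch.  It is not
Subversion code: the Receiver class, the BaseMismatch exception and the toy
(keep, tail) delta format are all hypothetical, and MD5 is used purely for
illustration.  It only shows why a receiver holding one base per file has to
reject a delta announced against anything else.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


class BaseMismatch(Exception):
    """The announced base is not the text the receiver holds."""


class Receiver:
    """Hypothetical receiver that keeps exactly one base text per path."""

    def __init__(self, bases):
        self.bases = dict(bases)  # path -> bytes

    def apply_delta(self, path, base_checksum, delta):
        # Toy delta: keep the first `keep` bytes of the base, append `tail`.
        keep, tail = delta
        base = self.bases.get(path, b"")
        if base_checksum != md5_hex(base):
            # There is no other base to try; fetching one would defeat
            # the purpose of sending a delta at all.
            raise BaseMismatch(path)
        self.bases[path] = base[:keep] + tail
        return self.bases[path]
```

A delta announced against the checksum of the text the receiver actually
holds applies cleanly; one announced against any other checksum raises
BaseMismatch and leaves the stored text untouched.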

The empty stream is a special case.  It's a valid suggestion that the driver
should have the option of sending a full text, or a delta against an empty
stream, which is semantically the same thing.  But retrofitting that onto Ev1
isn't interesting at this point.
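
As a rough illustration of that equivalence (a sketch only: the op format and
the names apply_ops and full_text_as_delta are made up for this example and
are not the svndiff format or any Subversion API), a delta against the empty
stream can contain nothing but new data, so applying it reproduces the full
text:

```python
import hashlib

# Checksum of the empty stream (MD5 chosen here only for illustration).
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def apply_ops(base: bytes, ops) -> bytes:
    """Apply a toy delta: ("copy", offset, length) or ("new", data)."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


def full_text_as_delta(text: bytes):
    """Express a full text as a delta against the empty base."""
    # With an empty base there is nothing to copy from, so the delta
    # is a single "new" op carrying the whole text.
    return EMPTY_MD5, [("new", text)]
```

Any receiver can apply such a delta, because every receiver trivially "has"
the empty base.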

Now, if we talk about Ev2 (I know you're actually looking at the shims between
the two), then we've explicitly designed it so that the mechanism for
transferring texts is outside the scope of the editor itself, and so the
driver and receiver code are responsible (assisted by the respective layers
above them) for co-ordinating in any way they want to.  The Ev2 solution for
deltifying text between driver and receiver could include (warning: possibly
hare-brained ideas): the receiver telling the driver what base texts it has
available; the driver first choosing a base that's convenient for it, and
letting the receiver request that base from the driver (out of band) if the
receiver doesn't have it available; and so on.
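
Purely as a sketch of those two ideas (nothing here is an existing or
proposed Ev2 API; choose_base, ensure_base and the fetch callback are
hypothetical names, and SHA-1 is an arbitrary choice of checksum), the
co-ordination might look something like this:

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


EMPTY = sha1_hex(b"")


def choose_base(driver_texts, receiver_has, preferred=None):
    """First idea: pick a base both sides hold, else the empty text.

    driver_texts: dict mapping checksum -> bytes held by the driver.
    receiver_has: set of checksums the receiver advertised.
    """
    if preferred in driver_texts and preferred in receiver_has:
        return preferred
    for checksum in driver_texts:
        if checksum in receiver_has:
            return checksum
    return EMPTY  # no shared base: fall back to sending the full text


def ensure_base(receiver_texts, checksum, fetch):
    """Second idea: the receiver requests a base it lacks, out of band."""
    if checksum == EMPTY:
        return b""
    if checksum not in receiver_texts:
        text = fetch(checksum)  # hypothetical out-of-band request
        if sha1_hex(text) != checksum:
            raise ValueError("fetched base does not match its checksum")
        receiver_texts[checksum] = text
    return receiver_texts[checksum]
```

Either way, the choice of base is agreed between the driver-receiver pair
rather than dictated by the editor interface.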

I'm not quite sure I fully follow you at the moment, so I'm not sure whether
my reply is on the right track at all, but it really sounds like you're up
against a mismatch of responsibilities between Ev1, which sends deltas
according to particular rules, and Ev2, which is designed to be wrapped inside
a driver-receiver pairing that knows privately how to deltify and recover the
full text in any way it wants to.  The shims obviously need to convert from
the Ev2 deltification back (via a full-text intermediary if necessary) to what
Ev1 expects.

- Julian
