On Sat, Apr 25, 2020 at 11:18 AM Daniel Shahaf <d...@daniel.shahaf.name> wrote:
>
> Karl Fogel wrote on Fri, 24 Apr 2020 13:43 -0500:
> > On 24 Apr 2020, Mark Phippard wrote:
> > >I think this would be a good idea in that it might be one of the last
> > >remaining niches where SVN is a better tool for the job than a DVCS.
> > >I do not think I could contribute though.
> > >
> > >I just wanted to throw another item on the pile. I recall an old
> > >thread (have not been able to find it) where it was shown that a
> > >massive performance win on large binary blobs would be if we could
> > >skip all of the xdelta stuff and just stream the binary.  If I recall
> > >correctly, you can even see and demo this today using WebDAV and just
> > >doing a PUT (or whatever the right request is) with the entire file.  The
> > >server already knows how to handle it and store the file the same as
> > >it would if it had come via a SVN client.  I think there were some
> > >complications with how svndiff0/svndiff1 etc. are expected by a
> > >client, but if there were some way to have a property on a file that
> > >caused us to skip all of this, including storing the extra pristine
> > >copy, it could be a big win for managing large binaries with SVN.
> > >
> > >It seems like we could make revert fetch the file from the server
> > >again to restore a binary.
> > >
> > >If I can find any of those old threads I will share them.  So far the
> > >only one I found was about how using a larger xdelta window size
> > >could give better compression, but the thread I recall was about not
> > >doing it at all.  It also assumed that the xdelta is of no real value
> > >because it does not reduce the number of bytes that have to be
> > >transferred.
> >
> > Ah, thanks for this reminder!  I also recall those results (and I guess 
> > they're not surprising).  I'll make sure we keep it in mind if this project 
> > happens.  If you happen to dig up any of the old threads, that'd be great, 
> > but even if you don't, the above information is enough for a developer to 
> > know the possibility exists.
>
> That one doesn't seem like it'd be terribly hard to implement.  The
> data format of svndiff0 enables "Produce the following bytes verbatim"
> to be represented.  There's nothing stopping whoever generates an
> svndiff stream from using that feature of the data format to produce a
> degenerate self-delta (that is, a self-delta that doesn't attempt to
> compress) where currently it would produce a self-delta or a delta
> against the BASE revision, as well as to produce an svndiff0 stream
> even when the other side accepts svndiff1 and/or svndiff2.  We don't
> even need a new wire capability for this.
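
A minimal sketch of such a degenerate svndiff0 stream, based on my understanding of the svndiff0 format rather than on code from the Subversion sources: a "SVN\0" header, then windows whose only instruction is "copy from new data", each window preceded by five variable-length integers (source view offset, source view length, target view length, instructions length, new-data length). The helper names are hypothetical, and 102400 is assumed to be the value of SVN_DELTA_WINDOW_SIZE.

```python
import io

SVNDIFF0_HEADER = b"SVN\x00"  # svndiff version 0: windows are not compressed
WINDOW_SIZE = 102400          # assumed value of SVN_DELTA_WINDOW_SIZE


def encode_varint(n: int) -> bytes:
    """svndiff integer: 7 bits per byte, most significant group first,
    high bit set on every byte except the last."""
    if n < 0:
        raise ValueError("svndiff integers are non-negative")
    groups = [n & 0x7F]                   # last byte: high bit clear
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)  # earlier bytes: high bit set
        n >>= 7
    return bytes(reversed(groups))


def new_data_instruction(length: int) -> bytes:
    """One 'copy from new data' instruction (opcode 0b10 in the top two bits).
    New-data instructions carry no offset; the data is consumed in order."""
    if 0 < length < 64:
        return bytes([0x80 | length])       # length fits in the low 6 bits
    return b"\x80" + encode_varint(length)  # otherwise it follows as a varint


def verbatim_window(chunk: bytes) -> bytes:
    """A window that uses no source view and emits `chunk` byte for byte."""
    instructions = new_data_instruction(len(chunk))
    return b"".join([
        encode_varint(0),                  # source view offset
        encode_varint(0),                  # source view length (none used)
        encode_varint(len(chunk)),         # target view length
        encode_varint(len(instructions)),  # instructions section length
        encode_varint(len(chunk)),         # new data section length
        instructions,
        chunk,                             # the bytes themselves, verbatim
    ])


def degenerate_svndiff0(data: bytes, window_size: int = WINDOW_SIZE) -> bytes:
    """Encode `data` as an svndiff0 stream with no deltification at all."""
    out = io.BytesIO()
    out.write(SVNDIFF0_HEADER)
    for start in range(0, len(data), window_size):
        out.write(verbatim_window(data[start:start + window_size]))
    return out.getvalue()
```

For example, `degenerate_svndiff0(b"hello")` returns `b"SVN\x00\x00\x00\x05\x01\x05\x85hello"`. The encoder never searches for matches; every field is derived from the chunk length, so its cost is essentially a linear copy.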
>
> With this approach files would still be split into windows of
> SVN_DELTA_WINDOW_SIZE bytes, so we won't reach the performance of sendfile(2);
> however, I suspect the lion's share of the slowdown is due to the
> deltification and compression steps.
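
To put a rough number on the windowing cost, here is a back-of-the-envelope sketch under the same assumptions (102400-byte windows, one verbatim instruction per window, hypothetical helper names). It counts only the framing bytes added on the wire, not CPU time or memory copies, so it says nothing directly about the gap to sendfile(2).

```python
WINDOW_SIZE = 102400  # assumed value of SVN_DELTA_WINDOW_SIZE
HEADER_LEN = 4        # b"SVN\x00"


def varint_len(n: int) -> int:
    """Bytes needed to encode n as an svndiff 7-bits-per-byte integer."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def verbatim_window_overhead(chunk_len: int) -> int:
    """Framing bytes around one 'new data only' window of chunk_len bytes."""
    instr_len = 1 if 0 < chunk_len < 64 else 1 + varint_len(chunk_len)
    header_len = (varint_len(0) + varint_len(0)  # source offset, source length
                  + varint_len(chunk_len)        # target view length
                  + varint_len(instr_len)        # instructions length
                  + varint_len(chunk_len))       # new data length
    return header_len + instr_len


def stream_overhead(file_size: int, window_size: int = WINDOW_SIZE) -> tuple:
    """Return (number of windows, total framing bytes) for a verbatim stream."""
    full, last = divmod(file_size, window_size)
    overhead = HEADER_LEN + full * verbatim_window_overhead(window_size)
    if last:
        overhead += verbatim_window_overhead(last)
    return full + (1 if last else 0), overhead
```

With these assumptions, `stream_overhead(1 << 30)` gives `(10486, 136322)`: about 133 KiB of framing for a 1 GiB file, roughly 0.013% extra. So, at least in bytes on the wire, the windowing itself costs very little.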
>
> Cheers,
>
> Daniel
>
> P.S. This being users@, clarification: "svndiff0" and "svndiff1" are
> internal binary delta formats that have nothing whatsoever to do with
> the «svn diff» command.

I think Mark was referring to this thread on dev@ from 2017, which was
started by Paul Hammant (I think he was working on a tool for
versioning big directory trees easily, with Merkle trees, etc.; it might
be interesting to get in touch with him):

https://svn.haxx.se/dev/archive-2017-07/0034.shtml

Philip Martin made some interesting suggestions and provided some
numbers, first focusing on the deltification overhead (which he could
eliminate on the client side by enabling SVNAutoversioning and
performing a PUT with curl; IIUC it's not currently possible to
eliminate deltification on the server side):

https://svn.haxx.se/dev/archive-2017-07/0040.shtml

and later he also eliminated compression on the server side, which
yielded another threefold speedup:

https://svn.haxx.se/dev/archive-2017-07/0043.shtml
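
For anyone wanting to repeat the client-side part of that experiment without curl, a plain PUT might look like the sketch below, using Python's urllib.request. It assumes a mod_dav_svn location configured with `SVNAutoversioning on`, so that the PUT itself becomes a commit; the function names and URL are hypothetical, and authentication and error handling are omitted. The file is passed as a file object with an explicit Content-Length, so it is streamed as-is rather than deltified.

```python
import os
import urllib.request


def build_put_request(url: str, body, size: int) -> urllib.request.Request:
    """Build a plain HTTP PUT of a file body (no svndiff, no deltification).

    `body` is bytes or a binary file-like object; `size` is its length.
    """
    return urllib.request.Request(
        url,
        data=body,  # http.client reads file-like bodies in blocks
        method="PUT",
        headers={
            "Content-Length": str(size),
            "Content-Type": "application/octet-stream",
        },
    )


def autoversioning_put(url: str, path: str, timeout: float = 60.0) -> int:
    """PUT the file at `path` to `url` and return the HTTP status code.

    Assumes the server has SVNAutoversioning enabled for this location.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as body:
        req = build_put_request(url, body, size)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
```

Something like `autoversioning_put("http://svn.example.com/repos/trunk/big.iso", "big.iso")` would then be roughly equivalent to `curl -T big.iso` against the same (hypothetical) URL.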

-- 
Johan
