On Sat, Apr 25, 2020 at 11:18 AM Daniel Shahaf <d...@daniel.shahaf.name> wrote:
>
> Karl Fogel wrote on Fri, 24 Apr 2020 13:43 -0500:
> > On 24 Apr 2020, Mark Phippard wrote:
> > >I think this would be a good idea in that it might be one of the last
> > >remaining niches where SVN is a better tool for the job than a DVCS.
> > >I do not think I could contribute though.
> > >
> > >I just wanted to throw another item on the pile. I recall an old
> > >thread (have not been able to find it) where it was shown that a
> > >massive performance win on large binary blobs would be if we could
> > >skip all of the xdelta stuff and just stream the binary. If I recall
> > >correctly, you can even see and demo this today using WebDAV and just
> > >doing a PUT or whatever is right request with the entire file. The
> > >server already knows how to handle it and store the file the same as
> > >it would if it had come via a SVN client. I think there were some
> > >complications with how svndiff0/svndiff1 etc are expected by a
> > >client, but if there were some way to have a property on a file that
> > >caused us to skip all of this, including storing the extra pristine
> > >copy, it could be a big win for managing large binaries with SVN.
> > >
> > >It seems like we could make revert fetch the file from the server
> > >again to restore a binary.
> > >
> > >If I can find any of those old threads I will share them. So far the
> > >only one I found was about how using a larger xdelta window size
> > >could give better compression, but the thread I recall was about not
> > >doing it at all. It also assume that the xdelta is of no real value
> > >because it does not shrink the amount of bytes that have to be
> > >transferred.
> >
> > Ah, thanks for this reminder! I also recall those results (and I guess
> > they're not surprising). I'll make sure we keep it in mind if this project
> > happens. If you happen to dig up any of the old threads, that'd be great,
> > but even if you don't, the above information is enough for a developer to
> > know the possibility exists.
>
> That one doesn't seem like it'd be terribly hard to implement. The
> data format of svndiff0 enables "Produce the following bytes verbatim"
> to be represented. There's nothing stopping whoever generates an
> svndiff stream from using that feature of the data format to produce a
> degenerate self-delta (that is, a self-delta that doesn't attempt to
> compress) where currently it would produce a self-delta or a delta
> against the BASE revision, as well as to produce an svndiff0 stream
> even when the other side accepts svndiff1 and/or svndiff2. We don't
> even need a new wire capability for this.
>
> With this approach files would still be split into SVN_DELTA_WINDOW_SIZE
> bytes -sized windows, so we won't reach the performance of sendfile(2);
> however, I suspect the lion's share of the slowdown is due to the
> deltification and compression steps.
>
> Cheers,
>
> Daniel
>
> P.S. This being users@, clarification: "svndiff0" and "svndiff1" are
> internal binary delta formats that have nothing whatsoever to do with
> the «svn diff» command.
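To make the "degenerate self-delta" idea above concrete, here is a rough sketch of what such a stream could look like. It is only an illustration, not Subversion code. It assumes the svndiff0 layout as I understand it from notes/svndiff in the Subversion source tree: a 4-byte "SVN\0" header, then windows made of five variable-length integers (source offset, source length, target length, instructions length, new-data length), then the instructions and the new data. The function names are made up for this example, and the 102400 window size is my assumption for the value of SVN_DELTA_WINDOW_SIZE.

```python
SVNDIFF0_HEADER = b"SVN\x00"   # "SVN" followed by format version 0
WINDOW_SIZE = 102400           # assumed value of SVN_DELTA_WINDOW_SIZE


def encode_int(n):
    """Encode n as an svndiff variable-length integer.

    Seven data bits per byte, most significant group first; the high
    bit is set on every byte except the last.
    """
    if n < 0:
        raise ValueError("svndiff integers are non-negative")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(groups))


def verbatim_window(chunk):
    """Build one window whose only instruction is 'copy from new data'."""
    n = len(chunk)
    if n == 0:
        raise ValueError("a window needs at least one byte of data")
    if n < 64:
        # Selector bits 10, length packed into the low six bits.
        instructions = bytes([0x80 | n])
    else:
        # Low six bits zero: the length follows as a separate integer.
        instructions = b"\x80" + encode_int(n)
    return (
        encode_int(0)                    # source view offset (unused)
        + encode_int(0)                  # source view length (no source)
        + encode_int(n)                  # target view length
        + encode_int(len(instructions))  # instructions section length
        + encode_int(n)                  # new data section length
        + instructions
        + chunk
    )


def verbatim_svndiff0(data, window_size=WINDOW_SIZE):
    """Wrap data in an svndiff0 stream without deltifying or compressing."""
    parts = [SVNDIFF0_HEADER]
    for start in range(0, len(data), window_size):
        parts.append(verbatim_window(data[start:start + window_size]))
    return b"".join(parts)
```

Each window costs only a handful of framing bytes on top of the payload, so the stream is essentially the file itself cut into window-sized pieces. That matches the point above that the windowing remains while the deltification and compression work goes away.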

I think Mark was referring to this thread on dev@ from 2017, which was
started by Paul Hammant (I think he was working on a tool for
versioning big directory trees easily, with merkle trees etc ... might
be interesting to get in touch with him):

https://svn.haxx.se/dev/archive-2017-07/0034.shtml

Philip Martin made some interesting suggestions and provided some
numbers, first focusing on the deltification overhead (which he could
eliminate, on the client-side, by enabling SVNAutoversioning and
performing a PUT with curl -- IIUC it's not possible right now to
eliminate deltification on the server-side):

https://svn.haxx.se/dev/archive-2017-07/0040.shtml

and later he also eliminated compression on the server-side, which
yielded another factor 3 speed boost:

https://svn.haxx.se/dev/archive-2017-07/0043.shtml

--
Johan