<e...@thyrsus.com> wrote:
>Julian Foad <julianf...@apache.org>:
>> Isn't format 3 documented in the section called "Version 3 format" in
>> dump-load-format.txt?
>>
>It appears to have slipped your mind that the person who wrote most of
>that documentation was *me*. And I do still update it occasionally,
>most recently about a month ago. If it held the information I needed
>I'd already know it and not be bothering this list.
>
>Since a dev as senior as you has forgotten this, I have to assume the
>rest of the list has also forgotten or never knew how
>dump-load-format.txt came to exist. So, a reminder: have patience as
>I describe it for the list because this bears directly on why "just
>use Version 3" is insufficient and the (possibly unintended)
>implication that Version 2 might be retired someday is deeply
>disturbing.
>
>I had a very specific motivation for documenting the dump format -
>reposurgeon consumed and generated Subversion dumps, and I know it's a
>fragile and dangerous thing when the assumptions of that kind of
>reader code are only documented in the code itself. It is much better
>practice to write a ground-truth document about the format (or the
>parts one uses, anyway) and then have that be the authority for the
>code. For another example of the practice, see
>
>https://gpsd.gitlab.io/gpsd/AIVDM.html
>
>You have dump-load-format.txt because, having written it, I thought it
>was silly for it not to be flying with the Subversion distribution, so
>I combined it with some historical notes on the old version 1 format,
>and voila.
>
>
>But. The Version 2 documentation I wrote for Subversion is incomplete,
>because there were details I could neither find in pre-existing
>documentation nor easily discover at the time I wrote the bulk of it
>in 2012. And once I made reposurgeon able to read and emit version 2
>dumps, digging deep enough to find out what version 3 was doing never
>made it far enough up my priority list that I actually did it.
>
>Notably: dump-load-format.txt does not describe the delta format. I
>have since seen hints in the SVN Book that version 3 uses some kind of
>binary delta compression. But the SVN Book does not describe either
>of these details; it's not even clear enough for me to be sure I'm
>not hallucinating the "binary" part.
>
>> Format 3 makes such a huge difference to data transfer size in
>> typical cases, as far as I recall, that it is hard to justify using
>> format 2 for anything.
>
>Oh, *hell* no it isn't.
>
>Have you ever written an importer-from-Subversion? Other than
>svnadmin load, I mean; all you have to verify about that is that it
>round-trips streams, which tends to avoid the problems I'm about to
>describe.
>
>If you had ever tried writing other stream analysis tools (I've done
>this twice), you would know that they're a very different use case
>from transport or archiving and have different tradeoffs. The
>bulkiness of the Version 2 stream files is a good trade for its easy
>parseability and eyeball-friendliness.
>
>I have a whole bunch of Subversion dumps in my regression-test suite
>for reposurgeon/repocutter, some collected in the wild and some
>hand-crafted. It would be *bad* if I couldn't sic a text editor on
>any of those to read or modify it. Very bad. Certainly a huge pain in
>the ass for me, plausibly a crash landing of I-can-no-longer-support-this
>severity. That worst case would leave a lot of users stuck; even if you
>don't care about inconveniencing me, please don't risk it for them.
>
>Plain text blobs for every revision may be fat but it's a super-stable
>and discoverable place in the design space, what economists call a
>Schelling point; in practice, great future-proofing. Deltas and
>compression are *not* future-proofing.
>Once you start playing that game, the temptation to iterate on it by
>improving the delta/compression pieces can't be resisted, and indeed
>shouldn't be - as long as there's still Version 2 dump support for
>people who want to evade problems like ... uh, what kind of
>compression are they using and how do I unpack it? How do I interpret
>this diff format?
>
>
>So please do not *ever* think of Version 2 as in any way obsolete or
>dispensable. It's got at least one important use case - reposurgeon,
>the only tool in the world that can do really lossless conversions to/from
>other VCSes, needs it. You'd probably be able to hear the screaming from
>the direction of my house if you actually dropped it.
>
>But more generally, it's more future-proof and discoverable than any
>space optimization of it you could invent. That in itself is good
>enough reason to fully support it, including in svnrdump. It's a
>promise to your users: "This is *understandable*. No matter what
>wacky things we get up to to optimize the transport/archiving case,
>you're not screwed."
>
>> > Should I file an issue about this?
>>
>> You can certainly file an issue if there isn't one.
>
>I will do so.
>
>> If I were to have a say I would recommend anyone should rather work
>> on adding v3 to reposurgeon and addressing any documentation that
>> may be lacking.
>
>Adding v3 to the documentation should certainly be done. If I could
>write it I would have already, but I'm more than willing to ask picky
>questions of anyone who writes a draft.
>
>As for adding v3 to reposurgeon, it could be *a* solution but it's not
>the *right* solution. It's not the path that delivers the best
>guarantee to all your users of "you won't be messed over in ten years
>by unintended side effects of optimizations that seemed like good
>ideas at the time".

Thanks for your detailed reply, Eric.

I can accept your argument for the value of the v2 dump format, given
its simplicity for these purposes.

I am very well aware you wrote much of the spec doc and worked a lot
with it, especially figuring out the semantics; and that's why, seeing
its "v3" section, I was puzzled when you wrote simply "it's not
documented". I even noticed the language in that section seemed to
match your style but didn't have time to go digging with "svn blame".

Thanks for explaining what's missing. I haven't tried writing an
importer.

- Julian