Julian Foad <julianf...@apache.org>:
> Eric S. Raymond wrote:
> > Reposurgeon can't handle the Version 3 format with deltas, and there
> > is no realistic possibility that this will change because the format
> > is not documented anywhere.
>
> Isn't format 3 documented in the section called "Version 3 format" in
> dump-load-format.txt?
> http://svn.apache.org/viewvc/subversion/trunk/notes/dump-load-format.txt?revision=1884689&view=markup#l503

It appears to have slipped your mind that the person who wrote most of that documentation was *me*. And I do still update it occasionally, most recently about a month ago. If it held the information I needed I'd already know it and not be bothering this list.

Since a dev as senior as you has forgotten this, I have to assume the rest of the list has also forgotten or never knew how dump-load-format.txt came to exist. So, a reminder: have patience as I describe it for the list, because this bears directly on why "just use Version 3" is insufficient, and on why the (possibly unintended) implication that Version 2 might be retired someday is deeply disturbing.

I had a very specific motivation for documenting the dump format - reposurgeon consumed and generated Subversion dumps, and I know it's a fragile and dangerous thing when the assumptions of that kind of reader code are only documented in the code itself. It is much better practice to write a ground-truth document about the format (or the parts one uses, anyway) and then have that be the authority for the code. For another example of the practice, see https://gpsd.gitlab.io/gpsd/AIVDM.html

You have dump-load-format.txt because, having written it, I thought it was silly for it not to be flying with the Subversion distribution, so I combined it with some historical notes on the old version 1 format, and voila.

But. The Version 2 documentation I wrote for Subversion is incomplete, because there were details I could neither find in pre-existing documentation nor easily discover at the time I wrote the bulk of it in 2012. And once I made reposurgeon able to read and emit version 2 dumps, digging deep enough to find out what version 3 was doing never made it far enough up my priority list that I actually did it.

Notably: dump-load-format.txt does not describe the delta format. I have since seen hints in the SVN Book that version 3 uses some kind of binary delta compression.
But the SVN book does not describe either of these details; it's not even clear enough for me to be sure I'm not hallucinating the "binary" part.

> Format 3 makes such a huge difference to data transfer size in
> typical cases, as far as I recall, that it is hard to justify using
> format 2 for anything.

Oh, *hell* no it isn't. Have you ever written an importer-from-Subversion? Other than svnadmin load, I mean; all you have to verify about that is that it round-trips streams, which tends to avoid the problems I'm about to describe.

If you had ever tried writing other stream analysis tools (I've done this twice), you would know that they're a very different use case from transport or archiving and have different tradeoffs. The bulkiness of the Version 2 stream files is a good trade for its easy parseability and eyeball-friendliness.

I have a whole bunch of Subversion dumps in my regression-test suite for reposurgeon/repocutter, some collected in the wild and some hand-crafted. It would be *bad* if I couldn't sic a text editor on any of those to read or modify it. Very bad. Certainly a huge pain in the ass for me, plausibly a crash landing of I-can-no-longer-support-this severity. That worst case would leave a lot of users stuck; even if you don't care about inconveniencing me, please don't risk it for them.

Plain text blobs for every revision may be fat but it's a super-stable and discoverable place in the design space, what economists call a Schelling point; in practice, great future-proofing.

Deltas and compression are *not* future-proofing. Once you start playing that game, the temptation to iterate on it by improving the delta/compression pieces can't be resisted, and indeed shouldn't be - as long as there's still Version 2 dump support for people who want to evade problems like ... uh, what kind of compression are they using and how do I unpack it? How do I interpret this diff format?

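To make the parseability point concrete, here is a minimal sketch of a Version 2 stream reader. It assumes only the layout dump-load-format.txt describes: RFC822-style header lines, a blank line, then exactly Content-length bytes of body. The function names (read_headers, records) and the tiny sample stream are hypothetical illustrations; this is not reposurgeon's actual reader, and it skips everything a real tool has to care about (property parsing, copyfrom handling, checksums).

```python
import io

def read_headers(stream):
    """Read 'Key: value' lines up to a blank line.

    Returns a dict of headers, or None at end of stream.
    Blank lines before the first header are skipped, since
    records in a dump are separated by blank lines.
    """
    headers = {}
    while True:
        line = stream.readline()
        if not line:                      # end of stream
            return headers or None
        line = line.rstrip(b"\n")
        if not line:
            if headers:                   # blank line ends a header block
                return headers
            continue                      # separator between records
        key, _, value = line.partition(b": ")
        headers[key.decode()] = value.decode()

def records(stream):
    """Yield (headers, body) pairs; body is Content-length raw bytes."""
    while True:
        headers = read_headers(stream)
        if headers is None:
            return
        length = int(headers.get("Content-length", 0))
        yield headers, stream.read(length)

# Hand-built sample stream (hypothetical data); lengths are computed,
# not typed in, so the headers always match the bodies.
props = b"K 7\nsvn:log\nV 5\nhello\nPROPS-END\n"
text = b"Hello, world\n"
sample = b"".join([
    b"SVN-fs-dump-format-version: 2\n\n",
    b"UUID: 00000000-0000-0000-0000-000000000000\n\n",
    b"Revision-number: 1\n",
    b"Prop-content-length: %d\n" % len(props),
    b"Content-length: %d\n\n" % len(props),
    props, b"\n",
    b"Node-path: README\nNode-kind: file\nNode-action: add\n",
    b"Text-content-length: %d\n" % len(text),
    b"Content-length: %d\n\n" % len(text),
    text, b"\n\n",
])

parsed = list(records(io.BytesIO(sample)))
for headers, body in parsed:
    print(headers, len(body))
```

Because every body here is literal content, the same stream can be read or patched in a text editor, provided the length headers are kept in step. A delta-encoded body would need a decoder for whatever diff format is in use before any of that is possible.
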
So please do not *ever* think of Version 2 as in any way obsolete or dispensable. It's got at least one important use case - reposurgeon, the only tool in the world that can do really lossless conversions to/from other VCSes, needs it. You'd probably be able to hear the screaming from the direction of my house if you actually dropped it.

But more generally, it's more future-proof and discoverable than any space optimization of it you could invent. That in itself is good enough reason to fully support it, including in svnrdump. It's a promise to your users: "This is *understandable*. No matter what wacky things we get up to to optimize the transport/archiving case, you're not screwed."

> > Should I file an issue about this?
>
> You can certainly file an issue if there isn't one.

I will do so.

> If I were to have a say I would recommend anyone should rather work
> on adding v3 to reposurgeon and addressing any documentation that
> may be lacking.

Adding v3 to the documentation should certainly be done. If I could write it I would have already, but I'm more than willing to ask picky questions of anyone who writes a draft.

As for adding v3 to reposurgeon, it could be *a* solution but it's not the *right* solution. It's not the path that delivers the best guarantee to all your users of "you won't be messed over in ten years by unintended side effects of optimizations that seemed like good ideas at the time".

-- 
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>