<e...@thyrsus.com> wrote:
>Julian Foad <julianf...@apache.org>:
>> Isn't format 3 documented in the section called "Version 3 format" in 
>> dump-load-format.txt?
>> 
>It appears to have slipped your mind that the person who wrote most of
>that documentation was *me*. And I do still update it occasionally,
>most recently about a month ago.  If it held the information I needed
>I'd already know it and not be bothering this list.
>
>Since a dev as senior as you has forgotten this, I have to assume the
>rest of the list has also forgotten or never knew how
>dump-load-format.txt came to exist.  So, a reminder: have patience as
>I describe it for the list because this bears directly on why "just
>use Version 3" is insufficient and the (possibly unintended)
>implication that Version 2 might be retired someday is deeply
>disturbing.
>
>I had a very specific motivation for documenting the dump format -
>reposurgeon consumed and generated Subversion dumps, and I know it's a
>fragile and dangerous thing when the assumptions of that kind of
>reader code are only documented in the code itself.  It is much better
>practice to write a ground-truth document about the format (or the
>parts one uses, anyway) and then have that be the authority for the
>code.  For another example of the practice, see
>
>https://gpsd.gitlab.io/gpsd/AIVDM.html
>
>You have dump-load-format.txt because, having written it, I thought it
>was silly for it not to be flying with the Subversion distribution, so
>I combined it with some historical notes on the old version 1 format,
>and voila.
>
>
>But. The Version 2 documentation I wrote for Subversion is incomplete,
>because there were details I could neither find in pre-existing
>documentation nor easily discover at the time I wrote the bulk of it
>in 2012. And once I made reposurgeon able to read and emit version 2
>dumps, digging deep enough to find out what version 3 was doing never
>made it far enough up my priority list that I actually did it.
>
>Notably: dump-load-format.txt does not describe the delta format.  I
>have since seen hints in the SVN Book that version 3 uses some kind of
>binary delta compression.  But the SVN book does not describe either
>of these details; it's not even clear enough for me to be sure I'm
>not hallucinating the "binary" part.
>
>> Format 3 makes such a huge difference to data transfer size in
>> typical cases, as far as I recall, that it is hard to justify using
>> format 2 for anything.
>
>Oh, *hell* no it isn't.
>
>Have you ever written an importer-from-Subversion?  Other than
>svnadmin load, I mean; all you have to verify about that is that it
>round-trips streams, which tends to avoid the problems I'm about to
>describe.
>
>If you had ever tried writing other stream analysis tools (I've done
>this twice), you would know that they're a very different use case
>from transport or archiving and have different tradeoffs. The
>bulkiness of the Version 2 stream files is a good trade for its easy
>parseability and eyeball-friendliness.
>
>I have a whole bunch of Subversion dumps in my regression-test suite
>for reposurgeon/repocutter, some collected in the wild and some
>hand-crafted.  It would be *bad* if I couldn't sic a text editor on
>any of those to read or modify it. Very bad.  Certainly a huge pain in
>the ass for me, plausibly a crash landing of I-can-no-longer-support-this
>severity.  That worst case would leave a lot of users stuck; even if you
>don't care about inconveniencing me, please don't risk it for them.
>
>Plain text blobs for every revision may be fat but it's a super-stable
>and discoverable place in the design space, what economists call a
>Schelling point; in practice, great future-proofing.  Deltas and
>compression are *not* future-proofing.  Once you start playing that
>game, the temptation to iterate on it by improving the
>delta/compression pieces can't be resisted, and indeed shouldn't be -
>as long as there's still Version 2 dump support for people who want to
>evade problems like ... uh, what kind of compression are they using
>and how do I unpack it?  How do I interpret this diff format?
>
>
>So please do not *ever* think of Version 2 as in any way obsolete or
>dispensable.  It's got at least one important use case - reposurgeon,
>the only tool in the world that can do really lossless conversions to/from
>other VCSes, needs it. You'd probably be able to hear the screaming from
>the direction of my house if you actually dropped it.
>
>But more generally, it's more future-proof and discoverable than any
>space optimization of it you could invent.  That in itself is good
>enough reason to fully support it, including in svnrdump.  It's a
>promise to your users: "This is *understandable*.  No matter what
>wacky things we get up to to optimize the transport/archiving case,
>you're not screwed."
>
>> > Should I file an issue about this?
>> 
>> You can certainly file an issue if there isn't one.
>
>I will do so.
>
>> If I were to have a say I would recommend anyone should rather work
>> on adding v3 to reposurgeon and addressing any documentation that
>> may be lacking.
>
>Adding v3 to the documentation should certainly be done.  If I could
>write it I would have already, but I'm more than willing to ask picky
>questions of anyone who writes a draft.
>
>As for adding v3 to reposurgeon, it could be *a* solution but it's not
>the *right* solution.  It's not the path that delivers the best
>guarantee to all your users of "you won't be messed over in ten years
>by unintended side effects of optimizations that seemed like good
>ideas at the time".

Thanks for your detailed reply, Eric. I can accept your argument for the value of 
the v2 dump format, given its simplicity, for these purposes. I am very well aware you 
wrote much of the spec doc and worked a lot with it, especially figuring out 
the semantics; and that's why, seeing its "v3" section, I was puzzled when you 
wrote simply "it's not documented". I even noticed the language in that section 
seemed to match your style but didn't have time to go digging with "svn blame". 
Thanks for explaining what's missing.

I haven't tried writing an importer.

- Julian
