Guillem Jover writes ("Re: Idea: rsync-based source format"):
> On Fri, 2015-08-21 at 16:32:09 +0100, Ian Jackson wrote:
> > (I spoke to Guillem about this at Debconf and promised to write it up
> > so he could think about it properly at his leisure.)
>
> [ Checked this over DebConf, but then I could not find you on the
> venue anymore. :) ]

Sorry... Thanks for taking the time to read it. I left the venue on
Saturday night (at about 4am...) and didn't come back on Sunday.

> I did some preliminary quick pondering and got some concerns, and I think
> perhaps a workable alternative solution that might cover your needs?
...
> Actually you should be able to represent at least these with a git format
> patch, which are already supported by the latest patch program (its only
> current limitation AFAIR is binary file deltas), and which is required by
> dpkg-dev to be able to properly handle them at extraction time.

I did think about this, but: I absolutely want binary file deltas too.
And the use of patch files with these kinds of features is very new, so
I'm not sure I want to trust it. Also, it would make it hard to
backport support for the new format, which is definitely something we
would want to do.

> > It would be more like a successor to 1.0 with diff, than 3.0 (quilt)
> > is, in that it wouldn't represent a patch stack, merely a tree.
>
> (From the code PoV, and from the properties you describe it would
> probably be more a successor of 2.0 than 1.0, but sure.)

Heh. I don't really care what we call it ...

> > It also contains an rsync batchfile P_V-R.rsync.Z.
>
> This is what triggers my concerns. I was not aware of rsync batchfiles!

Many people aren't.

> So I took a quick look at the man page only (I've not dug further),
> and I've got the impression this might not be a good format for long
> term storage, given that it seems to rely on the rsync protocol itself

When building the batchfile, dpkg-source would specify the protocol
version to use. I imagine we would fix it at 28 or 30.

> (it is already at version 28; does the program remove support for
> ancient protocol versions for example)?

This is a reasonable question. I think that it would be a good idea to
talk to rsync upstream before using rsync batchfiles as an archival
format for long term (decades) storage.

However: According to the rsync OLDNEWS file, protocol version 28 was
released in 2.6.1 in April 2004. The minimum version supported by
sid's rsync is protocol version 20, from April 1999.

According to the manual the batchfile format changed in 2.6.3 (Sep
2004), but (according to the OLDNEWS) at that stage batch mode was
still experimental. (The OLDNEWS file doesn't seem to clearly say when
batchmode became non-experimental.) Looking at OLDNEWS I think we
would probably want to require rsync >= 2.6.6 (Jul 2005), because we
would need the --only-write-batch option that was introduced then.

Overall, rsync has an absolutely stellar record for reliability,
stability, and compatibility. Many, many people have been using it for
many years. I think it almost inconceivable that rsync would deprecate
an old protocol version on a timescale that would be a problem for
Debian releases. If they did, you would also find that you couldn't do
normal (non-batch) rsync between the relevant versions of Debian,
either.

> It also ties the implementation of the format to the rsync tool,
> because I assume we'd not want to reimplement it ourselves(?), and
> keep in sync with upstream over time. And as such it would require
> pulling rsync into the build-essential set practically forever,
> because once there are such source packages around dpkg-source
> should be able to at least extract them (well it could get demoted
> to Recommends in case we switched to something else).

I don't see that adding rsync to the build-essential set is a problem.
rsync is extremely portable and has very limited build-dependencies.
libacl and libattr are surely already in the needed-for-essential set,
let alone needed-for-build-essential. I'm not sure whether libpopt is
already in the needed-for-bootstrap-to-build-essential set, but its
only build dependencies are debhelper, dh-autoreconf, and gettext.

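To make the version floor and protocol pinning discussed above a little more concrete, here is a minimal Python sketch of how a source-format tool might check the installed rsync and build the two batch-mode command lines (one at build time, one at extraction time). It is an illustration under stated assumptions, not anything dpkg-source actually does: the function names, the pinned protocol of 30, the >= 2.6.6 floor and the -a/--delete flag set are hypothetical choices. The rsync options themselves (--only-write-batch, --read-batch, --protocol) are real, and the parser assumes the usual first line of `rsync --version`, such as "rsync  version 3.2.7  protocol version 31".

```python
from __future__ import annotations

import re

# Hypothetical constants for illustration only; nothing here is part of
# dpkg-source. The floor reflects the suggestion that --only-write-batch
# needs rsync >= 2.6.6; the pin is one of the two values (28 or 30) above.
MIN_RSYNC_VERSION = (2, 6, 6)
PINNED_PROTOCOL = 30

# Assumed shape of the first line of `rsync --version`, for example
# "rsync  version 3.2.7  protocol version 31".
_VERSION_RE = re.compile(
    r"rsync\s+version\s+v?(\d+)\.(\d+)\.(\d+)\S*\s+protocol version\s+(\d+)"
)


def parse_rsync_version(first_line: str) -> tuple[tuple[int, int, int], int]:
    """Return ((major, minor, patch), newest_protocol) from a version line."""
    m = _VERSION_RE.search(first_line)
    if m is None:
        raise ValueError(f"unrecognised rsync version line: {first_line!r}")
    major, minor, patch, proto = (int(g) for g in m.groups())
    return (major, minor, patch), proto


def rsync_is_usable(first_line: str, protocol: int = PINNED_PROTOCOL) -> bool:
    """Check the (hypothetical) version floor, and that the newest protocol
    the installed rsync speaks is at least the pinned one. The oldest
    protocol it still accepts is not printed, so it is not checked here."""
    version, newest_protocol = parse_rsync_version(first_line)
    return version >= MIN_RSYNC_VERSION and newest_protocol >= protocol


def write_batch_cmd(orig_tree: str, new_tree: str, batch: str,
                    protocol: int = PINNED_PROTOCOL) -> list[str]:
    """Build-time command: record the delta that turns orig_tree into
    new_tree in `batch`, without modifying orig_tree."""
    return [
        "rsync", "-a", "--delete",
        f"--protocol={protocol}",       # pin the stream format
        f"--only-write-batch={batch}",  # write the batch, leave dest alone
        new_tree.rstrip("/") + "/",     # trailing slash: sync the contents
        orig_tree,
    ]


def read_batch_cmd(batch: str, target_tree: str) -> list[str]:
    """Extraction-time command: replay `batch` onto an unpacked copy of the
    original tree, mirroring the options used when it was written."""
    return ["rsync", "-a", "--delete", f"--read-batch={batch}", target_tree]


if __name__ == "__main__":
    line = "rsync  version 3.2.7  protocol version 31"
    print(rsync_is_usable(line))
    # Hypothetical tree and batch names (uncompressed, unlike P_V-R.rsync.Z).
    print(" ".join(write_batch_cmd("pkg-1.0.orig", "pkg-1.0", "pkg_1.0-1.rsync")))
    print(" ".join(read_batch_cmd("pkg_1.0-1.rsync", "pkg-1.0")))
```

The sketch only builds argument lists; a real tool would pass them to something like `subprocess.run(..., check=True)`. Mirroring the write-side options on the read side follows the rsync manual's advice that a batch should be replayed with the same options it was created with, against a destination identical to the one used when writing it.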
> I'd recommend looking into git format patches, which should be a
> stable interchange format, are already supported by our dpkg tools
> (although by delegating the work to GNU patch), and should be able
> to represent the changes you mentioned before. Not sure if they would
> take more space, although I'd assume that should not make much of a
> difference once compressed with something like xz.

I think the difference between our perspectives is entirely due to our
different view of rsync. Perhaps you just haven't got as much value
out of rsync as I have. I find it difficult to say how awesome I have
found rsync to be. It is software of extraordinary quality.

> In case we'd still want for whatever reason to distinguish this new
> format from a quilt one, I guess we could always add a new one such as
> «3.0 (delta)» or similar.

Yes.

> Or would that not work for you for some reason or I've missed something
> very obvious?

Well, I am wary of the new patch features. They aren't widely used.
While patch is a good program with a reasonable history, it does not
have rsync's excellent record.

rsync's ability to reproduce an identical tree, via an rsync protocol
stream, is tested and verified very frequently on a wide range of trees
by people around the world. Many people rely utterly on rsync for
their backups, without even performing a verification step - and,
while not ideal, this is not even very foolish! I _do_ verify that the
backup client tree and the backed up data are identical and I have
found two harmless and extremely obscure bugs in a decade and a half
(FTR, bugs in --link-dest, which wouldn't affect dpkg-source's use of
rsync).

rsync batchmode is not tested to the same degree, but it uses the same
protocol stream infrastructure, so the complex code paths we would be
using are the same ones as everyone else is using.

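The kind of verification step mentioned above (checking that two trees really are identical) could also be applied to an unpacked source package, for example by comparing the tree produced by replaying a batchfile against a manifest recorded when the package was built. The following Python sketch is one hypothetical way to compute and compare such manifests; it is not how Ian checks his backups or how dpkg-source works. The function names are invented for illustration, and the choice to compare file types, permission bits, symlink targets and SHA-256 of file contents, while ignoring ownership and timestamps, is an assumption.

```python
from __future__ import annotations

import hashlib
import os
import stat
import tempfile
from pathlib import Path


def tree_manifest(root: str) -> dict[str, str]:
    """Describe every entry under root, keyed by its POSIX-style relative path.

    Regular files: permission bits plus SHA-256 of the contents.
    Symlinks: their target (never followed).
    Directories: permission bits.
    Ownership and timestamps are deliberately ignored in this sketch.
    """
    base = Path(root)
    manifest: dict[str, str] = {}
    for dirpath, dirnames, filenames in os.walk(base):
        dirnames.sort()  # deterministic traversal order
        for name in dirnames + sorted(filenames):
            path = Path(dirpath) / name
            rel = path.relative_to(base).as_posix()
            st = path.lstat()  # lstat: describe symlinks, don't follow them
            perms = stat.S_IMODE(st.st_mode)
            if stat.S_ISLNK(st.st_mode):
                manifest[rel] = "link:" + os.readlink(path)
            elif stat.S_ISDIR(st.st_mode):
                manifest[rel] = f"dir:{perms:o}"
            elif stat.S_ISREG(st.st_mode):
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                manifest[rel] = f"file:{perms:o}:{digest}"
            else:
                manifest[rel] = f"other:{stat.S_IFMT(st.st_mode):o}"
    return manifest


def tree_differences(a: str, b: str) -> list[str]:
    """Relative paths that are missing from one tree or differ between them."""
    ma, mb = tree_manifest(a), tree_manifest(b)
    return sorted(p for p in ma.keys() | mb.keys() if ma.get(p) != mb.get(p))


def trees_identical(a: str, b: str) -> bool:
    return not tree_differences(a, b)


if __name__ == "__main__":
    # Demo on two throwaway trees with hypothetical contents.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        for root in (a, b):
            os.makedirs(os.path.join(root, "debian"))
            Path(root, "debian", "rules").write_bytes(b"#!/usr/bin/make -f\n")
        print(trees_identical(a, b))
        Path(b, "debian", "rules").write_bytes(b"changed\n")
        print(tree_differences(a, b))
```

In practice the build side would store the manifest (or just a hash of it) alongside the package, so extraction only has to recompute the manifest of its own output rather than keep a second copy of the tree around.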
Ultimately, if you're worried about format stability and software
quality, I would suggest that picking a ten-year-old rsync feature is a
better idea than a brand new (or maybe not even implemented yet) patch
feature.

Thanks,
Ian.