On Fri, Nov 5, 2021 at 11:50 AM Robert Haas <robertmh...@gmail.com> wrote: > I went ahead and committed 0001 and 0002, but got nervous about > proceeding with 0003.
It turns out that these commits are causing failures on prairiedog. Per email from Tom off-list, that's apparently because prairiedog has a fussy version of tar that doesn't like it when you omit the trailing NUL blocks that are supposed to be part of a tar file. So how did this get broken? It turns out that in the current state of the world, the server sends an almost-tarfile to the client. What I mean by an almost-tarfile is that it sends something that looks like a valid tarfile except that the two blocks of trailing NUL bytes are omitted. Prior to these patches, that was a very strategic omission, because the pg_basebackup code wants to edit the tar files, and it wasn't smart enough to parse them, so it just received all the data from the server, then added any members that it wanted to add (e.g. recovery.signal) and then added the terminator itself. I would classify this as an ugly hack, but it worked. With these changes, the client is now capable of really parsing a tarfile, so it would have no problem injecting new files into the archive whether or not the server terminates it properly. It also has no problem adding the two blocks of terminating NUL bytes if the server omits them, but not otherwise. All in all, it's significantly smarter code. However, I also set things up so that the client doesn't bother parsing the tar file from the server if it's not doing anything that requires editing the tar file on the fly. That saves some overhead, and it's also important for the rest of the patch set, which wants to make it so that the server could send us something besides a tarfile, like maybe a .tar.gz. We can't just have a convention of adding 1024 NUL bytes to any file the server sends us unless what the server sends us is always and precisely an unterminated tarfile. Unfortunately, that means that in the case where the tar parsing logic isn't used, the tar file ends up with the proper terminator. Because most 'tar' implementations are happy to ignore that defect, the tests pass on my machine, but not on prairiedog. I think I realized this problem at some point during the development process of this patch, but then I forgot about it again and ended up committing something that has a problem of which, at some earlier point in time, I had been entirely aware. Oops. It's tempting to try to fix this problem by changing the server so that it properly terminates the tar files it sends to the client. Honestly, I don't know how we ever thought it was OK to design a protocol for base backups that involved the server sending something that is almost but not quite a valid tarfile. However, that's not quite good enough, because pg_basebackup is supposed to be backward compatible, so we'd still have the same problem if a new version of pg_basebackup were used with an old server. So what I'm inclined to do is fix both the server and pg_basebackup. On the server side, properly terminate the tarfile. On the client side, if we're talking to a pre-v15 server and don't need to parse the tarfile, blindly add 1024 NUL bytes at the end. I think I can get patches for this done today. Please let me know ASAP if you have objections to this line of attack. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com