On Mon, Nov 8, 2021 at 10:59 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
> > It turns out that these commits are causing failures on prairiedog.
> > Per email from Tom off-list, that's apparently because prairiedog has
> > a fussy version of tar that doesn't like it when you omit the trailing
> > NUL blocks that are supposed to be part of a tar file.
>
> FTR, prairiedog is green. It's Noah's AIX menagerie that's complaining.
Woops.

> It's actually a little bit disturbing that we're only seeing a failure
> on that one platform, because that means that nothing else is anchoring
> us to the strict POSIX specification for tarfile format. We knew that
> GNU tar is forgiving about missing trailing zero blocks, but apparently
> so is BSD tar.

Yeah.

> One part of me wants to add some explicit test for the trailing blocks.
> Another says, well, the *de facto* tar standard seems not to require
> the trailing blocks, never mind the letter of POSIX --- so when AIX
> dies, will anyone care anymore? Maybe not.

FWIW, I think both of those are pretty defensible positions. Honestly, I'm not sure how likely the bug is to recur once we fix it here, either. The only reason this is a problem is the kludge of having the server generate the entire output file except for the last 1kB. If we eliminate that behavior, I don't know that this particular problem is especially likely to come back.

But adding a test isn't stupid either, just a bit tricky to write. When I was testing locally this morning, I found that there were considerably more than 1024 zero bytes at the end of the file, because the last file it backs up is pg_control, which ends with lots of zero bytes. So it's not sufficient to just write a test that checks for non-zero bytes in the last 1kB of the file. What I think you'd need to do is figure out the number of files in the archive and the size of each one, and from that work out how big the tar archive should be: 512 bytes per file, directory, or symlink, plus enough extra 512-byte blocks to cover the contents of each file, plus an extra 1024 bytes at the end. That doesn't seem particularly simple to code. We could run 'tar tvf' and parse the output to get the number of files and their lengths, but that seems likely to cause more portability headaches than the underlying issue.
Since pg_basebackup now has the logic to do all of this parsing internally, we could make it complain if it receives from a v15+ server an archive trailer that is not 1024 bytes of zeroes, but that wouldn't help with this exact problem, because the issue in this case is that pg_basebackup decides it doesn't need to parse in the first place. We could add a pg_basebackup option --force-parsing-and-check-if-the-server-seems-broken, but that seems like overkill to me. So overall, I'm inclined to do nothing about this unless someone has a better idea for how to write a reasonable test.

Anyway, here's my proposal for fixing the issue immediately before us. 0001 adds logic to pad out the unterminated tar archives, and 0002 makes the server terminate its tar archives while preserving the logic added by 0001 for cases where we're talking to an older server. I assume that it's best to get something committed quickly here, so I will do that in ~4 hours if there are no major objections, or sooner if I hear some enthusiastic endorsement.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
0001-Minimal-fix-for-unterminated-tar-archive-problem.patch
Description: Binary data
0002-Have-the-server-properly-terminate-tar-archives.patch
Description: Binary data