Greetings,

* Julien Rouhaud (rjuju...@gmail.com) wrote:
> On Wed, Jun 16, 2021 at 9:19 PM Stephen Frost <sfr...@snowman.net> wrote:
> > This is exactly it. I don't agree that we can, or should, treat every
> > sensible thing that we realize about what the archive command or the
> > backup tool should be doing as some bug in our documentation that has to
> > be backpatched.
> > If you're serious about continuing on this path, it strikes me that the
> > next step would be to go review all of the above mentioned tools,
> > identify all of the things that they do and the checks that they have,
> > and then craft a documentation patch to add all of those- for both
> > archive command and pg_start/stop_backup.
>
> 1) I'm not saying that every single check that every single tools
> currently does is a requirement for a safe command and/or should be
> documented

That's true- you're agreeing that there are even checks beyond those that
are currently implemented which should also be done. That's exactly
what I was responding to.

> 2) I don't think that there are thousands and thousands of
> requirements, as you seem to imply

You've not reviewed any of the tools which have been written and so I'm
not sure what you're basing your belief on. I've done reviews of the
various tools and have been rather involved in the development of one of
them. I do think there are lots of requirements and it's not some static
list which could be just written down once and then never touched or
thought about again.

Consider pg_dump- do we document everything that a logical export tool
should do? That someone who wants to implement pg_dump should make sure
that the tool runs around and takes out a share lock on all of the
tables to be exported? No, of course we don't, because we provide a
tool to do that and if people actually want to understand how it works,
we point them to the source code. Had we started out with a backup tool
in core, the same would be true for that. Instead, we didn't, and such
tools were developed outside of core (and frankly have largely had to
play catch-up to try and figure out all the things that are needed to do
it well and likely always will be since they aren't part of core).

> 3) I still don't understand why you think that having a partial
> knowledge of what makes an archive_command safe scattered in the
> source code of many third party tools is a good thing

Having partial knowledge of what makes an archive_command safe in the
official documentation is somehow better..? What would that lead to-
other people seriously developing a backup solution for PG?
No, I seriously doubt that, as those who are seriously developing such
solutions couldn't trust to only what we've got documented anyway but
would have to go looking through the source code and would need to
develop a deep understanding of how WAL works, what happens when PG is
started up to perform PITR but with archiving disabled and how that
impacts what ends up being archived (hint: the server will switch
timelines but won't actually archive a history file because archiving is
disabled- a restart which then enables archiving will then start pushing
WAL on a timeline where there's no history file; do that twice from an
older backup and now you've got the same WAL files trying to be pushed
into the repo which are actually on materially different timelines even
though the same timeline has been chosen multiple times...), how
timelines work, and all the rest.

We already have partial documentation about what should go into
developing an archive_command and what it's led to is people ignoring
that and instead copying the example that's explicitly called out as not
sufficient. That's the actual problem that needs to be addressed here.
Let's rip out the example and instead promote tools which have been
written to specifically address this and which are actively maintained.
If someone actually comes asking about how to develop their own backup
solution for PG, we should suggest that they review the PG code related
to WAL, timelines, how promotion works, etc, and probably point them at
the OSS projects which already work to tackle this issue, because to
develop a proper tool you need to actually understand all of that.

> But what better alternative are you suggesting? Say that no ones
> knows what an archive_command should do and let people put a link to
> their backup solution in the hope that they will eventually converge
> to a safe solution that no one will be able to validate?

There are people who do know, today, what an archive command should do
and we should be promoting the tools that they've developed which do, in
fact, implement those checks already, at least the ones we've thought of
so far.

Instead, the suggestion being made here is to write a detailed design
document for how to develop a backup tool (and, no, I don't agree that
we can "just" focus on archive command) for PG and then to maintain it
and update it and backpatch every change to it that we think of.

Thanks,

Stephen