Hackers,

The .partial mechanism was added in de768844 to help avoid conflicts between a newly promoted primary and an old primary that might produce the same WAL segment. This works for a single promotion but can become problematic in HA configurations where there may be several promotions before a stable primary emerges.
Consider the following scenario:

1) A is the primary
2) B follows A as a standby
3) A is shut down in immediate mode
4) B is promoted and selects timeline 2
5) B archives 000000010000000100000001.partial
6) B archives 00000002.history
7) B goes away before archiving 000000020000000100000001
8) A is put into recovery
9) A is promoted and selects timeline 3
10) A can't archive 000000010000000100000001.partial because it already exists

We recommend that archive commands not overwrite an existing segment. Some backup tools will compare the contents and succeed if they are equal, but in this case that will still often fail because recycled WAL segments will have different bytes at the end on the primary and standby. The files may not even be logically the same, because B may not have received all WAL from A.

After some discussion with the Patroni folks, Stephen and I came up with the idea of adding the timeline that the cluster is *promoting to* into the .partial name to avoid these sorts of conflicts.

However, there is still a race condition here. Since 000000010000000100000001.partial is archived first, the 00000002.history file might not make it to the archive before B crashes. In that case A will also pick timeline 2 and still be stuck, because its .partial name will match the one B already archived. However, I'm thinking it would be easy to teach pgarch_readyXlog() to return any .history files it finds first (in order, of course).

Another option would be to immediately archive the first WAL segment on timeline 2 and forgo the .partial file entirely. In this case the archiver will archive the 00000002.history file before 000000020000000100000001 and we avoid the race condition above. That also means we could recover A and promote without a conflict on the .partial. Or we could recover A along timeline 2. Or we could do some combination of the above.

I have attached a patch that adds the timeline to the .partial file. This passes check-world.
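The idea of having pgarch_readyXlog() hand back .history files first could look roughly like the comparator below. This is only a sketch and not part of the attached patch: it assumes the archiver currently treats the strcmp()-smallest ready file as the "oldest", it works on bare file names instead of scanning archive_status, and the names is_history_file(), archive_priority_cmp() and next_ready_file() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if name ends in ".history", i.e. it is a timeline history file. */
static bool
is_history_file(const char *name)
{
	static const char suffix[] = ".history";
	size_t		len = strlen(name);
	size_t		slen = sizeof(suffix) - 1;

	return len >= slen && strcmp(name + len - slen, suffix) == 0;
}

/*
 * Hypothetical archive ordering: returns < 0 if a should be archived
 * before b. Timeline history files always sort ahead of everything else;
 * within each group, plain strcmp() order is kept.
 */
static int
archive_priority_cmp(const char *a, const char *b)
{
	bool		ha = is_history_file(a);
	bool		hb = is_history_file(b);

	if (ha != hb)
		return ha ? -1 : 1;
	return strcmp(a, b);
}

/*
 * Pick the next file to archive from n ready file names (without the
 * ".ready" suffix). Returns NULL if there is nothing to archive.
 */
static const char *
next_ready_file(const char *const *ready, size_t n)
{
	const char *best = NULL;

	assert(ready != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
	{
		if (best == NULL || archive_priority_cmp(ready[i], best) < 0)
			best = ready[i];
	}
	return best;
}
```

With the two ready files from steps 5 and 6, plain strcmp() ordering picks 000000010000000100000001.partial (the names first differ at the timeline digit, '1' < '2'), while the history-first ordering returns 00000002.history. History files still come out in name order, which is timeline order because the timeline ID is fixed-width hex.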
I think we should consider back-patching some set of these changes, since this causes real pain in current production HA configurations.

Thoughts?

-- 
-David
da...@pgmasters.net
```diff
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c80b14ed97..b2e0c1abc1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7624,8 +7624,10 @@ StartupXLOG(void)
 		 * the completed version of the same segment later, it will fail. (We
 		 * used to do that in 9.4 and below, and it caused such problems).
 		 *
-		 * As a compromise, we rename the last segment with the .partial
-		 * suffix, and archive it. Archive recovery will never try to read
+		 * As a compromise, we rename the last segment with the new timeline and
+		 * .partial suffix, and archive it. The timeline is added in case there
+		 * are multiple promotions from the same timeline before a stable
+		 * primary emerges. Archive recovery will never try to read
 		 * .partial segments, so they will normally go unused. But in the odd
 		 * PITR case, the administrator can copy them manually to the pg_wal
 		 * directory (removing the suffix). They can be useful in debugging,
@@ -7653,8 +7655,10 @@ StartupXLOG(void)
 				char		partialpath[MAXPGPATH];
 
 				XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size);
-				snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname);
-				snprintf(partialpath, MAXPGPATH, "%s.partial", origpath);
+				snprintf(partialfname, MAXFNAMELEN, "%s-%08X.partial",
+						 origfname, ThisTimeLineID);
+				snprintf(partialpath, MAXPGPATH, "%s-%08X.partial", origpath,
+						 ThisTimeLineID);
 
 				/*
 				 * Make sure there's no .done or .ready file for the .partial
```
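To make the effect of the changed snprintf() calls concrete, here is a small standalone sketch that builds the old and new .partial names for the scenario above. It is not backend code: segment_name(), partial_name_old(), partial_name_new(), partial_conflict() and SKETCH_FNAME_LEN are hypothetical helpers, and segment_name() assumes the default 16MB wal_segment_size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer size, standing in for the backend's MAXFNAMELEN. */
#define SKETCH_FNAME_LEN 64

typedef uint32_t TimeLineID;

/*
 * Build a WAL segment file name from a timeline and segment number: three
 * groups of 8 uppercase hex digits (timeline, "log", "seg"). Assumes the
 * default 16MB segment size, i.e. 0x100 segments per 4GB "log".
 */
static void
segment_name(char *buf, size_t len, TimeLineID tli, uint64_t segno)
{
	const uint64_t segs_per_log = UINT64_C(0x100000000) / (16 * 1024 * 1024);

	snprintf(buf, len, "%08X%08X%08X", (unsigned) tli,
			 (unsigned) (segno / segs_per_log),
			 (unsigned) (segno % segs_per_log));
}

/* Pre-patch scheme: "<segment>.partial". */
static void
partial_name_old(char *buf, size_t len, const char *origfname)
{
	snprintf(buf, len, "%s.partial", origfname);
}

/* Patched scheme: "<segment>-<new timeline>.partial". */
static void
partial_name_new(char *buf, size_t len, const char *origfname,
				 TimeLineID newtli)
{
	assert(newtli > 0);			/* timeline IDs start at 1 */
	snprintf(buf, len, "%s-%08X.partial", origfname, (unsigned) newtli);
}

/* True if promotions to tli_a and tli_b would archive the same name. */
static bool
partial_conflict(const char *origfname, TimeLineID tli_a, TimeLineID tli_b)
{
	char		a[SKETCH_FNAME_LEN];
	char		b[SKETCH_FNAME_LEN];

	partial_name_new(a, sizeof(a), origfname, tli_a);
	partial_name_new(b, sizeof(b), origfname, tli_b);
	return strcmp(a, b) == 0;
}
```

For timeline 1, segment 257 (log 1, seg 1), the pre-patch name is 000000010000000100000001.partial no matter which server promotes. With the patch, B (promoting to timeline 2) and A (promoting to timeline 3) would archive 000000010000000100000001-00000002.partial and 000000010000000100000001-00000003.partial, so step 10 no longer conflicts. If 00000002.history never reaches the archive and A also selects timeline 2, partial_conflict() reports a collision, which is the remaining race.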