On Fri, Jan 20, 2012 at 1:01 PM, Steve Singer <ssinger...@sympatico.ca> wrote: > Here is my review of this verison of the patch. I think this patch has been > in every CF for 9.2 and I feel it is getting close to being committed.
Thanks for the review! > Testing Review > -------------------------------- > > I encountered this on my first replica (the one based on the master). I am > not sure if it is related to this patch, it happened after the pg_basebackup > against the replica finished. > > TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: > "twophase.c", Line: 1238) > LOG: startup process (PID 12222) was terminated by signal 6: Aborted I spent one hour to reproduce that issue, but finally I was not able to do that :( For now I have no idea what causes that issue. But basically the patch doesn't touch any codes related to that issue, so I'm guessing that it's a problem of the HEAD rather than the patch... I will spend more time to diagnose the issue. If you notice something, please let me know. > - set full page writes=off and did a checkpoint > - Started the pg_basebackup > - set full_page_writes=on and did a HUP + some database activity that might > have forced a checkpoint. > > I got this message from pg_basebackup. > ./pg_basebackup -D ../data3 -l foo -h localhost -p 5438 > pg_basebackup: could not get WAL end position from server > > I point this out because the message is different than the normal "could not > initiate base backup: FATAL: WAL generated with full_page_writes=off" thatI > normally see. I guess that's because you started pg_basebackup before checkpoint record with full_page_writes=off had been replicated and replayed to the standby. In this case, when you starts pg_basebackup, it uses the previous checkpoint record with maybe full_page_writes=on as the backup starting checkpoint, so pg_basebackup passes the check of full_page_writes at the start of backup. Then, it fails the check at the end of backup, so you got such an error message. > We might want to add a PQerrorMessage(conn)) to > pg_basebackup to print the error details. Since this patch didn't actually > change pg_basebackup I don't think your required to improve the error > messages in it. I am just mentioning this because it came up in testing. Agreed. When PQresultStatus() returns an unexpected status, basically the error message from PQerrorMessage() should be reported. But only when pg_basebackup could not get WAL end position, PQerrorMessage() was not reported... This looks like a oversight of pg_basebackup... I think that it's better to fix that as a separate patch (attached). Thought? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 4007680..873ef64 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -914,7 +914,7 @@ BaseBackup(void) res = PQexec(conn, "IDENTIFY_SYSTEM"); if (PQresultStatus(res) != PGRES_TUPLES_OK) { - fprintf(stderr, _("%s: could not identify system: %s\n"), + fprintf(stderr, _("%s: could not identify system: %s"), progname, PQerrorMessage(conn)); disconnect_and_exit(1); } @@ -1049,8 +1049,8 @@ BaseBackup(void) res = PQgetResult(conn); if (PQresultStatus(res) != PGRES_TUPLES_OK) { - fprintf(stderr, _("%s: could not get WAL end position from server\n"), - progname); + fprintf(stderr, _("%s: could not get WAL end position from server: %s"), + progname, PQerrorMessage(conn)); disconnect_and_exit(1); } if (PQntuples(res) != 1)
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers