Hi,

It is quite common to set a global statement_timeout of a few seconds (or minutes) in postgresql.conf in order to protect a production server from slow or bad queries. This value applies to all connections in the system unless it is overridden per database or per user, or changed explicitly in the connection. pg_rewind runs quite a few queries on the primary, and if statement_timeout cancels one of them, it breaks the whole process. I can't tell for sure whether this is an unrecoverable error; perhaps a second run of pg_rewind would be able to finish the job. But even if a retry is possible, this makes pg_rewind hard to use for reliable automation.
There are a few workarounds to this problem:

1. ALTER DATABASE postgres SET statement_timeout = 0;
2. ALTER ROLE rewind_username SET statement_timeout = 0;
3. Run export PGOPTIONS="-c statement_timeout=0" before calling pg_rewind.

All of them have certain pros and cons. The third approach works well for automation, but IMHO we should simply fix pg_rewind itself and SET statement_timeout = 0 after establishing a connection, so that everybody benefits. Patch attached.

Regards,
--
Alexander Kukushkin
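For reference, the three workarounds look roughly like this in practice. This is a sketch: rewind_user, the data directory path, and the connection string are placeholders, and the first two commands must be run by a superuser on the source server.

```shell
# 1. Disable the timeout for every connection to the database pg_rewind uses
psql -c "ALTER DATABASE postgres SET statement_timeout = 0;"

# 2. Disable it only for the role pg_rewind connects as
#    (rewind_user is a placeholder for the actual role name)
psql -c "ALTER ROLE rewind_user SET statement_timeout = 0;"

# 3. Disable it only for this invocation, via libpq's PGOPTIONS variable
export PGOPTIONS="-c statement_timeout=0"
pg_rewind --target-pgdata=/path/to/standby/data \
          --source-server="host=primary-host user=rewind_user"
```

Options 1 and 2 persist the setting on the server for all future sessions, while option 3 is scoped to the single pg_rewind invocation, which is why it is friendlier for automation.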
diff --git a/src/bin/pg_rewind/libpq_fetch.c b/src/bin/pg_rewind/libpq_fetch.c
index 37eccc3126..18740c15d0 100644
--- a/src/bin/pg_rewind/libpq_fetch.c
+++ b/src/bin/pg_rewind/libpq_fetch.c
@@ -54,6 +54,13 @@ libpqConnect(const char *connstr)
 	if (showprogress)
 		pg_log_info("connected to server");
 
+	/* We don't want our queries cancelled due to statement timeout */
+	res = PQexec(conn, "SET statement_timeout = 0");
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pg_fatal("could not set up statement_timeout: %s",
+				 PQresultErrorMessage(res));
+	PQclear(res);
+
 	res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 		pg_fatal("could not clear search_path: %s",