Dear Nathan,

Thank you for making the patch! I tested it, and it basically worked well.
About the following part:

```
                        ConfigReloadPending = false;
                        ProcessConfigFile(PGC_SIGHUP);
+                       now = GetCurrentTimestamp();
+                       for (int i = 0; i < NUM_LRW_WAKEUPS; i++)
+                               LogRepWorkerComputeNextWakeup(i, now);
+
+                       /*
+                        * If a wakeup time for starting sync workers was set, just set it
+                        * to right now.  It will be recalculated as needed.
+                        */
+                       if (next_sync_start != PG_INT64_MAX)
+                               next_sync_start = now;
                }
```

Do we have to recalculate the next wakeup times when the subscriber receives a SIGHUP signal?
I think this may cause unexpected behavior like the following.

Assume that wal_receiver_timeout is 60s, and wal_sender_timeout on the publisher is
0s (or the network between the nodes is disconnected).
Also assume that we send a SIGHUP signal to the subscriber's postmaster every 20s.

Currently, last_recv_time is set when the worker receives messages, and its
value is used to decide whether to send a ping. The worker will exit if the
walsender does not reply.

But with your patch, the apply worker recalculates wakeup[LRW_WAKEUP_PING] and
wakeup[LRW_WAKEUP_TERMINATE] whenever it gets a SIGHUP, so the worker never sends
a ping with requestReply = true, and never exits due to the timeout.

My case may seem extreme, but other issues may arise if this behavior remains.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED
