On Sunday, September 16, 2012 12:14 AM Fujii Masao wrote:
On Sat, Sep 15, 2012 at 4:26 PM, Amit kapila <amit.kap...@huawei.com> wrote:
> On Saturday, September 15, 2012 11:27 AM Fujii Masao wrote:
> On Fri, Sep 14, 2012 at 10:01 PM, Amit kapila <amit.kap...@huawei.com> wrote:
>>
>> On Thursday, September 13, 2012 10:57 PM Fujii Masao
>> On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila <amit.kap...@huawei.com> wrote:
>>> On Wednesday, September 12, 2012 10:15 PM Fujii Masao
>>> On Wed, Sep 12, 2012 at 8:54 PM, <amit.kap...@huawei.com> wrote:
>>>>>>> The following bug has been logged on the website:
>
>>>>>> I would like to implement such feature for walreceiver, but there is one
>>>>>> confusion that whether to use same configuration
>>>>>> parameter(replication_timeout) for walreceiver as for master or
>>>>>> introduce a new configuration parameter (receiver_replication_timeout).
>>
>>>>> I like the latter. I believe some users want to set the different
>>>>> timeout values, for example, in the case where the master and standby
>>>>> servers are placed in the same room, but cascaded standby is placed in
>>>>> other continent.
>>
>>>> Thank you for your suggestion. I have implemented as per your suggestion
>>>> to have separate timeout parameter for walreceiver.
>>>> The main changes are:
>>>> 1. Introduce a new configuration parameter
>>>>    wal_receiver_replication_timeout for walreceiver.
>>>> 2. In function WalReceiverMain(), check if there is no communication till
>>>>    wal_receiver_replication_timeout, exit the walreceiver.
>>>>    This is same as walsender functionality.
>>
>>>> As this is a feature, So I am uploading the attached patch in coming
>>>> CommitFest.
>>
>>>> Suggestions/Comments?
>
>>> You also need to change walsender so that it periodically sends the heartbeat
>>> message, like walreceiver does each wal_receiver_status_interval. Otherwise,
>>> walreceiver will detect the timeout wrongly whenever there is no traffic in the
>>> master.
>
>> Won't the current keepalive message from walsender suffice for that need?
> No. Though the keepalive interval should be smaller than the timeout,
> IIRC there is no way to specify the keepalive interval now.

To define the behavior correctly, I think there are 2 options now:

Approach-1:
Document that both timeout parameters (sender and receiver) should be greater
than wal_receiver_status_interval.
If both are greater, then I think it should never time out merely because the
connection is idle.

Approach-2:
Provide a variable wal_send_status_interval, such that if it is 0, the current
behavior prevails, and if it is non-zero, a KeepAlive message is sent no later
than that interval.
The modified code of WalSndLoop will be as follows:

    TimestampTz timeout = 0;
    long        sleeptime = 10000;  /* 10 s */
    int         wakeEvents;

    /*
     * sleeptime should be equal to the wal send interval if it is not zero,
     * otherwise it defaults to 10 sec
     */
    if (wal_send_status_interval > 0)
    {
        sleeptime = wal_send_status_interval;
    }

    wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
        WL_SOCKET_READABLE | WL_TIMEOUT;

    if (pq_is_send_pending())
        wakeEvents |= WL_SOCKET_WRITEABLE;
    else if (wal_send_status_interval > 0)
    {
        WalSndKeepalive(output_message);
        /* Try to flush pending output to the client */
        if (pq_flush_if_writable() != 0)
            break;
    }

    /* Determine time until replication timeout */
    if (replication_timeout > 0)
    {
        timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
                                              replication_timeout);
        if (wal_send_status_interval <= 0)
        {
            sleeptime = 1 + (replication_timeout / 10);
        }
    }

    /* Sleep until something happens or replication timeout */
    WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
                      MyProcPort->sock, sleeptime);

    /*
     * Check for replication timeout. Note we ignore the corner case
     * possibility that the client replied just as we reached the
     * timeout ... he's supposed to reply *before* that.
     */
    if (replication_timeout > 0 &&
        GetCurrentTimestamp() >= timeout)
    {
        /*
         * Since typically expiration of replication timeout means
         * communication problem, we don't send the error message to
         * the standby.
         */
        ereport(COMMERROR,
                (errmsg("terminating walsender process due to replication timeout")));
        break;
    }
}

Which way do you think is better, or do you have any other idea to handle this?

With Regards,
Amit Kapila.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
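
Note: the sleep-time selection in the Approach-2 code above can be looked at in isolation. The following is only a minimal sketch, not PostgreSQL code: choose_sleeptime() and keepalive_beats_timeout() are hypothetical helper names, and it assumes both settings are expressed in milliseconds, which the mail does not state for wal_send_status_interval. The second helper checks the condition quoted above, that the keepalive interval should be smaller than the timeout.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of PostgreSQL): picks the sleep time the
 * same way the proposed loop does. Assumes both arguments are in
 * milliseconds and that 0 means "disabled".
 */
static long
choose_sleeptime(int wal_send_status_interval, int replication_timeout)
{
    long sleeptime = 10000;     /* default: 10 s */

    assert(wal_send_status_interval >= 0 && replication_timeout >= 0);

    if (wal_send_status_interval > 0)
        sleeptime = wal_send_status_interval;   /* wake up once per keepalive */
    else if (replication_timeout > 0)
        sleeptime = 1 + (replication_timeout / 10);     /* current behavior */

    return sleeptime;
}

/*
 * Hypothetical check: returns 1 if keepalives are enabled and are sent more
 * often than the peer's timeout expires (or the peer has no timeout at all),
 * otherwise 0.
 */
static int
keepalive_beats_timeout(int keepalive_interval, int peer_timeout)
{
    if (keepalive_interval <= 0)
        return 0;               /* no periodic keepalive at all */
    if (peer_timeout <= 0)
        return 1;               /* timeout disabled, nothing to outpace */
    return keepalive_interval < peer_timeout;
}
```

For instance, under these assumptions choose_sleeptime(0, 60000) gives 6001 ms, matching the existing 1 + (replication_timeout / 10) rule, while choose_sleeptime(2000, 60000) gives 2000 ms so that the loop wakes up in time to send each keepalive.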