On Thu, Apr 21, 2011 at 12:18 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Apr 20, 2011 at 11:15 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmh...@gmail.com> writes:
>>> I am a bit concerned about the reliability of this approach.  If there
>>> is some network lag, or some lag in processing from the master, we
>>> could easily get the idea that there is time skew between the machines
>>> when there really isn't.  And our perception of the time skew could
>>> easily bounce around from message to message, as the lag varies.  I
>>> think it would be tremendously ironic of the two machines were
>>> actually synchronized to the microsecond, but by trying to be clever
>>> about it we managed to make the lag-time accurate only to within
>>> several seconds.
>>
>> Well, if walreceiver concludes that there is no more than a few seconds'
>> difference between the clocks, it'd probably be OK to take the master
>> timestamps at face value.  The problem comes when the skew gets large
>> (compared to the configured time delay, I guess).
>
> I suppose.  Any bound on how much lag there can be before we start
> applying to skew correction is going to be fairly arbitrary.
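
To make the concern quoted above concrete, here is a minimal sketch in
Python. It is not walreceiver code: the function names, the sample numbers,
and the 5-second threshold are all hypothetical. It assumes the standby
estimates skew as (its own clock on arrival) minus (the master's send
timestamp), so any transfer or processing lag is folded into the estimate.

```python
def naive_skew_estimates(send_times, lags, true_skew=0.0):
    """Return per-message skew estimates, in seconds.

    send_times: master timestamps carried in each message
    lags:       network/processing delay of each message
    true_skew:  real offset of the standby clock from the master clock
    """
    estimates = []
    for sent, lag in zip(send_times, lags):
        # What the standby's clock reads when the message arrives.
        received = sent + lag + true_skew
        # Naive estimate: lag is indistinguishable from skew here.
        estimates.append(received - sent)
    return estimates


def effective_skew(estimate, threshold):
    """Treat small apparent skew as zero (hypothetical cutoff), i.e. take
    the master timestamps at face value unless the skew looks large."""
    return 0.0 if abs(estimate) <= threshold else estimate


# Clocks perfectly synchronized, but the lag varies between messages.
estimates = naive_skew_estimates([0.0, 10.0, 20.0], [0.5, 3.0, 0.25])
corrected = [effective_skew(e, threshold=5.0) for e in estimates]
```

With synchronized clocks the estimates simply track the lag, so they move
from message to message. The cutoff hides that noise, but as noted above,
where to put it is fairly arbitrary.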

When the replication connection is terminated, the standby tries to read
WAL files from the archive. In this case there is no walreceiver process,
so how does the standby calculate the clock difference?

> errmsg("parameter \"%s\" requires a temporal value", "recovery_time_delay"),

Should we s/"a temporal"/"an integer"/ here?

Shouldn't time-delayed replication be disabled after we run
"pg_ctl promote"? Otherwise, failover might take a very long time when
recovery_time_delay is set to a high value.

http://forge.mysql.com/worklog/task.php?id=344

According to the above page, one purpose of time-delayed replication is to
protect against user mistakes on the master. But when a user notices a
wrong operation on the master, what should he do next? The WAL records of
the wrong operation might have already arrived at the standby, so neither
"promote" nor "restart" cancels that operation. Instead, he should probably
shut down the standby, find the timestamp of the XID of the operation he'd
like to cancel, set recovery_target_time, and restart the standby. Should a
procedure like this be documented? Or should we implement a new "promote"
mode that finishes recovery as soon as "promote" is requested (i.e., without
replaying all the available WAL records)?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center