On Tue, Apr 5, 2016 at 4:17 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Mar 30, 2016 at 2:22 AM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>> On Wed, Mar 30, 2016 at 2:36 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>> OK, I committed this, with a few tweaks. In particular, I added a
>>> flag variable instead of relying on "latch set" == "need to send
>>> reply"; the other changes were cosmetic.
>>>
>>> I'm not sure how much more of this we can realistically get into 9.6;
>>> the latter patches haven't had much review yet. But I'll set this
>>> back to Needs Review in the CommitFest and we'll see where we end up.
>>> But even if we don't get anything more than this, it's still rather
>>> nice: remote_apply turns out to be only slightly slower than remote
>>> flush, and it's a guarantee that a lot of people are looking for.
>>
>> Thank you Michael and Robert!
>>
>> Please find attached the rest of the patch series, rebased against
>> master. The goal of the 0002 patch is to provide an accurate
>> indication of the current replay lag on each standby, visible to
>> users like this:
>>
>> postgres=# select application_name, replay_lag from pg_stat_replication;
>>  application_name │   replay_lag
>> ──────────────────┼─────────────────
>>  replica1         │ 00:00:00.000299
>>  replica2         │ 00:00:00.000323
>>  replica3         │ 00:00:00.000319
>>  replica4         │ 00:00:00.000303
>> (4 rows)
>>
>> It works by maintaining a buffer of (end of WAL, time now) samples
>> received from the primary, and then eventually feeding those times
>> back to the primary when the recovery process replays the
>> corresponding locations.
>>
>> Compared to approaches based on commit timestamps, this approach has
>> the advantage of providing non-misleading information between
>> commits. For example, if you run a batch load job that takes 1
>> minute to insert the whole phonebook and no other transactions run,
>> you will see replay_lag updating regularly throughout that minute,
>> whereas typical commit timestamp-only approaches will show an
>> increasing lag time until a commit record is eventually applied.
>> Compared to simple LSN location comparisons, it reports in time
>> rather than bytes of WAL, which can be more meaningful for DBAs.
>>
>> When the standby is entirely caught up and there is no write
>> activity, the reported time effectively represents the ping time
>> between the servers, and is updated every wal_sender_timeout / 2,
>> when keepalive messages are sent. While new WAL traffic is arriving,
>> the walreceiver records timestamps at most once per second in a
>> circular buffer, and then sends back replies containing the recorded
>> timestamps as fast as the recovery process can apply the
>> corresponding xlog. The lag number you see is computed by the
>> primary server comparing two timestamps generated by its own system
>> clock, one of which has been on a journey to the standby and back.
>>
>> Accurate lag estimates are a prerequisite for the 0004 patch (about
>> which more later), but I believe users would find this valuable as a
>> feature on its own.
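
To make the scheme described above concrete, here is a minimal sketch
of the idea: a fixed-size ring buffer of samples on the standby side.
The names here (LagSample, record_lag_sample, and so on) are
hypothetical, not identifiers from the actual patch, and the real code
has to deal with locking and message framing that this ignores:

#include "postgres.h"
#include "access/xlogdefs.h"       /* XLogRecPtr */
#include "datatype/timestamp.h"    /* TimestampTz */

#define LAG_SAMPLES 1024

typedef struct LagSample
{
    XLogRecPtr  lsn;           /* end of WAL when the sample was taken */
    TimestampTz sent_time;     /* primary's clock at that moment */
} LagSample;

static LagSample lag_samples[LAG_SAMPLES];
static int  lag_head = 0;      /* next slot to write */
static int  lag_tail = 0;      /* oldest sample not yet replayed */

/*
 * Remember an (end of WAL, primary timestamp) pair; the walreceiver
 * would do this at most once per second while WAL is streaming.
 */
static void
record_lag_sample(XLogRecPtr end_of_wal, TimestampTz sent_time)
{
    lag_samples[lag_head].lsn = end_of_wal;
    lag_samples[lag_head].sent_time = sent_time;
    lag_head = (lag_head + 1) % LAG_SAMPLES;
    if (lag_head == lag_tail)
        lag_tail = (lag_tail + 1) % LAG_SAMPLES;   /* overwrite oldest */
}

/*
 * When replay reaches 'replayed_lsn', return the timestamp of the
 * newest sample whose LSN has now been applied, or 0 if none has.
 * A reply message carries this back to the primary, which reports
 * its current clock time minus this value as replay_lag.
 */
static TimestampTz
consume_lag_samples(XLogRecPtr replayed_lsn)
{
    TimestampTz result = 0;

    while (lag_tail != lag_head &&
           lag_samples[lag_tail].lsn <= replayed_lsn)
    {
        result = lag_samples[lag_tail].sent_time;
        lag_tail = (lag_tail + 1) % LAG_SAMPLES;
    }
    return result;
}

Because the final subtraction compares two timestamps taken from the
primary's own system clock, clock skew between primary and standby
does not distort the reported lag.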
> Well, one problem with this is that you can't put a loop inside of a
> spinlock-protected critical section.
>
> In general, I think this is a pretty reasonable way of attacking this
> problem, but I'd say it's significantly under-commented. Where should
> someone go to get a general overview of this mechanism? The answer is
> not "at place XXX within the patch". (I think it might merit some
> more extensive documentation, too, although I'm not exactly sure what
> that should look like.)
>
> When you overflow the buffer, you could thin it out in a smarter way,
> like by throwing away every other entry instead of the oldest one. I
> guess you'd need to be careful how you coded that, though, because
> replaying an entry with a timestamp invalidates some of the saved
> entries without formally throwing them out.
>
> Conceivably, 0002 could be split into two patches, one of which
> computes "stupid replay lag" considering only records that naturally
> carry timestamps, and a second adding the circular buffer to handle
> the case where much time passes without finding such a record.
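
For what it's worth, a thinning pass over the illustrative buffer from
the sketch above might look like the following. This is hypothetical
and untested, and it punts on the invalidation subtlety mentioned
above; record_lag_sample() would call it in place of the
drop-the-oldest branch when the buffer fills:

/*
 * On overflow, instead of dropping just the oldest sample, keep every
 * other entry counting from the oldest, so the surviving samples
 * still cover the whole interval, at half the resolution.
 */
static void
thin_lag_samples(void)
{
    int     n = (lag_head - lag_tail + LAG_SAMPLES) % LAG_SAMPLES;
    int     src;
    int     dst = lag_tail;

    for (src = 0; src < n; src += 2)
    {
        lag_samples[dst] = lag_samples[(lag_tail + src) % LAG_SAMPLES];
        dst = (dst + 1) % LAG_SAMPLES;
    }
    lag_head = dst;
}

This halves the sampling resolution rather than silently shortening
the window the buffer covers.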
Thanks. I see a way to move that loop and to change the overflow
behaviour along those lines, but due to other commitments I won't be
able to post a well-tested patch and still leave time for reviewers
and a committer before the looming deadline. After the freeze I will
post an updated version that addresses these problems for the next CF.

--
Thomas Munro
http://www.enterprisedb.com