On 24 March 2017 at 05:39, Thomas Munro wrote:
> Fujii-san for the idea of tracking write and flush lag too
You mentioned wishing that logical replication would update sent lag
as the decoding position.
It appears to do just that already; see the references to restart_lsn
in StartLogicalReplica
On Thu, Mar 23, 2017 at 10:50 PM, Simon Riggs wrote:
>> Second thoughts... I'll just make LagTrackerWrite externally
>> available, so a plugin can send anything it wants to the tracker.
>> Which means I'm explicitly removing the "logical replication support"
>> from this patch.
>
> Done.
>
> Here'
> Second thoughts... I'll just make LagTrackerWrite externally
> available, so a plugin can send anything it wants to the tracker.
> Which means I'm explicitly removing the "logical replication support"
> from this patch.
Done.
Here's the patch I'm looking to commit, with some docs and minor code
On 23 March 2017 at 06:42, Simon Riggs wrote:
> On 23 March 2017 at 01:02, Thomas Munro wrote:
>
>> Thanks! Please find attached v7, which includes a note we can point
>> at when someone asks why it doesn't show 00:00:00, as requested.
>
> Thanks.
>
> Now I look harder the handling for logical l
On 23 March 2017 at 01:02, Thomas Munro wrote:
> Thanks! Please find attached v7, which includes a note we can point
> at when someone asks why it doesn't show 00:00:00, as requested.
Thanks.
Now that I look harder, the handling for logical lag seems like it would be
problematic in many cases. It's
On Wed, Mar 15, 2017 at 8:15 PM, Ian Barwick
wrote:
>> 2. Recognise when the last reported write/flush/apply LSN from the
>> standby == end of WAL on the sending server, and show lag times of
>> 00:00:00 in all three columns. I consider this entirely bogus: it's
>> not an actual measurement that
On Thu, Mar 23, 2017 at 12:12 AM, Simon Riggs wrote:
> On 22 March 2017 at 11:03, Thomas Munro wrote:
>
>> Hah. Apologies for the delay -- I will post a patch with
>> documentation as requested within 24 hours.
>
> Thanks very much. I'll reserve time to commit it tomorrow, all else being
> good
On Wed, Mar 22, 2017 at 6:57 AM, Simon Riggs wrote:
> Not sure whether this is a 6 day lag, or we should show NULL because we
> are up to date.
OK, that made me laugh.
Thanks for putting in the effort on this patch, BTW.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Post
On 22 March 2017 at 11:03, Thomas Munro wrote:
> Hah. Apologies for the delay -- I will post a patch with
> documentation as requested within 24 hours.
Thanks very much. I'll reserve time to commit it tomorrow, all else being good.
--
Simon Riggs http://www.2ndQuadrant.com/
Pos
On Wed, Mar 22, 2017 at 11:57 PM, Simon Riggs wrote:
>>> I accept your proposal for how we handle these, on condition that you
>>> write up some docs that explain the subtle difference between the two,
>>> so we can just show people the URL. That needs to explain clearly the
>>> difference in an i
On 21 March 2017 at 17:32, David Steele wrote:
> Hi Thomas,
>
> On 3/15/17 8:38 PM, Simon Riggs wrote:
>>
>> On 16 March 2017 at 08:02, Thomas Munro
>> wrote:
>>
>>> I agree that these states exist, but we disagree on what 'lag' really
>>> means, or, rather, which of several plausible definitions
Hi Thomas,
On 3/15/17 8:38 PM, Simon Riggs wrote:
On 16 March 2017 at 08:02, Thomas Munro wrote:
I agree that these states exist, but we disagree on what 'lag' really
means, or, rather, which of several plausible definitions would be the
most useful here.
My proposal is that the *_lag column
On 16 March 2017 at 08:02, Thomas Munro wrote:
> I agree that these states exist, but we disagree on what 'lag' really
> means, or, rather, which of several plausible definitions would be the
> most useful here.
>
> My proposal is that the *_lag columns should always report how long it
> took for
On Thu, Mar 16, 2017 at 12:07 PM, Simon Riggs wrote:
> There are two ways of knowing the lag: 1) by measurement/sampling,
> which is the main way this patch approaches this, 2) by direct
> observation that the LSNs match. Both are equally valid ways of
> establishing knowledge. Strangely (2) is the onl
On 14 March 2017 at 07:39, Thomas Munro wrote:
> Hi,
>
> Please see separate replies to Simon and Craig below.
>
> On Sun, Mar 5, 2017 at 8:38 PM, Simon Riggs wrote:
>> On 1 March 2017 at 10:47, Thomas Munro wrote:
>>> I do see why a new user trying this feature for the first time might
>>> expe
On 14 March 2017 at 07:39, Thomas Munro wrote:
>
> On Mon, Mar 6, 2017 at 3:22 AM, Craig Ringer wrote:
>> On 5 March 2017 at 15:31, Simon Riggs wrote:
>>> What we want from this patch is something that works for both, as much
>>> as that is possible.
>>
>> If it shows a sawtooth pattern for flus
Hi
Just adding a couple of thoughts on this.
On 03/14/2017 08:39 AM, Thomas Munro wrote:
> Hi,
>
> Please see separate replies to Simon and Craig below.
>
> On Sun, Mar 5, 2017 at 8:38 PM, Simon Riggs wrote:
>> On 1 March 2017 at 10:47, Thomas Munro wrote:
>>> I do see why a new user trying th
Hi,
Please see separate replies to Simon and Craig below.
On Sun, Mar 5, 2017 at 8:38 PM, Simon Riggs wrote:
> On 1 March 2017 at 10:47, Thomas Munro wrote:
>> I do see why a new user trying this feature for the first time might
>> expect it to show a lag of 0 just as soon as sent LSN =
>> writ
On 5 March 2017 at 15:31, Simon Riggs wrote:
> On 1 March 2017 at 10:47, Thomas Munro wrote:
>> This seems to be problematic. Logical peers report LSN changes for
>> all three operations (write, flush, commit) only on commit. I suppose
>> that might work OK for synchronous replication, but it
On 1 March 2017 at 10:47, Thomas Munro wrote:
>>> I added a fourth case 'overwhelm.png' which you might find
>>> interesting. It's essentially like one 'burst' followed by a 100% idle
>>> primary. The primary stops sending new WAL around 50 seconds in and
>>> then there is no autovacuum, nothing
On 1 March 2017 at 10:47, Thomas Munro wrote:
> On Fri, Feb 24, 2017 at 9:05 AM, Simon Riggs wrote:
>> On 21 February 2017 at 21:38, Thomas Munro
>> wrote:
>>> However, I think a call like LagTrackerWrite(SendRqstPtr,
>>> GetCurrentTimestamp()) needs to go into XLogSendLogical, to mirror
>>> wha
On Fri, Feb 24, 2017 at 9:05 AM, Simon Riggs wrote:
> On 21 February 2017 at 21:38, Thomas Munro
> wrote:
>> However, I think a call like LagTrackerWrite(SendRqstPtr,
>> GetCurrentTimestamp()) needs to go into XLogSendLogical, to mirror
>> what happens in XLogSendPhysical. I'm not sure about tha
On 21 February 2017 at 21:38, Thomas Munro
wrote:
> On Tue, Feb 21, 2017 at 6:21 PM, Simon Riggs wrote:
>> And happier again, leading me to move to the next stage of review,
>> focusing on the behaviour emerging from the design.
>>
>> So my current understanding is that this doesn't rely upon LSN
On Thu, Feb 23, 2017 at 11:52 AM, Thomas Munro
wrote:
> The overall graph looks pretty similar, but it is more likely to short
> hiccups caused by occasional slow WAL fsyncs in walreceiver. See the
I meant to write "more likely to *miss* short hiccups".
--
Thomas Munro
http://www.enterprisedb
On Tue, Feb 21, 2017 at 6:21 PM, Simon Riggs wrote:
> I think we need to show some test results with the graph of lag
> over time for these cases:
> 1. steady state - pgbench on master, so we can see how that responds
> 2. blocked apply on standby - so we can see how the lag increases but
> a
On Tue, Feb 21, 2017 at 6:21 PM, Simon Riggs wrote:
> And happier again, leading me to move to the next stage of review,
> focusing on the behaviour emerging from the design.
>
> So my current understanding is that this doesn't rely upon LSN
> arithmetic to measure lag, which is good. That means l
On 17 February 2017 at 07:45, Thomas Munro
wrote:
> On Fri, Feb 17, 2017 at 12:45 AM, Simon Riggs wrote:
>> Feeling happier about this for now at least.
>
> Thanks!
And happier again, leading me to move to the next stage of review,
focusing on the behaviour emerging from the design.
So my curre
On Fri, Feb 17, 2017 at 12:45 AM, Simon Riggs wrote:
> Feeling happier about this for now at least.
Thanks!
> I think we need to document how this works more in README or header
> comments. That way I can review it against what it aims to do rather
> than what I think it might do.
I have added
On Thu, Feb 16, 2017 at 11:18 PM, Abhijit Menon-Sen
wrote:
> Hi Thomas.
>
> At 2017-02-15 00:48:41 +1300, thomas.mu...@enterprisedb.com wrote:
>>
>> Here is a new version with the buffer on the sender side as requested.
>
> This looks good.
Thanks for the review!
>> + write_lag
>> + int
On 14 February 2017 at 11:48, Thomas Munro
wrote:
> On Wed, Feb 1, 2017 at 5:21 PM, Michael Paquier
> wrote:
>> On Sat, Jan 21, 2017 at 10:49 AM, Thomas Munro
>> wrote:
>>> Ok. I see that there is a new compelling reason to move the ring
>>> buffer to the sender side: then I think lag tracking
Hi Thomas.
At 2017-02-15 00:48:41 +1300, thomas.mu...@enterprisedb.com wrote:
>
> Here is a new version with the buffer on the sender side as requested.
This looks good.
> + write_lag
> + interval
> + Estimated time taken for recent WAL records to be written on this
> + standby
On 14 February 2017 at 11:48, Thomas Munro
wrote:
> Here is a new version with the buffer on the sender side as requested.
Thanks, I will definitely review in good time to get this in PG10
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DB
On Wed, Feb 1, 2017 at 5:21 PM, Michael Paquier
wrote:
> On Sat, Jan 21, 2017 at 10:49 AM, Thomas Munro
> wrote:
>> Ok. I see that there is a new compelling reason to move the ring
>> buffer to the sender side: then I think lag tracking will work
>> automatically for the new logical replication
On Sat, Jan 21, 2017 at 10:49 AM, Thomas Munro
wrote:
> Ok. I see that there is a new compelling reason to move the ring
> buffer to the sender side: then I think lag tracking will work
> automatically for the new logical replication that just landed on
> master. I will try it that way. Thanks
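For readers following along, here is a rough sketch of what a sender-side buffer of (LSN, time) samples could look like. It is a toy Python model, not the patch's C code: the class name, the `write`/`read` methods, the single read position, the tiny capacity, and the "skip the sample when full" policy are all assumptions made for illustration.

```python
from collections import deque
from typing import Optional


class LagTracker:
    """Toy sender-side buffer of (lsn, send_time) samples (hypothetical)."""

    def __init__(self, capacity: int = 8) -> None:
        self.capacity = capacity
        self.samples: deque = deque()

    def write(self, lsn: int, now: float) -> None:
        # Remember when WAL up to `lsn` was sent. If the buffer is full,
        # this sketch simply skips the sample (an assumed policy).
        if len(self.samples) < self.capacity:
            self.samples.append((lsn, now))

    def read(self, reported_lsn: int, now: float) -> Optional[float]:
        # Consume every sample the peer has now passed and time the newest
        # of them. None means there is no new measurement to report.
        sent_at = None
        while self.samples and self.samples[0][0] <= reported_lsn:
            _, sent_at = self.samples.popleft()
        return None if sent_at is None else now - sent_at


tracker = LagTracker()
tracker.write(lsn=100, now=10.0)
tracker.write(lsn=200, now=11.0)

# The peer reports LSN 150 at t=10.4: only the sample for LSN 100 is passed.
lag = tracker.read(reported_lsn=150, now=10.4)
```

Because the samples live on the sender, the receiving side only has to report positions, which is why the same bookkeeping could in principle serve any peer that reports LSNs back.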
On Tue, Jan 17, 2017 at 7:45 PM, Fujii Masao wrote:
> On Thu, Dec 22, 2016 at 6:14 AM, Thomas Munro
> wrote:
>> On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao wrote:
>>> I agree that the capability to measure the remote_apply lag is very useful.
>>> Also I want to measure the remote_write and remo
On Thu, Dec 22, 2016 at 6:14 AM, Thomas Munro
wrote:
> On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao wrote:
>> I agree that the capability to measure the remote_apply lag is very useful.
>> Also I want to measure the remote_write and remote_flush lags, for example,
>> in order to diagnose the caus
On Thu, Jan 5, 2017 at 12:03 AM, Thomas Munro
wrote:
> So perhaps I should get rid of that replication_lag_sample_interval
> GUC and send back apply timestamps frequently, as you were saying. It
> would add up to a third more replies.
Oops, of course I meant to say up to 50% more replies...
--
On Wed, Jan 4, 2017 at 8:58 PM, Simon Riggs wrote:
> On 3 January 2017 at 23:22, Thomas Munro
> wrote:
>
>>> I don't see why that would be unacceptable. If we do it for
>>> remote_apply, why not also do it for other modes? Whatever the
>>> reasoning was for remote_apply should work for other mod
On 3 January 2017 at 23:22, Thomas Munro wrote:
>> I don't see why that would be unacceptable. If we do it for
>> remote_apply, why not also do it for other modes? Whatever the
>> reasoning was for remote_apply should work for other modes. I should
>> add it was originally designed to be that way
On Wed, Jan 4, 2017 at 12:22 PM, Thomas Munro
wrote:
> (replay_lag - (write_lag / 2) may be a cheap proxy
> for a lag time that doesn't include the return network leg, and still
> doesn't introduce clock difference error)
(Upon reflection it's a terrible proxy for that because of the mix of
write
On Wed, Jan 4, 2017 at 12:22 PM, Thomas Munro
wrote:
> The patch streams (time-right-now, end-of-wal) to the standby in every
> outgoing message, and then sees how long it takes for those timestamps
> to be fed back to it.
Correction: we already stream (time-right-now, end-of-wal) to the
standby
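A small worked example may help show why timing the echoed timestamp on the sender avoids clock difference error, at the cost of including the return network leg. This is an illustrative sketch only: the function names and every number below (the skew, the apply delay, the return-leg time) are made up for the arithmetic.

```python
def naive_lag(sent_at_primary: float, applied_at_standby_clock: float) -> float:
    # Compares timestamps taken on two different clocks, so any skew leaks in.
    return applied_at_standby_clock - sent_at_primary


def echoed_lag(sent_at_primary: float, reply_received_at_primary: float) -> float:
    # Both timestamps come from the primary's clock, so skew cancels out,
    # but the result includes the reply's trip back over the network.
    return reply_received_at_primary - sent_at_primary


skew = 5.0          # hypothetical: standby clock runs 5 s ahead of the primary
sent_at = 100.0     # primary clock when the (time, end-of-wal) pair was sent
apply_delay = 0.25  # hypothetical true time until the standby applied that WAL
return_leg = 0.05   # hypothetical time for the reply to reach the primary

applied_at_standby_clock = sent_at + apply_delay + skew
reply_received_at_primary = sent_at + apply_delay + return_leg

naive = naive_lag(sent_at, applied_at_standby_clock)
echoed = echoed_lag(sent_at, reply_received_at_primary)
```

With these made-up numbers the naive figure is dominated by the 5 s skew, while the echoed figure is the true delay plus the return leg.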
On Wed, Jan 4, 2017 at 1:06 AM, Simon Riggs wrote:
> On 21 December 2016 at 21:14, Thomas Munro
> wrote:
>> I thought about that too, but I couldn't figure out how to make the
>> sampling work. If the primary is choosing (LSN, time) pairs to store
>> in a buffer, and the standby is sending repli
On 21 December 2016 at 21:14, Thomas Munro
wrote:
> On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao wrote:
>> I agree that the capability to measure the remote_apply lag is very useful.
>> Also I want to measure the remote_write and remote_flush lags, for example,
>> in order to diagnose the cause o
On Thu, Dec 29, 2016 at 1:28 AM, Thomas Munro
wrote:
> On Thu, Dec 22, 2016 at 10:14 AM, Thomas Munro
> wrote:
>> On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao wrote:
>>> I agree that the capability to measure the remote_apply lag is very useful.
>>> Also I want to measure the remote_write and re
On Thu, Dec 22, 2016 at 10:14 AM, Thomas Munro
wrote:
> On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao wrote:
>> I agree that the capability to measure the remote_apply lag is very useful.
>> Also I want to measure the remote_write and remote_flush lags, for example,
>> in order to diagnose the cau
On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao wrote:
> I agree that the capability to measure the remote_apply lag is very useful.
> Also I want to measure the remote_write and remote_flush lags, for example,
> in order to diagnose the cause of replication lag.
Good idea. I will think about how t
On Mon, Dec 19, 2016 at 8:13 PM, Thomas Munro
wrote:
> On Mon, Dec 19, 2016 at 4:03 PM, Peter Eisentraut
> wrote:
>> On 11/22/16 4:27 AM, Thomas Munro wrote:
>>> Thanks very much for testing! New version attached. I will add this
>>> to the next CF.
>>
>> I don't see it there yet.
>
> Thanks fo
On Mon, Dec 19, 2016 at 10:46 PM, Simon Riggs wrote:
> On 26 October 2016 at 11:34, Thomas Munro
> wrote:
>
>> It works by taking advantage of the { time, end-of-WAL } samples that
>> sending servers already include in message headers to standbys. That
>> seems to provide a pretty good proxy fo
On Mon, Dec 19, 2016 at 4:03 PM, Peter Eisentraut
wrote:
> On 11/22/16 4:27 AM, Thomas Munro wrote:
>> Thanks very much for testing! New version attached. I will add this
>> to the next CF.
>
> I don't see it there yet.
Thanks for the reminder. Added here: https://commitfest.postgresql.org/12
On 26 October 2016 at 11:34, Thomas Munro wrote:
> It works by taking advantage of the { time, end-of-WAL } samples that
> sending servers already include in message headers to standbys. That
> seems to provide a pretty good proxy for when the WAL was written, if
> you ignore messages where the
On 11/22/16 4:27 AM, Thomas Munro wrote:
> Thanks very much for testing! New version attached. I will add this
> to the next CF.
I don't see it there yet.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Se
On Tue, Nov 8, 2016 at 2:35 PM, Masahiko Sawada wrote:
> replay_lag_sample_interval is 1s by default but I got 1000s by SHOW command.
> postgres(1:36789)=# show replay_lag_sample_interval ;
> replay_lag_sample_interval
> ----------------------------
> 1000s
> (1 row)
Oops, fixed.
>> 1. The la
On Wed, Oct 26, 2016 at 7:34 PM, Thomas Munro
wrote:
> Hi hackers,
>
> Here is a new version of my patch to add a replay_lag column to the
> pg_stat_replication view (originally proposed as part of a larger
> patch set for 9.6[1]), like this:
Thank you for working on this!
> postgres=# select ap
Hi hackers,
Here is a new version of my patch to add a replay_lag column to the
pg_stat_replication view (originally proposed as part of a larger
patch set for 9.6[1]), like this:
postgres=# select application_name, replay_lag from pg_stat_replication;
┌──┬─┐
│ app