subject:"\[HACKERS\] Measuring replay lag"

Re: [HACKERS] Measuring replay lag

2017-03-23 Thread Craig Ringer

On 24 March 2017 at 05:39, Thomas Munro wrote: > Fujii-san for the idea of tracking write and flush lag too You mentioned wishing that logical replication would update sent lag as the decoding position. It appears to do just that already; see the references to restart_lsn in StartLogicalReplica

Re: [HACKERS] Measuring replay lag

2017-03-23 Thread Thomas Munro

On Thu, Mar 23, 2017 at 10:50 PM, Simon Riggs wrote: >> Second thoughts... I'll just make LagTrackerWrite externally >> available, so a plugin can send anything it wants to the tracker. >> Which means I'm explicitly removing the "logical replication support" >> from this patch. > > Done. > > Here'

Re: [HACKERS] Measuring replay lag

2017-03-23 Thread Thomas Munro

On Thu, Mar 23, 2017 at 10:50 PM, Simon Riggs wrote: >> Second thoughts... I'll just make LagTrackerWrite externally >> available, so a plugin can send anything it wants to the tracker. >> Which means I'm explicitly removing the "logical replication support" >> from this patch. > > Done. > > Here'

Re: [HACKERS] Measuring replay lag

2017-03-23 Thread Simon Riggs

> Second thoughts... I'll just make LagTrackerWrite externally > available, so a plugin can send anything it wants to the tracker. > Which means I'm explicitly removing the "logical replication support" > from this patch. Done. Here's the patch I'm looking to commit, with some docs and minor code

Re: [HACKERS] Measuring replay lag

2017-03-23 Thread Simon Riggs

On 23 March 2017 at 06:42, Simon Riggs wrote: > On 23 March 2017 at 01:02, Thomas Munro wrote: > >> Thanks! Please find attached v7, which includes a note we can point >> at when someone asks why it doesn't show 00:00:00, as requested. > > Thanks. > > Now I look harder the handling for logical l

Re: [HACKERS] Measuring replay lag

2017-03-22 Thread Simon Riggs

On 23 March 2017 at 01:02, Thomas Munro wrote: > Thanks! Please find attached v7, which includes a note we can point > at when someone asks why it doesn't show 00:00:00, as requested. Thanks. Now I look harder the handling for logical lag seems like it would be problematic in many cases. It's

Re: [HACKERS] Measuring replay lag

2017-03-22 Thread Thomas Munro

On Wed, Mar 15, 2017 at 8:15 PM, Ian Barwick wrote: >> 2. Recognise when the last reported write/flush/apply LSN from the >> standby == end of WAL on the sending server, and show lag times of >> 00:00:00 in all three columns. I consider this entirely bogus: it's >> not an actual measurement that

Re: [HACKERS] Measuring replay lag

2017-03-22 Thread Thomas Munro

On Thu, Mar 23, 2017 at 12:12 AM, Simon Riggs wrote: > On 22 March 2017 at 11:03, Thomas Munro wrote: > >> Hah. Apologies for the delay -- I will post a patch with >> documentation as requested within 24 hours. > > Thanks very much. I'll reserve time to commit it tomorrow, all else being > good

Re: [HACKERS] Measuring replay lag

2017-03-22 Thread Robert Haas

On Wed, Mar 22, 2017 at 6:57 AM, Simon Riggs wrote: > Not sure whether this is a 6 day lag, or we should show NULL because we > are up to date. OK, that made me laugh. Thanks for putting in the effort on this patch, BTW. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Post

Re: [HACKERS] Measuring replay lag

2017-03-22 Thread Simon Riggs

On 22 March 2017 at 11:03, Thomas Munro wrote: > Hah. Apologies for the delay -- I will post a patch with > documentation as requested within 24 hours. Thanks very much. I'll reserve time to commit it tomorrow, all else being good. -- Simon Riggs http://www.2ndQuadrant.com/ Pos

Re: [HACKERS] Measuring replay lag

2017-03-22 Thread Thomas Munro

On Wed, Mar 22, 2017 at 11:57 PM, Simon Riggs wrote: >>> I accept your proposal for how we handle these, on condition that you >>> write up some docs that explain the subtle difference between the two, >>> so we can just show people the URL. That needs to explain clearly the >>> difference in an i

Re: [HACKERS] Measuring replay lag

2017-03-22 Thread Simon Riggs

On 21 March 2017 at 17:32, David Steele wrote: > Hi Thomas, > > On 3/15/17 8:38 PM, Simon Riggs wrote: >> >> On 16 March 2017 at 08:02, Thomas Munro >> wrote: >> >>> I agree that these states exist, but we disagree on what 'lag' really >>> means, or, rather, which of several plausible definitions

Re: [HACKERS] Measuring replay lag

2017-03-21 Thread David Steele

Hi Thomas, On 3/15/17 8:38 PM, Simon Riggs wrote: On 16 March 2017 at 08:02, Thomas Munro wrote: I agree that these states exist, but we disagree on what 'lag' really means, or, rather, which of several plausible definitions would be the most useful here. My proposal is that the *_lag column

Re: [HACKERS] Measuring replay lag

2017-03-15 Thread Simon Riggs

On 16 March 2017 at 08:02, Thomas Munro wrote: > I agree that these states exist, but we disagree on what 'lag' really > means, or, rather, which of several plausible definitions would be the > most useful here. > > My proposal is that the *_lag columns should always report how long it > took for

Re: [HACKERS] Measuring replay lag

2017-03-15 Thread Thomas Munro

On Thu, Mar 16, 2017 at 12:07 PM, Simon Riggs wrote: > There are two ways of knowing the lag: 1) by measurement/sampling, > which is the main way this patch approaches this, 2) by direct > observation the LSNs match. Both are equally valid ways of > establishing knowledge. Strangely (2) is the onl

Re: [HACKERS] Measuring replay lag

2017-03-15 Thread Simon Riggs

On 14 March 2017 at 07:39, Thomas Munro wrote: > Hi, > > Please see separate replies to Simon and Craig below. > > On Sun, Mar 5, 2017 at 8:38 PM, Simon Riggs wrote: >> On 1 March 2017 at 10:47, Thomas Munro wrote: >>> I do see why a new user trying this feature for the first time might >>> expe

Re: [HACKERS] Measuring replay lag

2017-03-15 Thread Simon Riggs

On 14 March 2017 at 07:39, Thomas Munro wrote: > > On Mon, Mar 6, 2017 at 3:22 AM, Craig Ringer wrote: >> On 5 March 2017 at 15:31, Simon Riggs wrote: >>> What we want from this patch is something that works for both, as much >>> as that is possible. >> >> If it shows a sawtooth pattern for flus

Re: [HACKERS] Measuring replay lag

2017-03-15 Thread Ian Barwick

Hi Just adding a couple of thoughts on this. On 03/14/2017 08:39 AM, Thomas Munro wrote: > Hi, > > Please see separate replies to Simon and Craig below. > > On Sun, Mar 5, 2017 at 8:38 PM, Simon Riggs wrote: >> On 1 March 2017 at 10:47, Thomas Munro wrote: >>> I do see why a new user trying th

Re: [HACKERS] Measuring replay lag

2017-03-13 Thread Thomas Munro

Hi, Please see separate replies to Simon and Craig below. On Sun, Mar 5, 2017 at 8:38 PM, Simon Riggs wrote: > On 1 March 2017 at 10:47, Thomas Munro wrote: >> I do see why a new user trying this feature for the first time might >> expect it to show a lag of 0 just as soon as sent LSN = >> writ

Re: [HACKERS] Measuring replay lag

2017-03-05 Thread Craig Ringer

On 5 March 2017 at 15:31, Simon Riggs wrote: > On 1 March 2017 at 10:47, Thomas Munro wrote: >> This seems to be problematic. Logical peers report LSN changes for >> all three operations (write, flush, commit) only on commit. I suppose >> that might work OK for synchronous replication, but it

Re: [HACKERS] Measuring replay lag

2017-03-04 Thread Simon Riggs

On 1 March 2017 at 10:47, Thomas Munro wrote: >>> I added a fourth case 'overwhelm.png' which you might find >>> interesting. It's essentially like one 'burst' followed by a 100% idle >>> primary. The primary stops sending new WAL around 50 seconds in and >>> then there is no autovacuum, nothing

Re: [HACKERS] Measuring replay lag

2017-03-04 Thread Simon Riggs

On 1 March 2017 at 10:47, Thomas Munro wrote: > On Fri, Feb 24, 2017 at 9:05 AM, Simon Riggs wrote: >> On 21 February 2017 at 21:38, Thomas Munro >> wrote: >>> However, I think a call like LagTrackerWrite(SendRqstPtr, >>> GetCurrentTimestamp()) needs to go into XLogSendLogical, to mirror >>> wha

Re: [HACKERS] Measuring replay lag

2017-03-01 Thread Thomas Munro

On Fri, Feb 24, 2017 at 9:05 AM, Simon Riggs wrote: > On 21 February 2017 at 21:38, Thomas Munro > wrote: >> However, I think a call like LagTrackerWrite(SendRqstPtr, >> GetCurrentTimestamp()) needs to go into XLogSendLogical, to mirror >> what happens in XLogSendPhysical. I'm not sure about tha

Re: [HACKERS] Measuring replay lag

2017-02-23 Thread Simon Riggs

On 21 February 2017 at 21:38, Thomas Munro wrote: > On Tue, Feb 21, 2017 at 6:21 PM, Simon Riggs wrote: >> And happier again, leading me to move to the next stage of review, >> focusing on the behaviour emerging from the design. >> >> So my current understanding is that this doesn't rely upon LSN

Re: [HACKERS] Measuring replay lag

2017-02-22 Thread Thomas Munro

On Thu, Feb 23, 2017 at 11:52 AM, Thomas Munro wrote: > The overall graph looks pretty similar, but it is more likely to short > hiccups caused by occasional slow WAL fsyncs in walreceiver. See the I meant to write "more likely to *miss* short hiccups". -- Thomas Munro http://www.enterprisedb

Re: [HACKERS] Measuring replay lag

2017-02-22 Thread Thomas Munro

On Tue, Feb 21, 2017 at 6:21 PM, Simon Riggs wrote: > I think what we need to show some test results with the graph of lag > over time for these cases: > 1. steady state - pgbench on master, so we can see how that responds > 2. blocked apply on standby - so we can see how the lag increases but > a

Re: [HACKERS] Measuring replay lag

2017-02-21 Thread Thomas Munro

On Tue, Feb 21, 2017 at 6:21 PM, Simon Riggs wrote: > And happier again, leading me to move to the next stage of review, > focusing on the behaviour emerging from the design. > > So my current understanding is that this doesn't rely upon LSN > arithmetic to measure lag, which is good. That means l

Re: [HACKERS] Measuring replay lag

2017-02-21 Thread Simon Riggs

On 17 February 2017 at 07:45, Thomas Munro wrote: > On Fri, Feb 17, 2017 at 12:45 AM, Simon Riggs wrote: >> Feeling happier about this for now at least. > > Thanks! And happier again, leading me to move to the next stage of review, focusing on the behaviour emerging from the design. So my curre

Re: [HACKERS] Measuring replay lag

2017-02-16 Thread Thomas Munro

On Fri, Feb 17, 2017 at 12:45 AM, Simon Riggs wrote: > Feeling happier about this for now at least. Thanks! > I think we need to document how this works more in README or header > comments. That way I can review it against what it aims to do rather > than what I think it might do. I have added

Re: [HACKERS] Measuring replay lag

2017-02-16 Thread Thomas Munro

On Thu, Feb 16, 2017 at 11:18 PM, Abhijit Menon-Sen wrote: > Hi Thomas. > > At 2017-02-15 00:48:41 +1300, thomas.mu...@enterprisedb.com wrote: >> >> Here is a new version with the buffer on the sender side as requested. > > This looks good. Thanks for the review! >> + write_lag >> + int

Re: [HACKERS] Measuring replay lag

2017-02-16 Thread Simon Riggs

On 14 February 2017 at 11:48, Thomas Munro wrote: > On Wed, Feb 1, 2017 at 5:21 PM, Michael Paquier > wrote: >> On Sat, Jan 21, 2017 at 10:49 AM, Thomas Munro >> wrote: >>> Ok. I see that there is a new compelling reason to move the ring >>> buffer to the sender side: then I think lag tracking

Re: [HACKERS] Measuring replay lag

2017-02-16 Thread Abhijit Menon-Sen

Hi Thomas. At 2017-02-15 00:48:41 +1300, thomas.mu...@enterprisedb.com wrote: > > Here is a new version with the buffer on the sender side as requested. This looks good. > + write_lag > + interval > + Estimated time taken for recent WAL records to be written on this > + standby

Re: [HACKERS] Measuring replay lag

2017-02-14 Thread Simon Riggs

On 14 February 2017 at 11:48, Thomas Munro wrote: > Here is a new version with the buffer on the sender side as requested. Thanks, I will definitely review in good time to get this in PG10 -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DB

Re: [HACKERS] Measuring replay lag

2017-02-14 Thread Thomas Munro

On Wed, Feb 1, 2017 at 5:21 PM, Michael Paquier wrote: > On Sat, Jan 21, 2017 at 10:49 AM, Thomas Munro > wrote: >> Ok. I see that there is a new compelling reason to move the ring >> buffer to the sender side: then I think lag tracking will work >> automatically for the new logical replication

Re: [HACKERS] Measuring replay lag

2017-01-31 Thread Michael Paquier

On Sat, Jan 21, 2017 at 10:49 AM, Thomas Munro wrote: > Ok. I see that there is a new compelling reason to move the ring > buffer to the sender side: then I think lag tracking will work > automatically for the new logical replication that just landed on > master. I will try it that way. Thanks

Re: [HACKERS] Measuring replay lag

2017-01-20 Thread Thomas Munro

On Tue, Jan 17, 2017 at 7:45 PM, Fujii Masao wrote: > On Thu, Dec 22, 2016 at 6:14 AM, Thomas Munro > wrote: >> On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao wrote: >>> I agree that the capability to measure the remote_apply lag is very useful. >>> Also I want to measure the remote_write and remo

Re: [HACKERS] Measuring replay lag

2017-01-16 Thread Fujii Masao

On Thu, Dec 22, 2016 at 6:14 AM, Thomas Munro wrote: > On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao wrote: >> I agree that the capability to measure the remote_apply lag is very useful. >> Also I want to measure the remote_write and remote_flush lags, for example, >> in order to diagnose the caus

Re: [HACKERS] Measuring replay lag

2017-01-04 Thread Thomas Munro

On Thu, Jan 5, 2017 at 12:03 AM, Thomas Munro wrote: > So perhaps I should get rid of that replication_lag_sample_interval > GUC and send back apply timestamps frequently, as you were saying. It > would add up to a third more replies. Oops, of course I meant to say up to 50% more replies... --

Re: [HACKERS] Measuring replay lag

2017-01-04 Thread Thomas Munro

On Wed, Jan 4, 2017 at 8:58 PM, Simon Riggs wrote: > On 3 January 2017 at 23:22, Thomas Munro > wrote: > >>> I don't see why that would be unacceptable. If we do it for >>> remote_apply, why not also do it for other modes? Whatever the >>> reasoning was for remote_apply should work for other mod

Re: [HACKERS] Measuring replay lag

2017-01-03 Thread Simon Riggs

On 3 January 2017 at 23:22, Thomas Munro wrote: >> I don't see why that would be unacceptable. If we do it for >> remote_apply, why not also do it for other modes? Whatever the >> reasoning was for remote_apply should work for other modes. I should >> add it was originally designed to be that way

Re: [HACKERS] Measuring replay lag

2017-01-03 Thread Thomas Munro

On Wed, Jan 4, 2017 at 12:22 PM, Thomas Munro wrote: > (replay_lag - (write_lag / 2) may be a cheap proxy > for a lag time that doesn't include the return network leg, and still > doesn't introduce clock difference error) (Upon reflection it's a terrible proxy for that because of the mix of write

Re: [HACKERS] Measuring replay lag

2017-01-03 Thread Thomas Munro

On Wed, Jan 4, 2017 at 12:22 PM, Thomas Munro wrote: > The patch streams (time-right-now, end-of-wal) to the standby in every > outgoing message, and then sees how long it takes for those timestamps > to be fed back to it. Correction: we already stream (time-right-now, end-of-wal) to the standby

Re: [HACKERS] Measuring replay lag

2017-01-03 Thread Thomas Munro

On Wed, Jan 4, 2017 at 1:06 AM, Simon Riggs wrote: > On 21 December 2016 at 21:14, Thomas Munro > wrote: >> I thought about that too, but I couldn't figure out how to make the >> sampling work. If the primary is choosing (LSN, time) pairs to store >> in a buffer, and the standby is sending repli

Re: [HACKERS] Measuring replay lag

2017-01-03 Thread Simon Riggs

On 21 December 2016 at 21:14, Thomas Munro wrote: > On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao wrote: >> I agree that the capability to measure the remote_apply lag is very useful. >> Also I want to measure the remote_write and remote_flush lags, for example, >> in order to diagnose the cause o

Re: [HACKERS] Measuring replay lag

2017-01-02 Thread Thomas Munro

On Thu, Dec 29, 2016 at 1:28 AM, Thomas Munro wrote: > On Thu, Dec 22, 2016 at 10:14 AM, Thomas Munro > wrote: >> On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao wrote: >>> I agree that the capability to measure the remote_apply lag is very useful. >>> Also I want to measure the remote_write and re

Re: [HACKERS] Measuring replay lag

2016-12-28 Thread Thomas Munro

On Thu, Dec 22, 2016 at 10:14 AM, Thomas Munro wrote: > On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao wrote: >> I agree that the capability to measure the remote_apply lag is very useful. >> Also I want to measure the remote_write and remote_flush lags, for example, >> in order to diagnose the cau

Re: [HACKERS] Measuring replay lag

2016-12-21 Thread Thomas Munro

On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao wrote: > I agree that the capability to measure the remote_apply lag is very useful. > Also I want to measure the remote_write and remote_flush lags, for example, > in order to diagnose the cause of replication lag. Good idea. I will think about how t

Re: [HACKERS] Measuring replay lag

2016-12-21 Thread Fujii Masao

On Mon, Dec 19, 2016 at 8:13 PM, Thomas Munro wrote: > On Mon, Dec 19, 2016 at 4:03 PM, Peter Eisentraut > wrote: >> On 11/22/16 4:27 AM, Thomas Munro wrote: >>> Thanks very much for testing! New version attached. I will add this >>> to the next CF. >> >> I don't see it there yet. > > Thanks fo

Re: [HACKERS] Measuring replay lag

2016-12-19 Thread Thomas Munro

On Mon, Dec 19, 2016 at 10:46 PM, Simon Riggs wrote: > On 26 October 2016 at 11:34, Thomas Munro > wrote: > >> It works by taking advantage of the { time, end-of-WAL } samples that >> sending servers already include in message headers to standbys. That >> seems to provide a pretty good proxy fo

Re: [HACKERS] Measuring replay lag

2016-12-19 Thread Thomas Munro

On Mon, Dec 19, 2016 at 4:03 PM, Peter Eisentraut wrote: > On 11/22/16 4:27 AM, Thomas Munro wrote: >> Thanks very much for testing! New version attached. I will add this >> to the next CF. > > I don't see it there yet. Thanks for the reminder. Added here: https://commitfest.postgresql.org/12

Re: [HACKERS] Measuring replay lag

2016-12-19 Thread Simon Riggs

On 26 October 2016 at 11:34, Thomas Munro wrote: > It works by taking advantage of the { time, end-of-WAL } samples that > sending servers already include in message headers to standbys. That > seems to provide a pretty good proxy for when the WAL was written, if > you ignore messages where the

Re: [HACKERS] Measuring replay lag

2016-12-18 Thread Peter Eisentraut

On 11/22/16 4:27 AM, Thomas Munro wrote: > Thanks very much for testing! New version attached. I will add this > to the next CF. I don't see it there yet. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Se

Re: [HACKERS] Measuring replay lag

2016-11-22 Thread Thomas Munro

On Tue, Nov 8, 2016 at 2:35 PM, Masahiko Sawada wrote: > replay_lag_sample_interval is 1s by default but I got 1000s by SHOW command. > postgres(1:36789)=# show replay_lag_sample_interval ; > replay_lag_sample_interval > > 1000s > (1 row) Oops, fixed. >> 1. The la

Re: [HACKERS] Measuring replay lag

2016-11-07 Thread Masahiko Sawada

On Wed, Oct 26, 2016 at 7:34 PM, Thomas Munro wrote: > Hi hackers, > > Here is a new version of my patch to add a replay_lag column to the > pg_stat_replication view (originally proposed as part of a larger > patch set for 9.6[1]), like this: Thank you for working on this! > postgres=# select ap

[HACKERS] Measuring replay lag

2016-10-26 Thread Thomas Munro

Hi hackers, Here is a new version of my patch to add a replay_lag column to the pg_stat_replication view (originally proposed as part of a larger patch set for 9.6[1]), like this: postgres=# select application_name, replay_lag from pg_stat_replication; ┌──┬─┐ │ app

55 matches
