On 17 February 2017 at 07:45, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> On Fri, Feb 17, 2017 at 12:45 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> Feeling happier about this for now at least.
>
> Thanks!
And happier again, leading me to move to the next stage of review, focusing on the behaviour emerging from the design.

My current understanding is that this doesn't rely upon LSN arithmetic to measure lag, which is good. That means logical replication should "just work", and future mechanisms to filter physical WAL will also just work. This is important, so please comment if you see that isn't the case.

I notice that LagTrackerRead() doesn't do anything to interpolate the time given, so at present any attempt to prune the lag sample buffer would result in falsely increased lag times being reported. That is probably why you say "There may be better adaptive compaction algorithms." We need to look at this some more; an initial guess is that we need to insert fewer samples as the buffer fills, since the LagTrackerRead() algorithm is O(N) in the number of samples and so simply enlarging the buffer isn't a great plan. (There is a rough sketch of what I mean by interpolation at the end of this mail.)

It would be very nice to be able to say something like: the +/- confidence limits of the lag are no more than 50% of the lag time, so we have some idea of how accurate the value is at any point. We need to document the accuracy of the result, otherwise we'll be answering questions about it for some time. So let's think about that now.

Given that LagTrackerRead() reads the 3 positions in order, it seems sensible to start reading LAG_TRACKER_FLUSH_HEAD from the place you finished reading LAG_TRACKER_WRITE_HEAD, etc. Otherwise we end up doing way too much work with larger buffers. (The first sketch below also shows one way of resuming from a per-head position.)

Which makes me think about the read side some more. The present design calculates everything on receipt of standby messages. I think we should simply record the last few messages and do the lag calculation when the values are later read, if indeed they are ever read. That would allow us a much better diagnostic view, at least. And it would allow you to show a) the latest value, b) values smoothed in various ways, or c) the detail of the last few messages for diagnostics. The latest value would be the default value in pg_stat_replication - I agree we shouldn't make that view overly wide, so we'd need another function to access the details. (The second sketch below illustrates this.)

What is critical is that we report stable values as lag increases, i.e. we need to iron out any problem cases now so we don't have to fix them in PG11 and spend a year telling people "yeah, it does that" (like we've been doing for some time). The diagnostics will help us investigate this patch over various use cases...

I think we need to show some test results with a graph of lag over time for these cases:

1. steady state - pgbench on master, so we can see how that responds

2. blocked apply on standby - so we can see how the lag increases, but also how the accuracy goes down as the lag increases and whether the reported value changes (depending upon the algorithm)

3. burst mode - where we go from not moving to moving at high speed and then stop again quickly

+ other use cases you or others add

Does the proposed algorithm work for these cases? What goes wrong with it? The examination of these downsides, if any, is what we need to investigate now to allow this to get committed.

Some minor points on code...

Why are things defined in walsender.c and not in a .h?

Why is LAG_TRACKER_NUM_READ_HEADS not the same as NUM_SYNC_REP_WAIT_MODE? ...and other related constants shouldn't be redefined either.
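To make the interpolation and per-head read position suggestions more concrete, here is a rough, untested sketch. It is not taken from the patch: names like LagSample and lag_tracker_read(), the buffer sizes, and the simplified XLogRecPtr/TimestampTz typedefs are placeholders of my own, just to illustrate the idea.

/*
 * Untested sketch only -- not the patch's actual code.  XLogRecPtr and
 * TimestampTz are simplified stand-ins here so the fragment is
 * self-contained.
 */
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
typedef int64_t  TimestampTz;       /* microseconds */

#define LAG_TRACKER_BUFFER_SIZE 8192
#define NUM_LAG_READ_HEADS      3   /* write, flush, apply */

typedef struct LagSample
{
    XLogRecPtr  lsn;    /* WAL position sent to the standby */
    TimestampTz time;   /* time at which it was sent */
} LagSample;

static LagSample lag_samples[LAG_TRACKER_BUFFER_SIZE];
static int  lag_write_head;                      /* next slot to be filled */
static int  lag_read_heads[NUM_LAG_READ_HEADS];  /* per-head resume points */

/*
 * Estimate when the WAL up to 'lsn' was sent, resuming the scan from this
 * head's saved position so that each head visits each sample at most once.
 * The caller computes lag as (now - *when).  Returns false if no suitable
 * sample was consumed on this call.
 */
static bool
lag_tracker_read(int head, XLogRecPtr lsn, TimestampTz *when)
{
    int     i = lag_read_heads[head];
    bool    found = false;

    /* Consume every sample at or before the reported LSN. */
    while (i != lag_write_head && lag_samples[i].lsn <= lsn)
    {
        *when = lag_samples[i].time;
        found = true;
        i = (i + 1) % LAG_TRACKER_BUFFER_SIZE;
    }
    lag_read_heads[head] = i;

    if (!found)
        return false;

    /*
     * If the standby has reported a position between two samples,
     * interpolate linearly between them.  Without this, pruning or sparser
     * sampling attributes the older sample's timestamp to the newer LSN
     * and so overstates the lag.
     */
    if (i != lag_write_head && lag_samples[i].lsn > lsn)
    {
        int         prev = (i + LAG_TRACKER_BUFFER_SIZE - 1) % LAG_TRACKER_BUFFER_SIZE;
        LagSample  *s0 = &lag_samples[prev];
        LagSample  *s1 = &lag_samples[i];
        double      frac;

        frac = (double) (lsn - s0->lsn) / (double) (s1->lsn - s0->lsn);
        *when = s0->time + (TimestampTz) ((s1->time - s0->time) * frac);
    }

    return true;
}

With something along these lines, inserting fewer samples as the buffer fills only costs interpolation accuracy rather than systematically overstating the lag, and the O(N) scan cost is paid once per sample per head rather than once per standby message.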
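And a second sketch, equally hypothetical and reusing the types above, of what I mean by recording the raw standby reply messages and deferring all lag arithmetic to read time. A separate detail function (not in the patch) could then expose the whole ring for diagnostics, while pg_stat_replication shows only the derived latest or smoothed values.

/* Illustrative only: store raw replies, calculate nothing on receipt. */
#define NUM_RECENT_REPLIES 16

typedef struct StandbyReply
{
    TimestampTz received_at;    /* when the walsender got the reply */
    XLogRecPtr  write_lsn;
    XLogRecPtr  flush_lsn;
    XLogRecPtr  apply_lsn;
} StandbyReply;

static StandbyReply recent_replies[NUM_RECENT_REPLIES];
static int  next_reply;

/* Called on receipt of a standby reply: no lag calculation here at all. */
static void
record_standby_reply(TimestampTz now, XLogRecPtr write_lsn,
                     XLogRecPtr flush_lsn, XLogRecPtr apply_lsn)
{
    recent_replies[next_reply].received_at = now;
    recent_replies[next_reply].write_lsn = write_lsn;
    recent_replies[next_reply].flush_lsn = flush_lsn;
    recent_replies[next_reply].apply_lsn = apply_lsn;
    next_reply = (next_reply + 1) % NUM_RECENT_REPLIES;
}

/*
 * At read time, each stored reply can be turned into a lag figure by
 * feeding its LSNs through lag_tracker_read() above, and the results shown
 * raw, latest-only, or smoothed, as the caller prefers.
 */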
--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services