Re: Syncrep and improving latency due to WAL throttling

Andres Freund Fri, 27 Jan 2023 13:34:20 -0800

Hi,

On 2023-01-27 21:45:16 +0100, Tomas Vondra wrote:
> On 1/27/23 08:18, Bharath Rupireddy wrote:
> >> I think my idea of only forcing to flush/wait an LSN some distance in the past
> >> would automatically achieve that?
> > 
> > I'm sorry, I couldn't get your point, can you please explain it a bit more?
> > 
> 
> The idea is that we would not flush the exact current LSN, because
> that's likely somewhere in the middle of a page, and we always write the whole page
> which leads to write amplification.
> 
> But if we backed off a bit, and wrote e.g. to the last page boundary,
> that wouldn't have this issue (either the page was already flushed -
> noop, or we'd have to flush it anyway).


Yep.
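A minimal sketch of the page-boundary back-off. It assumes an LSN is a plain 64-bit byte position (as PostgreSQL's XLogRecPtr is) and the default 8 kB XLOG_BLCKSZ. The helpers page_boundary_before() and needs_flush() are hypothetical illustrations, not existing PostgreSQL functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-ins: an LSN as a 64-bit byte offset, default 8 kB WAL pages. */
typedef uint64_t XLogRecPtr;
#define XLOG_BLCKSZ 8192

/* Hypothetical helper: round an LSN down to the start of the WAL page that
 * contains it. Everything before this point lies on complete pages. */
static XLogRecPtr
page_boundary_before(XLogRecPtr lsn)
{
    return lsn - (lsn % XLOG_BLCKSZ);
}

/* Hypothetical helper: a flush to 'target' only does work if WAL has not
 * already been flushed at least that far; otherwise it is a no-op. */
static bool
needs_flush(XLogRecPtr target, XLogRecPtr flushed_upto)
{
    assert(target % XLOG_BLCKSZ == 0);  /* callers pass page-aligned targets */
    return target > flushed_upto;
}
```

Because the target is page-aligned, a request for it either finds that WAL already flushed (no-op) or writes only pages that are complete, so the partially filled page being inserted into is not written out early.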


> We could even back off a bit more, to increase the probability it was
> actually flushed / sent to standby.

That's not the sole goal, from my end: I'd like to avoid writing out +
flushing the WAL in too small chunks.  Imagine a few concurrent vacuums or
COPYs or such - if we're unlucky they'd each end up exceeding their "private"
limit close to each other, leading to a number of small WAL writes,
which could end up increasing local commit latency / IOPS.

If we instead decide to only ever flush up to something like
  last_page_boundary - 1/8 * throttle_pages * XLOG_BLCKSZ

we'd make sure that the throttling mechanism won't cause a lot of small
writes.
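The formula above can be sketched as follows, again assuming a 64-bit LSN and XLOG_BLCKSZ of 8192. The function name throttle_flush_target() is hypothetical, and throttle_pages is taken to be the per-backend throttling limit expressed in WAL pages. Two backends that cross their limits while the insert position is on the same page end up with the same target, so the second request is a no-op rather than another small write:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-ins, as in PostgreSQL: 64-bit LSNs, default 8 kB WAL pages. */
typedef uint64_t XLogRecPtr;
#define XLOG_BLCKSZ 8192

/* Hypothetical: target = last_page_boundary - 1/8 * throttle_pages * XLOG_BLCKSZ */
static XLogRecPtr
throttle_flush_target(XLogRecPtr insert_lsn, uint64_t throttle_pages)
{
    XLogRecPtr last_page_boundary;
    uint64_t   backoff;

    assert(throttle_pages > 0);

    last_page_boundary = insert_lsn - (insert_lsn % XLOG_BLCKSZ);
    /* Multiply before dividing so the 1/8 factor is not truncated to 0. */
    backoff = throttle_pages * XLOG_BLCKSZ / 8;

    /* Near the start of WAL there is nothing old enough to wait for. */
    if (backoff >= last_page_boundary)
        return 0;
    return last_page_boundary - backoff;
}
```

The back-off of throttle_pages * XLOG_BLCKSZ / 8 bytes is only page-aligned when throttle_pages is a multiple of 8; this sketch keeps the formula as written rather than rounding the result further.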


> > Keeping replication lag under check enables one to provide a better
> > RPO guarantee as discussed in the other thread
> > https://www.postgresql.org/message-id/CAHg%2BQDcO_zhgBCMn5SosvhuuCoJ1vKmLjnVuqUEOd4S73B1urw%40mail.gmail.com.
> > 
> 
> Isn't that a bit over-complicated? RPO generally only cares about xacts
> that committed (because that's what you want to not lose), so why not
> simply introduce a "sync mode" that uses a slightly older LSN when
> waiting for the replica? Seems much simpler and similar to what we
> already do.

I don't think that really helps you that much. If there's e.g. a huge VACUUM /
COPY emitting loads of WAL you'll suddenly see the commit latency of
concurrently committing transactions spike into oblivion. Whereas a general
WAL throttling mechanism would throttle the VACUUM, without impacting the
commit latency of normal transactions.
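To illustrate why a general throttling mechanism lands on the WAL-heavy backend rather than on unrelated commits, here is a hypothetical per-backend accounting sketch. The struct, its fields, and wal_throttle_after_insert() are assumptions, not PostgreSQL code, and the actual wait (for flush or replication up to a backed-off LSN) is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-backend throttling state. */
typedef struct BackendWalThrottle
{
    uint64_t bytes_since_throttle;  /* WAL this backend emitted since it last waited */
    uint64_t limit_bytes;           /* this backend's "private" limit */
    uint64_t waits;                 /* how often it was asked to wait */
} BackendWalThrottle;

/* Hypothetical hook, called after this backend inserts rec_len bytes of WAL.
 * Returns true when the backend should now wait for its own older WAL to be
 * flushed / replicated. Other backends' counters are never touched. */
static bool
wal_throttle_after_insert(BackendWalThrottle *st, uint64_t rec_len)
{
    st->bytes_since_throttle += rec_len;
    if (st->bytes_since_throttle < st->limit_bytes)
        return false;

    /* Crossed the limit: reset the counter and ask this backend to wait. */
    st->bytes_since_throttle = 0;
    st->waits++;
    return true;
}
```

With a 1 MB limit, a backend writing 1000 records of 8 kB each is asked to wait 7 times, while a transaction that writes a few hundred bytes and commits never reaches the waiting branch.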

Greetings,

Andres Freund
