What you need is async replication instead of synchronous replication.
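Assuming DRBD is running in its synchronous mode (protocol C), every local write waits on the peer, so the network sets your write ceiling. With async streaming replication the primary commits locally and the standby just applies WAL as fast as the link allows, falling behind during bursts instead of throttling you. A minimal sketch of an async setup on 12.x (host, user, and paths below are placeholders, not your values):

    # primary: postgresql.conf
    wal_level = replica
    # leave synchronous_standby_names empty => commits don't wait on the standby

    # standby: seed it with a base backup; -R writes primary_conninfo into
    # postgresql.auto.conf and creates standby.signal
    pg_basebackup -h primary.example.com -U replicator \
                  -D /var/lib/pgsql/12/data -R -P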
On 4/21/20 3:30 PM, Kevin Brannen wrote:
I have an unusual need: I need Pg to slow down. I know, we all want our
DB to go faster, but in this case its speed is working against me in one area.
We have systems that are geo-redundant for HA, with the redundancy being
handled by DRBD to keep the disks in sync, which it does at the block
level. For normal operations, it actually works out fairly well. That
said, we recognize that what we really need is one of the forms of streaming
replication (ch. 26 of the manual), which I believe would help this problem a
lot, if not solve it -- but we don't have the time to do that at the moment. I
plan and hope to get there by the end of the year. The part that
hurts so badly is when we do maintenance operations that are DB-heavy, like
deleting really old records out of archives (weekly), moving older records
from current tables to archive tables plus an analyze (every night),
running pg_backup (every night), other archiving (weekly), and vacuum full
to remove bloat (once a quarter). All of this generates a lot of disk
writes, to state the obvious.
The local server can handle it all just fine, but the network can't handle
it as it tries to sync to the other server. Sometimes we can add network
bandwidth; many times we can't, as that depends on others. To borrow a phrase
from the current times, we need to flatten the curve. 😊
I've tamed a few parts of our maintenance process by running them under
"nice -20" (e.g. log rotation), but I can't really do that for Pg because
the work gets handed off to a background process that's not a direct child
process … and I don't want to slow the DB as a whole because other work is
going on (like handling incoming data).
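(In principle I could dig the backend PID out of pg_stat_activity and
deprioritize it by hand -- something like the sketch below, where the
application_name is invented -- but that only touches that one backend, not
the checkpointer or WAL writer doing much of the actual flushing, so it
doesn't really get me there.)

    -- find the backend running the maintenance session
    SELECT pid, state, query
      FROM pg_stat_activity
     WHERE application_name = 'maintenance';   -- hypothetical name

    # then, as root on that server (the I/O class only matters with
    # schedulers that honor it, e.g. CFQ/BFQ):
    ionice -c 3 -p <pid>
    renice -n 19 -p <pid>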
Part of the process I've slowed down by doing the work in chunks of 10K
rows at a time, with a pause between each chunk to let the network catch up
(instead of hitting an entire table in one statement; sketched below). This
sort of works, but some of the work/SQL is somewhere between hard and
next-to-impossible to break up like that. It also produces some hard spikes,
though that's better than the alternative: large portions of the process are
still hard to control and just punch the network to full capacity and hold
it there for far too long.
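The chunked part, roughly (table, column, and database names here are
invented; the real thing differs):

    #!/bin/sh
    # Sketch: delete old rows 10K at a time, sleeping between chunks so the
    # network/DRBD link can drain.
    while :; do
        n=$(psql -qtAX -d mydb -c "
            WITH del AS (
                DELETE FROM archive_table
                WHERE ctid IN (SELECT ctid FROM archive_table
                               WHERE created_at < now() - interval '1 year'
                               LIMIT 10000)
                RETURNING 1)
            SELECT count(*) FROM del;")
        [ "$n" -eq 0 ] && break
        sleep 5     # pause so the sync can catch up
    done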
So, do I have any other options to help slow down some of the Pg
operations? Or maybe some other short-term mitigations we can do with Pg
configurations? Or is this a case where we've already done all we can do
and the only answer is to move to WAL streaming as fast as possible?
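(By "configurations" I mean things like the vacuum cost-delay settings, or a
rate limit on the base backup if pg_basebackup is involved -- roughly the
following, with purely illustrative values -- though I don't believe VACUUM
FULL honors the cost delay, since it rewrites the whole table:)

    # postgresql.conf -- throttle (auto)vacuum I/O; numbers are only examples
    vacuum_cost_delay = 10ms             # manual VACUUM; default 0 (no delay)
    autovacuum_vacuum_cost_delay = 10ms
    vacuum_cost_limit = 200

    # cap a base backup at roughly 20 MB/s
    pg_basebackup -D /backups/today --max-rate=20M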
If it matters, this is being run on Linux servers. Pg 12.2 is in final
testing and will be rolled out to production soon -- so feel free to offer
suggestions that only apply to 12.x.
Thanks,
Kevin
--
Angular momentum makes the world go 'round.