Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration

Greg Smith Mon, 01 Mar 2010 21:50:52 -0800

Bruce Momjian wrote:

Joachim Wieland wrote:

1) With the current implementation they will see better performance on
the master and more aggressive vacuum (!), since they have less
long-running queries now on the master and autovacuum can kick in and
clean up with less delay than before. On the other hand their queries
on the standby might fail and they will start thinking that this HS+SR
feature is not as convincing as they thought it was...


I assumed they would set max_standby_delay = -1 and be happy.

The admin in this situation might be happy until the first time the primary fails and a failover is forced, at which point there is an unbounded amount of recovery data to apply that was stuck waiting behind whatever long-running queries were active. I don't know if you've ever watched what happens to a pre-8.2 cold standby when you start it up with hundreds or thousands of backed up WAL files to process before the server can start, but it's not a fast process. I watched a production 8.1 standby get >4000 files behind once due to an archive_command bug, and it's not something I'd like to ever chew my nails off over again. If your goal was HA and you're trying to bring up the standby, the server is down the whole time that's going on.
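To put rough numbers on that kind of backlog, here is a back-of-the-envelope sketch. It assumes the default 16MB WAL segment size; the replay rates are purely hypothetical placeholders, not measurements from the incident above, and real replay speed depends heavily on hardware and workload.

```python
# Rough estimate of how long a standby needs to replay a WAL backlog
# before it can come up. Illustrative only.

WAL_SEGMENT_MB = 16  # default PostgreSQL WAL segment size


def replay_minutes(backlog_segments, replay_mb_per_sec):
    """Estimated minutes to replay a backlog at an assumed replay rate.

    replay_mb_per_sec is a hypothetical figure supplied by the caller.
    """
    if replay_mb_per_sec <= 0:
        raise ValueError("replay rate must be positive")
    total_mb = backlog_segments * WAL_SEGMENT_MB
    return total_mb / replay_mb_per_sec / 60.0


if __name__ == "__main__":
    # 4000 segments behind, at a few assumed replay rates
    for rate in (5, 10, 20):
        minutes = replay_minutes(4000, rate)
        print(f"{rate} MB/s -> about {minutes:.0f} minutes of downtime")
```

Whatever the real rate turns out to be, the point is that the downtime scales linearly with the backlog, and nothing bounds the backlog when the delay is unlimited.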

This is why no admin who prioritizes HA would consider 'max_standby_delay = -1' a reasonable setting, and those are the sort of users Joachim's example was discussing. It only takes one rogue query that runs for a long time to make the standby so far behind it's useless for HA purposes. And you also have to ask yourself "if recovery is halted while waiting for this query to run, how stale is the data on the standby getting?". That's true for any large setting for this parameter, but using -1 for the unlimited setting also gives the maximum possible potential for such staleness.

'max_standby_delay = -1' is really only a reasonable idea if you are absolutely certain all queries are going to be short, which we can't dismiss as an unfounded use case, so it has value. I would expect you'd also have to combine it with a matching reasonable statement_timeout to enforce that expectation and make that situation safer.

In any of the "offload batch queries to the failover standby" situations, it's unlikely an unlimited value for this setting will be practical. Perhaps you set max_standby_delay to some number of hours, to match your expected worst-case query run time and reduce the chance of cancellation. Not putting a limit on it at all leaves a potential downside that no DBA with healthy paranoia is going to be happy with in an HA environment, given that unbounded staleness and unbounded recovery time are then both possible. The potential of a failed long-running query is much less risky than either of those.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
