On Thu, Mar 20, 2014 at 8:19 AM, Shaun Duncan <shaun.dun...@gmail.com>wrote:
> > On Mar 19, 2014 5:08 PM, "Venkata Balaji Nagothi" <vbn...@gmail.com> > wrote: > > > > > > On Thu, Mar 20, 2014 at 4:12 AM, Shaun Duncan <shaun.dun...@gmail.com> > wrote: > > > >> This is on a production 9.0.15 install with 1 master, 5 hot standby > read only slaves. > >> > >> We've been trying to look into a situation where we're seeing that hot > standby read slaves are receiving WAL segments, but are exceeding > max_standby_archive_delay (60s) and max_standby_streaming_delay (60s) and > not applying changes. The slave will get the first segment and hang (we've > seen up to 30m before removing the slave from our read pool to catch up) > and get further and further behind the master. Furthermore, we have seen > that after a slave has caught up, putting it back into the read pool will > mean will almost immediately start to see this happen again. It acts as if > we had max_standby_* set to -1. > >> > >> I'm just looking for some ideas, hints, or suggestions as to what might > be going on here or what we might be doing wrong. > >> > > > > Have you noticed if the read requests on standby database are taking > long and loading up the server ? If yes, any idea why the queries on > standby are taking long to complete ? > > > > WAL replay would be paused ( until the read requests are served ) at the > time of conflicting queries on the standby site. > > > > Also, Do you see any delay in master sending the WALs to standby ? > > > > > > Venkata Balaji N > > > > Sr. Database Administrator > > Fujitsu Australia > > > > > > > > > > Yes we have seen queries against the read slave degrade severely while > this happens. We have at least optimized some inefficient queries. But I > would think (correct me if I am wrong) that queries would be canceled after > WAL replay has exceeded the max delay. This is something we haven't seen. > It behaves as if it is waiting for reads to complete. > No. Generally queries continue to run on Standby without any issues, except in the below 2 situations - - When *conflicting queries* (like DROP TABLE, DROP DATABASE etc..) are executed on master whilst standby is serving the read requests. This behavior can be controlled to some extent by setting max_standby_*_delay parameters (60s in your case). If there are any read requests in process at standby database,the conflicting queries would wait for 60s - if the read requests (Generally SELECTs) do not complete, then they are forcibly cancelled In-short max_standby_*_delay parameters only effect the replication behavior when conflicting queries are coming to standby. - When master is waiting to apply VACUUM related changes (removing dead rows) at the standby site. This behavior can be controlled to some extent by "vacuum_defer_cleanup_age". This implies number of transactions executed on the master database. If there are no conflicting queries from primary waiting to be applied at standby database - then queries would continue to run without any issues until statement_timeout ( if configured ) threshold is reached. Do you see any messages like "ERROR: canceling statement due to conflict... " in standby postgres logs ? http://www.postgresql.org/docs/9.0/static/hot-standby.html#HOT-STANDBY-CONFLICT Hope this helps ! Venkata Balaji N Sr.Database Administrator Fujitsu Australia