On Thu, Mar 20, 2014 at 8:19 AM, Shaun Duncan <shaun.dun...@gmail.com>wrote:

>
> On Mar 19, 2014 5:08 PM, "Venkata Balaji Nagothi" <vbn...@gmail.com>
> wrote:
> >
> >
> > On Thu, Mar 20, 2014 at 4:12 AM, Shaun Duncan <shaun.dun...@gmail.com>
> wrote:
> >
> >> This is on a production 9.0.15 install with 1 master, 5 hot standby
> read only slaves.
> >>
> >> We've been trying to look into a situation where we're seeing that hot
> standby read slaves are receiving WAL segments, but are exceeding
> max_standby_archive_delay (60s) and max_standby_streaming_delay (60s) and
> not applying changes. The slave will get the first segment and hang (we've
> seen up to 30m before removing the slave from our read pool to catch up)
> and get further and further behind the master. Furthermore, we have seen
> that after a slave has caught up, putting it back into the read pool will
> mean will almost immediately start to see this happen again. It acts as if
> we had max_standby_* set to -1.
> >>
> >> I'm just looking for some ideas, hints, or suggestions as to what might
> be going on here or what we might be doing wrong.
> >>
> >
> > Have you noticed if the read requests on standby database are taking
> long and loading up the server ? If yes, any idea why the queries on
> standby are taking long to complete ?
> >
> > WAL replay would be paused ( until the read requests are served ) at the
> time of conflicting queries on the standby site.
> >
> > Also, Do you see any delay in master sending the WALs to standby ?
> >
> >
> > Venkata Balaji N
> >
> > Sr. Database Administrator
> > Fujitsu Australia
> >
> >
> >
> >
>
> Yes we have seen queries against the read slave degrade severely while
> this happens. We have at least optimized some inefficient queries. But I
> would think (correct me if I am wrong) that queries would be canceled after
> WAL replay has exceeded the max delay. This is something we haven't seen.
> It behaves as if it is waiting for reads to complete.
>

No. Generally queries continue to run on Standby without any issues, except
in the below 2 situations -


   - When *conflicting queries* (like DROP TABLE, DROP DATABASE etc..)  are
   executed on master whilst standby is serving the read requests. This
   behavior can be controlled to some extent by setting max_standby_*_delay
   parameters (60s in your case). If there are any read requests in process at
   standby database,the conflicting queries would wait for 60s - if the read
   requests (Generally SELECTs) do not complete, then they are forcibly
   cancelled

           In-short max_standby_*_delay parameters only effect the
replication behavior when conflicting queries are coming to standby.

   - When master is waiting to apply VACUUM related changes (removing dead
   rows) at the standby site. This behavior can be controlled to some extent
   by "vacuum_defer_cleanup_age". This implies number of transactions executed
   on the master database.


If there are no conflicting queries from primary waiting to be applied at
standby database - then queries would continue to run without any issues
until
statement_timeout ( if configured ) threshold is reached.

Do you see any messages like "ERROR: canceling statement due to conflict...
" in standby postgres logs ?

http://www.postgresql.org/docs/9.0/static/hot-standby.html#HOT-STANDBY-CONFLICT

Hope this helps !

Venkata Balaji N

Sr.Database Administrator
Fujitsu Australia

Reply via email to