On Wed, Aug 31, 2016 at 8:07 AM, Chinh Nguyen <chinh...@gmail.com> wrote:
> Hello All, > > We recently tried to reindex on a heavy used database cluster (master > + multiple hot-standby async replicas, all taking traffic) and > replicas were effectively blocked for 10 minutes resulting in drop of > a lot of read traffic. We reindex by create new index concurrently, > then rename old and new index and drop old index afterwards. > After we execute the renaming on master, we found a quick burst of > exclusive locks as expected: > > Process 34482 waits for AccessShareLock on relation 4153657 of > database 16420; blocked by process 18953. > Process 18953: ALTER INDEX public.old_index RENAME TO temp_index; > > But all hot-standby replicas started to see queries locking up for 10 > minutes, sample log from a replica is below: > > LOG: process 41040 still waiting for AccessShareLock on relation > 4153657 of database 16420 after 1000.072 ms > > We have in config: > > max_standby_streaming_delay = 10min > lock_timeout = 10s > > Any idea what happened with the replicas? Why queries on replica > locked for so long vs sub-second on master? > I think, it is not the lock release wait time, it is the delay in WAL replay on standby from master due to the parameter "max_standby_streaming_delay". If you have read queries running on standby, WAL replay waits for "max_standby_streaming_delay" duration (which is 10 mins in your case) before timing out queries on replica. Regards, Venkata B N Fujitsu Australia