On Fri, Mar 11, 2011 at 8:29 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
>> I think we should consider making this change for 9.1. This is a real
>> wart, and it's going to become even more of a problem with sync rep, I
>> think.
>
> Yeah, that's a welcome! Please feel free to review the patch.
I discussed this with Heikki on IM. I think we should rip all the GUC change stuff out of this patch and just decree that if you set a timeout, you get a timeout. If you set this inconsistently with wal_receiver_status_interval, then you'll get lots of disconnects. But that's your problem. This may seem a little unfriendly, but the logic in here is quite complex and still isn't going to really provide that much protection against bad configurations. The only realistic alternative I see is to define replication_timeout as a multiple of the client's wal_receiver_status_interval, but that seems quite annoyingly unfriendly.

A single replication_timeout that applies to all slaves doesn't cover every configuration someone might want, but it's simple and easy to understand and should cover 95% of cases. If we find that it's really necessary to be able to customize it further, then we might go the route of adding the much-discussed standby registration stuff, where there's a separate config file or system table where you can stipulate that when a walsender with application_name=foo connects, you want it to get wal_receiver_status_interval=$FOO. But I think that complexity can certainly wait until 9.2 or later.

I also think that the default for replication_timeout should not be 0. Something like 60s seems about right. That way, if you just use the default settings, you'll get pretty sane behavior: a connectivity hiccup that lasts more than a minute will bounce the client. We've already gotten reports of people who thought they were replicating when they really weren't, and had to fiddle with settings and struggle to try to make it robust. This should make things a lot nicer for people out of the box, but it won't if it's disabled out of the box.

On another note, there doesn't appear to be any need to change the return value of WaitLatchOrSocket().
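A minimal sketch of the "if you set a timeout, you get a timeout" semantics, written as a standalone C simulation. It is hypothetical: ReplConfig, timeout_expired() and count_disconnects() are illustrative names, not code from walsender.c. It assumes replication_timeout = 0 means disabled, that the standby replies exactly every wal_receiver_status_interval, and that a dropped standby reconnects immediately.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical settings in milliseconds. The field names mirror the GUCs
 * discussed above, but this struct is illustrative, not PostgreSQL code.
 */
typedef struct
{
    long replication_timeout_ms;          /* 0 disables the timeout */
    long wal_receiver_status_interval_ms; /* 0 means the standby never replies */
} ReplConfig;

/* True if the timeout is enabled and no reply arrived within it. */
static bool
timeout_expired(long now_ms, long last_reply_ms, long timeout_ms)
{
    return timeout_ms > 0 && now_ms - last_reply_ms > timeout_ms;
}

/*
 * Simulate a healthy standby that sends a status reply every
 * wal_receiver_status_interval_ms, with the walsender checking its
 * timeout on every 100 ms tick. Returns how many times the standby
 * is disconnected within duration_ms. Simplifying assumptions: a
 * dropped standby reconnects immediately (restarting the reply clock)
 * and keeps its original reply schedule.
 */
static int
count_disconnects(const ReplConfig *cfg, long duration_ms)
{
    const long tick_ms = 100;
    long last_reply = 0;
    long next_reply = cfg->wal_receiver_status_interval_ms;
    int disconnects = 0;

    assert(cfg->replication_timeout_ms >= 0);
    assert(cfg->wal_receiver_status_interval_ms >= 0);

    for (long now = 0; now <= duration_ms; now += tick_ms)
    {
        /* Deliver a status reply if one is due by this tick. */
        if (cfg->wal_receiver_status_interval_ms > 0 && now >= next_reply)
        {
            last_reply = now;
            next_reply += cfg->wal_receiver_status_interval_ms;
        }

        /* No per-standby tuning: the one global timeout simply applies. */
        if (timeout_expired(now, last_reply, cfg->replication_timeout_ms))
        {
            disconnects++;
            last_reply = now; /* assumed immediate reconnect */
        }
    }
    return disconnects;
}
```

With a 60s timeout and a 10s status interval, count_disconnects() returns 0 over ten minutes. Setting replication_timeout below wal_receiver_status_interval (5s vs 10s) yields 6 disconnects in 60s, one per status interval, which is the "lots of disconnects" case above.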
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)