Based on the "SSL error" lines in your log, I'm assuming you have ssl set to 'on'. What is your ssl_renegotiation_limit? The default is 512MB, but setting it to 0, which disables renegotiation, has solved problems for a number of people on this list, myself included.
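
In case it is useful, here is a rough sketch of checking and changing the setting on the server that pg_basebackup connects to. I'm assuming a 9.3 server and that a configuration reload is enough for new connections to pick up the change; please check the documentation for your version before relying on it.

```
-- In psql on the sending server: show the current value
SHOW ssl_renegotiation_limit;

# In postgresql.conf on that server: 0 disables SSL renegotiation
ssl_renegotiation_limit = 0

-- Back in psql: reload the configuration, then confirm the new value
SELECT pg_reload_conf();
SHOW ssl_renegotiation_limit;
```

After that, start pg_basebackup again so it opens a fresh connection with the new setting.
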
Sherrylyn

On Thu, Sep 24, 2015 at 3:57 PM, Francisco Reyes <li...@natserv.net> wrote:

> Have an existing setup of 9.3 servers. Replication has been rock solid,
> but recently the circuits between data centers were upgraded and
> pg_basebackup now seems to fail often when setting up streaming
> replication. What used to take 10+ hours now only took 68 minutes, but had
> to do many retries. Many attempts fail within minutes while others go to
> 90% or higher and then drop. The reason we are doing a sync is because we
> have to swap data centers every so often for compliance. So I had to swap
> master and slave.
>
> Calling pg_basebackup like this:
> pg_basebackup -P -R -X s -h <HostName> -D <Folder> -U replicator
>
> The error we keep having is:
> Sep 23 13:36:32 <HostName> postgres[16804]: [11-1] 2015-09-23 13:36:32 EDT
> <IP> [unknown] replicator LOG: SSL error: bad write retry
> Sep 23 13:36:32 <HostName> postgres[16804]: [12-1] 2015-09-23 13:36:32 EDT
> <IP> [unknown] replicator LOG: SSL error: bad write retry
> Sep 23 13:36:32 <HostName> postgres[16804]: [13-1] 2015-09-23 13:36:32 EDT
> <IP> [unknown] replicator FATAL: connection to client lost
> Sep 23 13:36:32 <HostName> postgres[16972]: [9-1] 2015-09-23 13:36:32 EDT
> <IP> [unknown] replicator LOG: could not receive data from client:
> Connection reset by peer
>
> I have been working with the network team and we have even been actively
> monitoring the line, and running ping, as the replication is setup. At the
> point the connection reset by peer error happens, we don't see any issue
> with the network and ping doesn't show an issue at that point in time.
>
> The issue also happened on another set of machines and likewise, had to
> retry many times before pg_basebackup would do the initial sync. Once the
> initial sync is set, replication is fine.
>
> I tried both "-X s" (stream) and "-X f" (fetch) and both fail often.
>
> Any ideas what may be going on?