On 11/13/2012 11:02 AM, Reinhard Nappert wrote:
Rich,
Do you know what the cause of this issue is?
No, I don't know.
You would expect that you saw this issue in different deployments, but
I only saw it in one instance.
If it turns out that the issue I see is identical the issue, you
mentioned, I’d like to know, when it was fixed.
Upon further investigation, this does not appear to be the same as
https://fedorahosted.org/389/ticket/374
I'm not sure what the problem is. I've seen timeouts when servers crash
or there are network issues.
Thanks,
-Reinhard
*From:*389-users-boun...@lists.fedoraproject.org
[mailto:389-users-boun...@lists.fedoraproject.org] *On Behalf Of
*Reinhard Nappert
*Sent:* Tuesday, November 13, 2012 12:22 PM
*To:* Rich Megginson; General discussion list for the 389 Directory
server project.
*Subject:* Re: [389-users] MMR issue ...
I use 1.2.8.2
*From:*Rich Megginson [mailto:rmegg...@redhat.com]
*Sent:* Tuesday, November 13, 2012 12:18 PM
*To:* General discussion list for the 389 Directory server project.
*Cc:* Reinhard Nappert
*Subject:* Re: [389-users] MMR issue ...
On 11/13/2012 09:24 AM, Reinhard Nappert wrote:
Hi,
I’ve encountered issues with a MMR setup, which looks like the following:
A ------- B
\ /
\ /
\ /
C
The replication works for approximately 24 hours. There are not many
changes to the content anyway. After about 1 day, the attribute value
of the type “nsds5replicaLastUpdateStatus” changes to “1 Can't
acquire busy replica “ of the replication agreement object from type
“nsDS5ReplicationAgreement”. I see this message on C for the
agreement “C-to-B”. The start-time of the last update is 01:08:33.
When I check the status on B, it looks fine for “B-to-C” and
“B-to-A”, however, the start-time of the last update is stuck at
01:08:36 for “B-to-C”, whereas A gets updated afterwards as well. I
don’t have the values for A!
When, I check errors and access on the boxes, I see the following:
Errors on A:
[10/Nov/2012:01:19:31 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Warning: unable to receive endReplication extended operation
response (Timed out)
[10/Nov/2012:01:25:01 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Unable to receive the response for a startReplication
extended operation to consumer (Can't contact LDAP server). Will retry
later.
[10/Nov/2012:01:25:05 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Replication bind with SIMPLE auth resumed
[10/Nov/2012:02:26:29 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Unable to receive the response for a startReplication
extended operation to consumer (Timed out). Will retry later.
[10/Nov/2012:02:31:55 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Unable to receive the response for a startReplication
extended operation to consumer (Can't contact LDAP server). Will retry
later.
[10/Nov/2012:02:31:59 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Replication bind with SIMPLE auth resumed
[10/Nov/2012:02:43:36 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Unable to receive the response for a startReplication
extended operation to consumer (Timed out). Will retry later.
[10/Nov/2012:03:03:00 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Unable to receive the response for a startReplication
extended operation to consumer (Timed out). Will retry later.
[10/Nov/2012:03:08:24 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Unable to receive the response for a startReplication
extended operation to consumer (Can't contact LDAP server). Will retry
later.
[10/Nov/2012:03:11:35 -0300] slapi_ldap_bind - Error: could not send
bind request for id [cn=replication,cn=config] mech [SIMPLE]: error 91
(Can't connect to the LDAP server) -5961 (TCP connection reset by
peer.) 115 (Operation now in progress)
[10/Nov/2012:03:11:35 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Replication bind with SIMPLE auth failed: LDAP error 91
(Can't connect to the LDAP server) ((null))
[10/Nov/2012:03:14:45 -0300] slapi_ldap_bind - Error: could not send
bind request for id [cn=replication,cn=config] mech [SIMPLE]: error 91
(Can't connect to the LDAP server) -5961 (TCP connection reset by
peer.) 115 (Operation now in progress)
[10/Nov/2012:03:14:52 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Replication bind with SIMPLE auth resumed
[10/Nov/2012:03:33:29 -0300] slapi_ldap_bind - Error: could not send
bind request for id [cn=replication,cn=config] mech [SIMPLE]: error 91
(Can't connect to the LDAP server) -5961 (TCP connection reset by
peer.) 115 (Operation now in progress)
[10/Nov/2012:03:33:29 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Replication bind with SIMPLE auth failed: LDAP error 91
(Can't connect to the LDAP server) ((null))
[10/Nov/2012:03:43:29 -0300] slapi_ldap_bind - Error: timeout after
[0.0] seconds reading bind response for [cn=replication,cn=config]
mech [SIMPLE]
[10/Nov/2012:03:43:29 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Replication bind with SIMPLE auth failed: LDAP error 85
(Timed out) ((null))
[10/Nov/2012:03:46:39 -0300] slapi_ldap_bind - Error: could not send
bind request for id [cn=replication,cn=config] mech [SIMPLE]: error 91
(Can't connect to the LDAP server) -5961 (TCP connection reset by
peer.) 115 (Operation now in progress)
[10/Nov/2012:03:46:39 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Replication bind with SIMPLE auth failed: LDAP error 91
(Can't connect to the LDAP server) ((null))
[10/Nov/2012:03:46:42 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Replication bind with SIMPLE auth resumed
[10/Nov/2012:05:12:02 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Unable to receive the response for a startReplication
extended operation to consumer (Timed out). Will retry later.
[10/Nov/2012:06:16:01 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Unable to receive the response for a startReplication
extended operation to consumer (Timed out). Will retry later.
[10/Nov/2012:06:21:27 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Unable to receive the response for a startReplication
extended operation to consumer (Can't contact LDAP server). Will retry
later.
[10/Nov/2012:06:21:46 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Replication bind with SIMPLE auth resumed
[10/Nov/2012:06:39:23 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Unable to receive the response for a startReplication
extended operation to consumer (Timed out). Will retry later.
[10/Nov/2012:07:43:45 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Unable to receive the response for a startReplication
extended operation to consumer (Timed out). Will retry later.
[10/Nov/2012:19:38:22 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Unable to receive the response for a startReplication
extended operation to consumer (Timed out). Will retry later.
[10/Nov/2012:19:43:51 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Unable to receive the response for a startReplication
extended operation to consumer (Can't contact LDAP server). Will retry
later.
[10/Nov/2012:19:43:54 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Replication bind with SIMPLE auth resumed
[10/Nov/2012:20:18:21 -0300] slapi_ldap_bind - Error: could not send
bind request for id [cn=replication,cn=config] mech [SIMPLE]: error 91
(Can't connect to the LDAP server) -5961 (TCP connection reset by
peer.) 115 (Operation now in progress)
[10/Nov/2012:20:18:21 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Replication bind with SIMPLE auth failed: LDAP error 91
(Can't connect to the LDAP server) ((null))
[10/Nov/2012:20:23:21 -0300] slapi_ldap_bind - Error: could not send
bind request for id [cn=replication,cn=config] mech [SIMPLE]: error 91
(Can't connect to the LDAP server) -5961 (TCP connection reset by
peer.) 115 (Operation now in progress)
[10/Nov/2012:20:28:21 -0300] slapi_ldap_bind - Error: could not send
bind request for id [cn=replication,cn=config] mech [SIMPLE]: error 91
(Can't connect to the LDAP server) -5961 (TCP connection reset by
peer.) 115 (Operation now in progress)
[10/Nov/2012:20:33:20 -0300] slapi_ldap_bind - Error: could not send
bind request for id [cn=replication,cn=config] mech [SIMPLE]: error 91
(Can't connect to the LDAP server) -5961 (TCP connection reset by
peer.) 115 (Operation now in progress)
[10/Nov/2012:20:38:22 -0300] slapi_ldap_bind - Error: could not send
bind request for id [cn=replication,cn=config] mech [SIMPLE]: error 91
(Can't connect to the LDAP server) -5961 (TCP connection reset by
peer.) 115 (Operation now in progress)
[10/Nov/2012:20:43:22 -0300] slapi_ldap_bind - Error: could not send
bind request for id [cn=replication,cn=config] mech [SIMPLE]: error 91
(Can't connect to the LDAP server) -5961 (TCP connection reset by
peer.) 115 (Operation now in progress)
[10/Nov/2012:20:48:20 -0300] slapi_ldap_bind - Error: could not send
bind request for id [cn=replication,cn=config] mech [SIMPLE]: error 91
(Can't connect to the LDAP server) -5961 (TCP connection reset by
peer.) 115 (Operation now in progress)
[10/Nov/2012:21:00:15 -0300] slapi_ldap_bind - Error: timeout after
[0.0] seconds reading bind response for [cn=replication,cn=config]
mech [SIMPLE]
[10/Nov/2012:21:00:15 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Replication bind with SIMPLE auth failed: LDAP error 85
(Timed out) ((null))
[10/Nov/2012:21:03:24 -0300] slapi_ldap_bind - Error: could not send
bind request for id [cn=replication,cn=config] mech [SIMPLE]: error 91
(Can't connect to the LDAP server) -5961 (TCP connection reset by
peer.) 115 (Operation now in progress)
[10/Nov/2012:21:03:24 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Replication bind with SIMPLE auth failed: LDAP error 91
(Can't connect to the LDAP server) ((null))
[10/Nov/2012:21:06:36 -0300] slapi_ldap_bind - Error: could not send
bind request for id [cn=replication,cn=config] mech [SIMPLE]: error 91
(Can't connect to the LDAP server) -5961 (TCP connection reset by
peer.) 115 (Operation now in progress)
[10/Nov/2012:21:16:39 -0300] slapi_ldap_bind - Error: timeout after
[0.0] seconds reading bind response for [cn=replication,cn=config]
mech [SIMPLE]
[10/Nov/2012:21:16:39 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Replication bind with SIMPLE auth failed: LDAP error 85
(Timed out) ((null))
[10/Nov/2012:21:24:14 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(B:389): Replication bind with SIMPLE auth resumed
Errors on B:
[10/Nov/2012:01:27:56 -0300] NSMMReplicationPlugin - agmt="cn=B-to-A"
(A:389): Replication bind with SIMPLE auth resumed
Errors on C:
No entry for the time
I also analyzed the access files and see the following:
From access on A:
I see replication sessions from B and C, established at 01:00:10 and
closed at 01:01:14.
Next replication session from C, established at 01:05:12 and closed at
01:06:12. There was NO session from B.
B and C establish the next replication session at 01:07:52. During the
session, a MOD is performed at 01:08:33. C closes this session at
01:09:35. The session from B stays open, and I see a ABANDOM request
from A at 01:17:53. Eventually, the session gets closed (no unbind) at
01:23:25.
[10/Nov/2012:01:17:53 -0300] conn=1439 op=3 ABANDON targetop=NOTFOUND
msgid=3
Replication sessions from C continue to work fine.
There is one more replication session from B, established at 01:27:56
and closed at 01:28:38. After this session, B does not establish any
replication session anymore.
From access on B:
I see replication sessions from A and C, established at 01:00:09 and
closed at 01:01:14.
Next replication session from C, established at 01:05:12 and closed at
01:06:12. There was NO session from A.
A and C establish the next replication session at 01:07:52. During the
session, a MOD is performed at 01:08:33. C closes this session at
01:09:35. The session from A stays open forever!!!
Replication sessions from C continue to work fine.
From access on C:
I see replication sessions from A and B, established at 01:00:09 and
closed at 01:01:14.
I don’t see any replication session from the 01:05 time-slot
A and B establish the next replication session at 01:07:53. During the
session, a MOD is performed at 01:08:33. A closes this session at
01:09:33 and B closes session at 01:09:38
From now on, I only see replication sessions from A!
Has anybody seen this kind of behavior? Any feedback is highly
appreciated.
This looks like https://fedorahosted.org/389/ticket/374
What version of 389-ds-base?
Thanks,
-Reinhard
--
389 users mailing list
389-us...@lists.fedoraproject.org <mailto:389-us...@lists.fedoraproject.org>
https://admin.fedoraproject.org/mailman/listinfo/389-users
--
389 users mailing list
389-us...@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
--
389 users mailing list
389-us...@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users