Martin Kosek wrote:
On 07/05/2012 08:39 PM, Rob Crittenden wrote:
Martin Kosek wrote:
On 07/03/2012 04:41 PM, Rob Crittenden wrote:
Deleting a replica can leave a replication vector (RUV) on the other servers.
This can confuse things if the replica is re-added, and it also causes the
server to calculate changes against a server that may no longer exist.
389-ds-base provides a new task that propagates itself to all available
replicas to clean this RUV data.
This patch will create this task at deletion time to hopefully clean things up.
It isn't perfect. If any replica is down or unavailable at the time the
cleanruv task fires, and then comes back up, the old RUV data may be
re-propagated around.
To make things easier in this case I've added two new commands to
ipa-replica-manage. The first lists the replication ids of all the servers we
have a RUV for. Using this you can call clean_ruv with the replication id of a
server that no longer exists to try the cleanallruv step again.
This is quite dangerous though. If you run cleanruv against a replica id that
does exist it can cause a loss of data. I believe I've put in enough scary
warnings about this.
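For illustration, here is a minimal sketch of how the replication ids can be
pulled out of nsds50ruv values. It reuses the regular expression from the
patch, but it is written for Python 3 (the patch itself targets Python 2), the
helper name parse_ruv_values is hypothetical, the sample values are made up,
and unlike the patch it converts the id to an int right away.

```python
import re
from urllib.parse import urlparse

# Same pattern the patch uses to pick the replica ID and LDAP URL
# out of a single RUV value.
RUV_PATTERN = re.compile(r'\{replica (\d+) (ldap://.*:\d+)\}\s+\w+\s+\w*')


def parse_ruv_values(values):
    """Return (netloc, rid) tuples for every decodable RUV value."""
    servers = []
    for value in values:
        # The replicageneration element carries no replica ID, so skip it.
        if value.startswith('{replicageneration'):
            continue
        match = RUV_PATTERN.match(value)
        if match:
            rid = int(match.group(1))
            netloc = urlparse(match.group(2)).netloc
            servers.append((netloc, rid))
    return servers


# Made-up sample values, for illustration only.
sample = [
    '{replicageneration} 4ff1a2b3000000010000',
    '{replica 7 ldap://srv1.example.com:389} 4ff1a2b3000000070000 4ff1a2c4000000070000',
    '{replica 4 ldap://srv2.example.com:389} 4ff1a2b5000000040000 4ff1a2c6000000040000',
]

for netloc, rid in parse_ruv_values(sample):
    print("%s: %s" % (netloc, rid))
```

Values that do not match the pattern are silently dropped in this sketch; the
patch prints an "unable to decode" message for them instead.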
rob
Good work there, this should make cleaning RUVs much easier than with the
previous version.
This is what I found during review:
1) The help for the list_ruv and clean_ruv commands gets lost in the man page.
I think it would help if, for example, all the info for each command were
indented. As it stands, a user could easily overlook the new commands.
2) I would rename new commands to clean-ruv and list-ruv to make them
consistent with the rest of the commands (re-initialize, force-sync).
3) It would be nice to be able to run clean_ruv command in an unattended way
(for better testing), i.e. respect --force option as we already do for
ipa-replica-manage del. This fix would aid test automation in the future.
4) (minor) The new question (and the one in del too) does not react well to
CTRL+D:
# ipa-replica-manage clean_ruv 3 --force
Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
Cleaning the wrong replica ID will cause that server to no
longer replicate so it may miss updates while the process
is running. It would need to be re-initialized to maintain
consistency. Be very careful.
Continue to clean? [no]: unexpected error:
5) The help shown when clean_ruv is run without its required parameter is quite
confusing, as it reports that the command is wrong rather than the parameter:
# ipa-replica-manage clean_ruv
Usage: ipa-replica-manage [options]
ipa-replica-manage: error: must provide a command [clean_ruv | force-sync |
disconnect | connect | del | re-initialize | list | list_ruv]
It seems you just forgot to specify the error message in the command definition.
6) When the remote replica is down, the clean_ruv command fails with an
unexpected error:
[root@vm-086 ~]# ipa-replica-manage clean_ruv 5
Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
Cleaning the wrong replica ID will cause that server to no
longer replicate so it may miss updates while the process
is running. It would need to be re-initialized to maintain
consistency. Be very careful.
Continue to clean? [no]: y
unexpected error: {'desc': 'Operations error'}
/var/log/dirsrv/slapd-IDM-LAB-BOS-REDHAT-COM/errors:
[04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: failed
to connect to repl agreement connection
(cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,
cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping
tree,cn=config), error 105
[04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: replica
(cn=meTovm-055.idm.lab.
bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping
tree, cn=config) has not been cleaned. You will need to rerun the
CLEANALLRUV task on this replica.
[04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: Task
failed (1)
In this case I think we should inform the user that the command failed,
possibly because of disconnected replicas, and that they could bring the
replicas back up and try again.
7) (minor) "pass" is now redundant in replication.py:
+ except ldap.INSUFFICIENT_ACCESS:
+ # We can't make the server we're removing read-only but
+ # this isn't a show-stopper
+ root_logger.debug("No permission to switch replica to read-only, continuing anyway")
+ pass
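Points 3) and 4) could be handled together in the prompt itself. The following
is only a sketch under stated assumptions: confirm is a hypothetical helper,
not ipautil.user_input, its force and reader parameters are invented for the
example, and it is written for Python 3.

```python
def confirm(prompt, force=False, default=False, reader=input):
    """Ask a yes/no question (hypothetical helper, not ipautil.user_input)."""
    if force:
        # Unattended mode: skip the question entirely.
        return True
    try:
        answer = reader("%s [%s]: " % (prompt, "yes" if default else "no"))
    except EOFError:
        # CTRL+D closes stdin; fall back to the default instead of
        # surfacing an "unexpected error".
        return default
    answer = answer.strip().lower()
    if not answer:
        return default
    return answer in ("y", "yes")
```

With something like this, clean_ruv could call confirm("Continue to clean?",
force=options.force) and exit with "Aborted" when it returns False.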
I think this addresses everything.
rob
Thanks, almost there! I just found one more issue which needs to be fixed
before we push:
# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
Directory Manager password:
Unable to connect to replica vm-055.idm.lab.bos.redhat.com, forcing removal
Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc': "Can't contact LDAP server"}
Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
There were issues removing a connection: %d format: a number is required, not str
Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc': "Can't contact LDAP server"}
This is a traceback I retrieved:
Traceback (most recent call last):
File "/sbin/ipa-replica-manage", line 425, in del_master
del_link(realm, r, hostname, options.dirman_passwd, force=True)
File "/sbin/ipa-replica-manage", line 271, in del_link
repl1.cleanallruv(replica_id)
File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", line 1094, in cleanallruv
root_logger.debug("Creating CLEANALLRUV task for replica id %d" % replicaId)
The problem here is that you don't convert replica_id to int in this part:
+ replica_id = None
+ if repl2:
+ replica_id = repl2._get_replica_id(repl2.conn, None)
+ else:
+ servers = get_ruv(realm, replica1, dirman_passwd)
+ for (netloc, rid) in servers:
+ if netloc.startswith(replica2):
+ replica_id = rid
+ break
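To illustrate the failure and one possible fix (a minimal sketch, not the patch
itself: normalize_replica_id is a hypothetical helper and the code is written
for Python 3), the id parsed out of the RUV text is a string, so it has to be
converted before it reaches a %d format:

```python
def normalize_replica_id(rid):
    """Return rid as an int, or raise ValueError with a readable message."""
    try:
        return int(rid)
    except (TypeError, ValueError):
        raise ValueError("Replica ID must be an integer: %r" % (rid,))


# A RUV lookup hands the id back as text because it is parsed from a string.
rid = "5"

try:
    "Creating CLEANALLRUV task for replica id %d" % rid
except TypeError as e:
    # This is the error that surfaced in the del output above.
    print("without conversion:", e)

print("Creating CLEANALLRUV task for replica id %d" % normalize_replica_id(rid))
```

Converting once, at the point where the id is looked up, keeps every later
caller of cleanallruv() safe regardless of which branch produced the value.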
Martin
Updated patch using new mechanism in 389-ds-base. This should more
thoroughly clean out RUV data when a replica is being deleted, and
provide for a way to delete RUV data afterwards too if necessary.
rob
From 5490d8c955a5e6928622118d99c5bd8df8e0c6c3 Mon Sep 17 00:00:00 2001
From: Rob Crittenden <rcrit...@redhat.com>
Date: Wed, 27 Jun 2012 14:51:45 -0400
Subject: [PATCH] Run the CLEANALLRUV task when deleting a replication
agreement.
This adds two new commands to ipa-replica-manage: list-ruv & clean-ruv
list-ruv can be used to list the update vectors the master has
configured
clean-ruv can be used to fire off the CLEANALLRUV task to remove a
replication vector. It should be used with caution.
https://fedorahosted.org/freeipa/ticket/2303
---
freeipa.spec.in | 6 +-
install/share/replica-acis.ldif | 5 ++
install/tools/ipa-replica-manage | 107 +++++++++++++++++++++++++++++++-
install/tools/man/ipa-replica-manage.1 | 17 +++++
install/updates/40-replication.update | 4 ++
install/updates/Makefile.am | 1 +
ipaserver/install/replication.py | 35 +++++++++++
7 files changed, 173 insertions(+), 2 deletions(-)
create mode 100644 install/updates/40-replication.update
diff --git a/freeipa.spec.in b/freeipa.spec.in
index 08e1fa2054f366cad6429752de6a9f98edf17ab2..8e6c977abd412ca9a9d7c25024ce1d8132859a4e 100644
--- a/freeipa.spec.in
+++ b/freeipa.spec.in
@@ -101,7 +101,7 @@ Requires: %{name}-client = %{version}-%{release}
Requires: %{name}-admintools = %{version}-%{release}
Requires: %{name}-server-selinux = %{version}-%{release}
%if 0%{?fedora} >= 17
-Requires(pre): 389-ds-base >= 1.2.11.9-1
+Requires(pre): 389-ds-base >= 1.2.11.11-1
%else
Requires(pre): 389-ds-base >= 1.2.10.10-1
%endif
@@ -751,6 +751,10 @@ fi
%ghost %attr(0644,root,apache) %config(noreplace) %{_sysconfdir}/ipa/ca.crt
%changelog
+* Mon Aug 20 2012 Rob Crittenden <rcrit...@redhat.com> - 2.99.0-43
+- Set min for 389-ds-base to 1.2.11.11-1 on F17+ to pull in updated
+ RUV code and nsslapd-readonly schema.
+
* Mon Aug 20 2012 Rob Crittenden <rcrit...@redhat.com> - 2.99.0-42
- Set min for 389-ds-base to 1.2.11.9-1 on F17+ to pull in warning about
low nsslapd-cachememsize.
diff --git a/install/share/replica-acis.ldif b/install/share/replica-acis.ldif
index baa6216166eb3c661f771b8ef8346e7ee685f4f2..65dfb7a669965731dfd2c6ac1efd99209a2ea404 100644
--- a/install/share/replica-acis.ldif
+++ b/install/share/replica-acis.ldif
@@ -20,6 +20,11 @@ changetype: modify
add: aci
aci: (targetattr=*)(targetfilter="(|(objectclass=nsds5replicationagreement)(objectclass=nsDSWindowsReplicationAgreement))")(version 3.0;acl "permission:Remove Replication Agreements";allow (delete) groupdn = "ldap:///cn=Remove Replication Agreements,cn=permissions,cn=pbac,$SUFFIX";)
+dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
+changetype: modify
+add: aci
+aci: (targetattr=nsslapd-readonly)(version 3.0; acl "Allow marking the database readonly"; allow (write) groupdn = "ldap:///cn=Remove Replication Agreements,cn=permissions,cn=pbac,$SUFFIX";)
+
dn: cn=tasks,cn=config
changetype: modify
add: aci
diff --git a/install/tools/ipa-replica-manage b/install/tools/ipa-replica-manage
index 111042ad3f890e112f039cdee8fb0429340e1d04..5023d9a8d761943515c8753237db0ed5c42ab813 100755
--- a/install/tools/ipa-replica-manage
+++ b/install/tools/ipa-replica-manage
@@ -22,6 +22,7 @@ import os
import ldap, re, krbV
import traceback
+from urllib2 import urlparse
from ipapython import ipautil
from ipaserver.install import replication, dsinstance, installutils
@@ -38,6 +39,7 @@ CACERT = "/etc/ipa/ca.crt"
# dict of command name and tuples of min/max num of args needed
commands = {
"list":(0, 1, "[master fqdn]", ""),
+ "list-ruv":(0, 0, "", ""),
"connect":(1, 2, "<master fqdn> [other master fqdn]",
"must provide the name of the servers to connect"),
"disconnect":(1, 2, "<master fqdn> [other master fqdn]",
@@ -45,7 +47,8 @@ commands = {
"del":(1, 1, "<master fqdn>",
"must provide hostname of master to delete"),
"re-initialize":(0, 0, "", ""),
- "force-sync":(0, 0, "", "")
+ "force-sync":(0, 0, "", ""),
+ "clean-ruv":(1, 1, "Replica ID to clean", ""),
}
def parse_options():
@@ -233,6 +236,10 @@ def del_link(realm, replica1, replica2, dirman_passwd, force=False):
if repl2 and type1 == replication.IPA_REPLICA:
failed = False
try:
+ repl2.make_readonly()
+ repl2.force_sync(repl2.conn, replica1)
+ cn, dn = repl2.agreement_dn(repl1.conn.host)
+ repl2.wait_for_repl_update(repl2.conn, dn, 30)
repl2.delete_agreement(replica1)
repl2.delete_referral(replica1)
except ldap.LDAPError, e:
@@ -256,6 +263,17 @@ def del_link(realm, replica1, replica2, dirman_passwd, force=False):
repl1.delete_agreement(replica2)
repl1.delete_referral(replica2)
+ if type1 == replication.IPA_REPLICA:
+ if repl2:
+ ruv = repl2._get_replica_id(repl2.conn, None)
+ else:
+ ruv = get_ruv_by_host(realm, replica1, replica2, dirman_passwd)
+
+ try:
+ repl1.cleanallruv(ruv)
+ except KeyboardInterrupt:
+ pass
+
if type1 == replication.WINSYNC:
try:
dn = DN(('cn', replica2), ('cn', 'replicas'), ('cn', 'ipa'), ('cn', 'etc'),
@@ -272,6 +290,89 @@ def del_link(realm, replica1, replica2, dirman_passwd, force=False):
print "Deleted replication agreement from '%s' to '%s'" % (replica1, replica2)
+def get_ruv(realm, host, dirman_passwd):
+ """
+ Return the RUV entries as a list of tuples: (hostname, rid)
+ """
+ try:
+ thisrepl = replication.ReplicationManager(realm, host, dirman_passwd)
+ except Exception, e:
+ print "Failed to connect to server %s: %s" % (host, str(e))
+ sys.exit(1)
+
+ search_filter = '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
+ try:
+ entries = thisrepl.conn.search_s(api.env.basedn, ldap.SCOPE_ONELEVEL,
+ search_filter, ['nsds50ruv'])
+ except ldap.NO_SUCH_OBJECT:
+ print "No RUV records found."
+ sys.exit(0)
+
+ servers = []
+ for ruv in entries[0][1]['nsds50ruv']:
+ if ruv.startswith('{replicageneration'):
+ continue
+ data = re.match('\{replica (\d+) (ldap://.*:\d+)\}\s+\w+\s+\w*', ruv)
+ if data:
+ rid = data.group(1)
+ (scheme, netloc, path, params, query, fragment) = urlparse.urlparse(data.group(2))
+ servers.append((netloc, rid))
+ else:
+ print "unable to decode: %s" % ruv
+
+ return servers
+
+def list_ruv(realm, host, dirman_passwd, verbose):
+ """
+ List the Replica Update Vectors on this host to get the available
+ replica IDs.
+ """
+ servers = get_ruv(realm, host, dirman_passwd)
+ for (netloc, rid) in servers:
+ print "%s: %s" % (netloc, rid)
+
+def get_ruv_by_host(realm, sourcehost, host, dirman_passwd):
+ """
+ Try to determine the RUV by host name.
+ """
+ servers = get_ruv(realm, sourcehost, dirman_passwd)
+ for (netloc, rid) in servers:
+ if '%s:389' % host == netloc:
+ return int(rid)
+
+def clean_ruv(realm, ruv, options):
+ """
+ Given an RID create a CLEANALLRUV task to clean it up.
+ """
+ try:
+ ruv = int(ruv)
+ except ValueError:
+ sys.exit("Replica ID must be an integer: %s" % ruv)
+
+ servers = get_ruv(realm, options.host, options.dirman_passwd)
+ found = False
+ for (netloc, rid) in servers:
+ if ruv == int(rid):
+ found = True
+ hostname = netloc
+ break
+
+ if not found:
+ sys.exit("Replica ID %s not found" % ruv)
+
+ print "Clean the Replication Update Vector for %s" % hostname
+ print
+ print "Cleaning the wrong replica ID will cause that server to no"
+ print "longer replicate so it may miss updates while the process"
+ print "is running. It would need to be re-initialized to maintain"
+ print "consistency. Be very careful."
+ if not ipautil.user_input("Continue to clean?", False):
+ sys.exit("Aborted")
+ thisrepl = replication.ReplicationManager(realm, options.host,
+ options.dirman_passwd)
+ thisrepl.cleanallruv(ruv)
+ print "Cleanup task created"
+
def del_master(realm, hostname, options):
force_del = False
@@ -513,6 +614,8 @@ def main():
if len(args) == 2:
replica = args[1]
list_replicas(realm, host, replica, dirman_passwd, options.verbose)
+ elif args[0] == "list-ruv":
+ list_ruv(realm, host, dirman_passwd, options.verbose)
elif args[0] == "del":
del_master(realm, args[1], options)
elif args[0] == "re-initialize":
@@ -541,6 +644,8 @@ def main():
replica1 = host
replica2 = args[1]
del_link(realm, replica1, replica2, dirman_passwd)
+ elif args[0] == "clean-ruv":
+ clean_ruv(realm, args[1], options)
try:
main()
diff --git a/install/tools/man/ipa-replica-manage.1 b/install/tools/man/ipa-replica-manage.1
index 98103ffdd416f11c44e147e6b4eb84c682da39e0..4a1c489f33591ff6ac98fe7f9a16ebb6a52ee28a 100644
--- a/install/tools/man/ipa-replica-manage.1
+++ b/install/tools/man/ipa-replica-manage.1
@@ -42,11 +42,23 @@ Manages the replication agreements of an IPA server.
\fBforce\-sync\fR
\- Immediately flush any data to be replicated from a server specified with the \-\-from option
.TP
+\fBlist\-ruv\fR
+\- List the replication IDs on this server.
+.TP
+\fBclean\-ruv\fR [REPLICATION_ID]
+\- Run the CLEANALLRUV task to remove a replication ID.
+.TP
The connect and disconnect options are used to manage the replication topology. When a replica is created it is only connected with the master that created it. The connect option may be used to connect it to other existing replicas.
.TP
The disconnect option cannot be used to remove the last link of a replica. To remove a replica from the topology use the del option.
.TP
If a replica is deleted and then re\-added within a short time\-frame then the 389\-ds instance on the master that created it should be restarted before re\-installing the replica. The master will have the old service principals cached which will cause replication to fail.
+.TP
+Each IPA master server has a unique replication ID. This ID is used by 389\-ds\-base when storing information about replication status. The output consists of the masters and their respective replication ID. See \fBclean\-ruv\fR.
+.TP
+When a master is removed, all other masters need to remove its replication ID from the list of masters. Normally this occurs automatically when a master is deleted with ipa\-replica\-manage. If one or more masters were down or unreachable when ipa\-replica\-manage was executed then this replica ID may still exist. The clean\-ruv command may be used to clean up an unused replication ID.
+.TP
+\fBNOTE\fR: clean\-ruv is \fBVERY DANGEROUS\fR. Execution against the wrong replication ID can result in inconsistent data on that master. The master should be re\-initialized from another if this happens.
.SH "OPTIONS"
.TP
\fB\-H\fR \fIHOST\fR, \fB\-\-host\fR=\fIHOST\fR
@@ -112,6 +124,11 @@ Completely remove a replica:
# ipa\-replica\-manage del srv4.example.com
.TP
Using connect/disconnect you can manage the replication topology.
+.TP
+List the replication IDs in use:
+ # ipa\-replica\-manage list\-ruv
+ srv1.example.com:389: 7
+ srv2.example.com:389: 4
.SH "WINSYNC"
Creating a Windows AD Synchronization agreement is similar to creating an IPA replication agreement, there are just a couple of extra steps.
diff --git a/install/updates/40-replication.update b/install/updates/40-replication.update
new file mode 100644
index 0000000000000000000000000000000000000000..f9e0496be336ec7653e6b1688ad28245014ce6a0
--- /dev/null
+++ b/install/updates/40-replication.update
@@ -0,0 +1,4 @@
+# Let a delegated user put the database into read-only mode when deleting
+# an agreement.
+dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
+add:aci: '(targetattr=nsslapd-readonly)(version 3.0; acl "Allow marking the database readonly"; allow (write) groupdn = "ldap:///cn=Remove Replication Agreements,cn=permissions,cn=pbac,$SUFFIX";)'
diff --git a/install/updates/Makefile.am b/install/updates/Makefile.am
index bc7945d7a5cd77469f7fe7175ebd9da66b9119d1..434fe5bae085c0d2911a5a7592fe9b26f4bbb418 100644
--- a/install/updates/Makefile.am
+++ b/install/updates/Makefile.am
@@ -25,6 +25,7 @@ app_DATA = \
21-ca_renewal_container.update \
30-s4u2proxy.update \
40-delegation.update \
+ 40-replication.update \
40-dns.update \
40-automember.update \
45-roles.update \
diff --git a/ipaserver/install/replication.py b/ipaserver/install/replication.py
index 950e8ffc65795da4533612250725b7997a6f6e60..89a615b65dfa9c2e336b0948aaa2e1a14e543497 100644
--- a/ipaserver/install/replication.py
+++ b/ipaserver/install/replication.py
@@ -1080,3 +1080,38 @@ class ReplicationManager(object):
if err:
raise err #pylint: disable=E0702
+
+ def make_readonly(self):
+ """
+ Make the current replication agreement read-only.
+ """
+ dn = DN(('cn', 'userRoot'), ('cn', 'ldbm database'),
+ ('cn', 'plugins'), ('cn', 'config'))
+
+ mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'on')]
+ try:
+ self.conn.modify_s(dn, mod)
+ except ldap.INSUFFICIENT_ACCESS:
+ # We can't make the server we're removing read-only but
+ # this isn't a show-stopper
+ root_logger.debug("No permission to switch replica to read-only, continuing anyway")
+ pass
+
+ def cleanallruv(self, replicaId):
+ """
+ Create a CLEANALLRUV task and monitor it until it has
+ completed.
+ """
+ root_logger.debug("Creating CLEANALLRUV task for replica id %d" % replicaId)
+
+ dn = DN(('cn', 'clean %d' % replicaId), ('cn', 'cleanallruv'),('cn', 'tasks'), ('cn', 'config'))
+ e = ipaldap.Entry(dn)
+ e.setValues('objectclass', ['top', 'extensibleObject'])
+ e.setValue('replica-base-dn', api.env.basedn)
+ e.setValue('replica-id', replicaId)
+ e.setValue('cn', 'clean %d' % replicaId)
+ self.conn.addEntry(e)
+
+ print "Background task created to clean replication data"
+
+ self.conn.checkTask(dn, dowait=True)
--
1.7.10.4