Martin Kosek wrote:
On 09/17/2012 04:04 PM, Rob Crittenden wrote:
Martin Kosek wrote:
On 09/14/2012 09:17 PM, Rob Crittenden wrote:
Martin Kosek wrote:
On 09/06/2012 11:17 PM, Rob Crittenden wrote:
Martin Kosek wrote:
On 09/06/2012 05:55 PM, Rob Crittenden wrote:
Rob Crittenden wrote:
Rob Crittenden wrote:
Martin Kosek wrote:
On 09/05/2012 08:06 PM, Rob Crittenden wrote:
Rob Crittenden wrote:
Martin Kosek wrote:
On 07/05/2012 08:39 PM, Rob Crittenden wrote:
Martin Kosek wrote:
On 07/03/2012 04:41 PM, Rob Crittenden wrote:
Deleting a replica can leave a replication update vector (RUV) on the other servers. This can confuse things if the replica is re-added, and it also causes the server to calculate changes against a server that may no longer exist.

389-ds-base provides a new task that self-propagates itself to all available replicas to clean this RUV data.

This patch will create this task at deletion time to hopefully clean things up.

It isn't perfect. If any replica is down or unavailable at the time the cleanruv task fires, and then comes back up, the old RUV data may be re-propagated around.
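For context, each master records the RUVs it knows about in the nsds50ruv attribute of a database tombstone entry. A minimal sketch of decoding those values into (host:port, replica id) pairs — the sample strings and exact value layout are assumptions based on typical 389-ds output:

```python
import re

# Hypothetical sample nsds50ruv values; the CSN metadata after the
# closing brace may be absent and is ignored here.
RUV_VALUES = [
    "{replicageneration} 4ff2a9ec000000060000",
    "{replica 4 ldap://vm-055.idm.lab.bos.redhat.com:389} 4ff2aa4f000100040000 5004aa3b000000040000",
    "{replica 5 ldap://vm-072.idm.lab.bos.redhat.com:389}",
]

def parse_ruv(values):
    """Return (host:port, replica_id) tuples, skipping the generation entry."""
    servers = []
    for ruv in values:
        if ruv.startswith('{replicageneration'):
            continue  # metadata, not a replica entry
        m = re.match(r'\{replica (\d+) ldap://([^}]+)\}', ruv)
        if m:
            servers.append((m.group(2), int(m.group(1))))
    return servers
```

A replica id that shows up here for a host that no longer exists is the kind of stale entry clean-ruv is meant to remove.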
To make things easier in this case I've added two new commands to ipa-replica-manage. The first lists the replication ids of all the servers we have a RUV for. Using this you can call clean_ruv with the replication id of a server that no longer exists to try the cleanallruv step again.

This is quite dangerous though. If you run cleanruv against a replica id that does exist it can cause a loss of data. I believe I've put in enough scary warnings about this.
rob
Good work there, this should make cleaning RUVs much easier than with the previous version.

This is what I found during review:

1) The list_ruv and clean_ruv command help in the man page is quite lost. I think it would help if we, for example, indented all the info for commands. As it is now, a user could easily overlook the new commands in the man page.
2) I would rename the new commands to clean-ruv and list-ruv to make them consistent with the rest of the commands (re-initialize, force-sync).

3) It would be nice to be able to run the clean_ruv command in an unattended way (for better testing), i.e. respect the --force option as we already do for ipa-replica-manage del. This fix would aid test automation in the future.
4) (minor) The new question (and the del one too) does not react too well to CTRL+D:

# ipa-replica-manage clean_ruv 3 --force
Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
Cleaning the wrong replica ID will cause that server to no
longer replicate so it may miss updates while the process
is running. It would need to be re-initialized to maintain
consistency. Be very careful.
Continue to clean? [no]: unexpected error:
5) Help for the clean_ruv command without its required parameter is quite confusing, as it reports that the command is wrong rather than the parameter:

# ipa-replica-manage clean_ruv
Usage: ipa-replica-manage [options]
ipa-replica-manage: error: must provide a command [clean_ruv | force-sync | disconnect | connect | del | re-initialize | list | list_ruv]

It seems you just forgot to specify the error message in the command definition.
6) When the remote replica is down, the clean_ruv command fails with an unexpected error:

[root@vm-086 ~]# ipa-replica-manage clean_ruv 5
Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
Cleaning the wrong replica ID will cause that server to no
longer replicate so it may miss updates while the process
is running. It would need to be re-initialized to maintain
consistency. Be very careful.
Continue to clean? [no]: y
unexpected error: {'desc': 'Operations error'}
/var/log/dirsrv/slapd-IDM-LAB-BOS-REDHAT-COM/errors:
[04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: failed to connect to repl agreement connection (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping tree,cn=config), error 105
[04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: replica (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping tree,cn=config) has not been cleaned. You will need to rerun the CLEANALLRUV task on this replica.
[04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: Task failed (1)
In this case I think we should inform the user that the command failed, possibly because of disconnected replicas, and that they could bring the replicas back up and try again.
7) (minor) "pass" is now redundant in replication.py:

+        except ldap.INSUFFICIENT_ACCESS:
+            # We can't make the server we're removing read-only but
+            # this isn't a show-stopper
+            root_logger.debug("No permission to switch replica to read-only, continuing anyway")
+            pass
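The friendlier failure reporting suggested in 6) might look roughly like this. `explain_clean_failure` is a hypothetical helper, and treating 'Operations error' as a connectivity hint is a heuristic based on the log above, not something the server guarantees:

```python
def explain_clean_failure(err):
    """Translate a raw LDAP error dict into an actionable message."""
    desc = err.get('desc', 'unknown error')
    if desc == 'Operations error':
        # The errors log shows CLEANALLRUV reporting this when it
        # cannot reach a replica, so suggest checking connectivity.
        return ("Clean RUV task failed: %s. Some replicas may be down or "
                "unreachable; bring them back online and try again." % desc)
    return "Clean RUV task failed: %s" % desc
```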
I think this addresses everything.
rob
Thanks, almost there! I just found one more issue which needs to be fixed before we push:
# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
Directory Manager password:

Unable to connect to replica vm-055.idm.lab.bos.redhat.com, forcing removal
Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc': "Can't contact LDAP server"}
Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
There were issues removing a connection: %d format: a number is required, not str
Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc': "Can't contact LDAP server"}
This is a traceback I retrieved:

Traceback (most recent call last):
  File "/sbin/ipa-replica-manage", line 425, in del_master
    del_link(realm, r, hostname, options.dirman_passwd, force=True)
  File "/sbin/ipa-replica-manage", line 271, in del_link
    repl1.cleanallruv(replica_id)
  File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", line 1094, in cleanallruv
    root_logger.debug("Creating CLEANALLRUV task for replica id %d" % replicaId)
The problem here is that you don't convert replica_id to int in this part:

+    replica_id = None
+    if repl2:
+        replica_id = repl2._get_replica_id(repl2.conn, None)
+    else:
+        servers = get_ruv(realm, replica1, dirman_passwd)
+        for (netloc, rid) in servers:
+            if netloc.startswith(replica2):
+                replica_id = rid
+                break
Martin
Updated patch using the new mechanism in 389-ds-base. This should more thoroughly clean out RUV data when a replica is being deleted, and also provide a way to delete RUV data afterwards if necessary.
rob
Rebased patch
rob
0) As I wrote in a review for your patch 1041, the changelog entry slipped elsewhere.

1) The following KeyboardInterrupt except clause looks suspicious. I know why you have it there, but since it is generally a bad thing to do, a comment on why it is needed would be useful.
@@ -256,6 +263,17 @@ def del_link(realm, replica1, replica2, dirman_passwd, force=False):
         repl1.delete_agreement(replica2)
         repl1.delete_referral(replica2)
+    if type1 == replication.IPA_REPLICA:
+        if repl2:
+            ruv = repl2._get_replica_id(repl2.conn, None)
+        else:
+            ruv = get_ruv_by_host(realm, replica1, replica2, dirman_passwd)
+
+        try:
+            repl1.cleanallruv(ruv)
+        except KeyboardInterrupt:
+            pass
+
Maybe you just wanted to do some cleanup and then "raise" again?
No, it is there because it is safe to break out of it. The task will
continue to run. I added some verbiage.
2) This is related to 1), but "ipa-replica-manage del" may wait indefinitely when some remote replica is down, right?
# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
Deleting a master is irreversible.
To reconnect to the remote master you will need to prepare a new replica file and re-install.
Continue to delete? [no]: y
ipa: INFO: Setting agreement cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config schedule to 2358-2359 0 to force synch
ipa: INFO: Deleting schedule 2358-2359 0 from agreement cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired successfully: Incremental update succeeded: start: 0: end: 0
Background task created to clean replication data
... after about a minute I hit CTRL+C
^CDeleted replication agreement from 'vm-086.idm.lab.bos.redhat.com' to 'vm-055.idm.lab.bos.redhat.com'
Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: NS record does not contain 'vm-055.idm.lab.bos.redhat.com.'
You may need to manually remove them from the tree
I think it would be better to inform the user that some remote replica is down, or at least that we are waiting for the task to complete. Something like this:

# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
...
Background task created to clean replication data
Replication data cleanup may take a very long time if some replica is unreachable
Hit CTRL+C to interrupt the wait
^C Clean up wait interrupted
...
[continue with del]
Yup, did this in #1.
3) (minor) When there is a cleanruv task running and you run "ipa-replica-manage del", there is an unexpected error message caused by a duplicate task object in LDAP:
# ipa-replica-manage del vm-072.idm.lab.bos.redhat.com --force
Unable to connect to replica vm-072.idm.lab.bos.redhat.com, forcing removal
FAIL
Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc': "Can't contact LDAP server"}
Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
There were issues removing a connection: This entry already exists   <<<<<<<<<
Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc': "Can't contact LDAP server"}
Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS record does not contain 'vm-072.idm.lab.bos.redhat.com.'
You may need to manually remove them from the tree
I think it should be enough to just catch "entry already exists" in the cleanallruv function and, in such a case, print a relevant error message and bail out. Thus, self.conn.checkTask(dn, dowait=True) would not be called either.
Good catch, fixed.
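The fix being agreed on here amounts to catching the duplicate-entry error when adding the task entry and reporting it, instead of letting it bubble up. In this sketch `DuplicateEntry` stands in for ipalib.errors.DuplicateEntry and `add_task_entry` for the real LDAP add:

```python
class DuplicateEntry(Exception):
    """Stand-in for ipalib.errors.DuplicateEntry."""

def create_clean_task(add_task_entry, rid):
    """Add the CLEANALLRUV task entry, tolerating an existing task."""
    try:
        add_task_entry(rid)
    except DuplicateEntry:
        # A task for this RID already exists; report it and skip
        # waiting on a new one.
        return "A CLEANALLRUV task for replica id %d is already running" % rid
    return "Cleanup task created"
```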
4) (minor): In the make_readonly function, there is a redundant "pass" statement:

+    def make_readonly(self):
+        """
+        Make the current replication agreement read-only.
+        """
+        dn = DN(('cn', 'userRoot'), ('cn', 'ldbm database'),
+                ('cn', 'plugins'), ('cn', 'config'))
+
+        mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'on')]
+        try:
+            self.conn.modify_s(dn, mod)
+        except ldap.INSUFFICIENT_ACCESS:
+            # We can't make the server we're removing read-only but
+            # this isn't a show-stopper
+            root_logger.debug("No permission to switch replica to read-only, continuing anyway")
+            pass <<<<<<<<<<<<<<<
Yeah, this is one of my common mistakes. I put in a pass initially, then add logging in front of it and forget to delete the pass. It's gone now.
5) In clean_ruv, I think allowing a --force option to bypass the user_input would be helpful (at least for test automation):

+    if not ipautil.user_input("Continue to clean?", False):
+        sys.exit("Aborted")
Yup, added.
rob
Slightly revised patch. I still had a window open with one unsaved change.
rob
Apparently there were two unsaved changes, one of which was lost. This adds in the 'entry already exists' fix.
rob
Just one last thing (otherwise the patch is OK) - I don't think this is what we want :-)
# ipa-replica-manage clean-ruv 8
Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
Cleaning the wrong replica ID will cause that server to no
longer replicate so it may miss updates while the process
is running. It would need to be re-initialized to maintain
consistency. Be very careful.
Continue to clean? [no]: y <<<<<<
Aborted
Nor this exception (you are checking for the wrong exception):
# ipa-replica-manage clean-ruv 8
Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
Cleaning the wrong replica ID will cause that server to no
longer replicate so it may miss updates while the process
is running. It would need to be re-initialized to maintain
consistency. Be very careful.
Continue to clean? [no]:
unexpected error: This entry already exists
This is the exception:

Traceback (most recent call last):
  File "/sbin/ipa-replica-manage", line 651, in <module>
    main()
  File "/sbin/ipa-replica-manage", line 648, in main
    clean_ruv(realm, args[1], options)
  File "/sbin/ipa-replica-manage", line 373, in clean_ruv
    thisrepl.cleanallruv(ruv)
  File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", line 1136, in cleanallruv
    self.conn.addEntry(e)
  File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", line 503, in addEntry
    self.__handle_errors(e, arg_desc=arg_desc)
  File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", line 321, in __handle_errors
    raise errors.DuplicateEntry()
ipalib.errors.DuplicateEntry: This entry already exists
Martin
Fixed that and a couple of other problems. When doing a disconnect we should
not also call clean-ruv.
Ah, good self-catch.
I also got tired of seeing crappy error messages so I added a little convert utility.
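The convert utility flattens python-ldap's dict-shaped errors into one readable line. A stdlib-only sketch of the idea — the real version in the patch checks isinstance(exc, ldap.LDAPError) rather than duck-typing args[0] as done here:

```python
def convert_error(exc):
    """Render LDAP-style exceptions ({'desc': ..., 'info': ...}) readably."""
    # python-ldap packs its error details as a dict in exc.args[0].
    if exc.args and isinstance(exc.args[0], dict) and 'desc' in exc.args[0]:
        desc = exc.args[0]['desc'].strip()
        info = exc.args[0].get('info', '').strip()
        return ('%s %s' % (desc, info)).strip()
    return str(exc)
```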
rob
1) There is CLEANALLRUV stuff included in 1050-3 and not here. There are also some findings for this new code.
2) We may want to bump Requires to a higher version of 389-ds-base (389-ds-base-1.2.11.14-1) - it contains a fix for a CLEANALLRUV+winsync bug I found earlier.
3) I just discovered another suspicious behavior. When we are deleting a master that also has links to other master(s), we delete those too. But we also automatically run CLEANALLRUV in these cases, so we may end up with multiple tasks being started on different masters - this does not look right.

I think we may rather want to first delete all the links and then run the CLEANALLRUV task just once. This is what I get with the current code:
# ipa-replica-manage del vm-072.idm.lab.bos.redhat.com
Directory Manager password:

Deleting a master is irreversible.
To reconnect to the remote master you will need to prepare a new replica file and re-install.
Continue to delete? [no]: yes
ipa: INFO: Setting agreement cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config schedule to 2358-2359 0 to force synch
ipa: INFO: Deleting schedule 2358-2359 0 from agreement cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired successfully: Incremental update succeeded: start: 0: end: 0
Background task created to clean replication data. This may take a while.
This may be safely interrupted with Ctrl+C
^CWait for task interrupted. It will continue to run in the background
Deleted replication agreement from 'vm-055.idm.lab.bos.redhat.com' to 'vm-072.idm.lab.bos.redhat.com'
ipa: INFO: Setting agreement cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config schedule to 2358-2359 0 to force synch
ipa: INFO: Deleting schedule 2358-2359 0 from agreement cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired successfully: Incremental update succeeded: start: 0: end: 0
Background task created to clean replication data. This may take a while.
This may be safely interrupted with Ctrl+C
^CWait for task interrupted. It will continue to run in the background
Deleted replication agreement from 'vm-086.idm.lab.bos.redhat.com' to 'vm-072.idm.lab.bos.redhat.com'
Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS record does not contain 'vm-072.idm.lab.bos.redhat.com.'
You may need to manually remove them from the tree
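The reordering suggested in 3) — delete every agreement first, then fire CLEANALLRUV exactly once — could be structured like this; all three callables are stand-ins for the real helpers, not the actual ipa-replica-manage signatures:

```python
def delete_master(replica_names, delete_link, clean_ruv, rid):
    """Remove all agreements to the dead master, then clean its RUV once."""
    for name in replica_names:
        delete_link(name)
    # CLEANALLRUV self-propagates to every master, so one task suffices.
    clean_ruv(rid)

# Record the call order to show only one clean task is created.
calls = []
delete_master(['vm-055', 'vm-086'],
              delete_link=lambda n: calls.append(('del', n)),
              clean_ruv=lambda r: calls.append(('clean', r)),
              rid=16)
```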
Martin
All issues addressed and I pulled in abort-clean-ruv from 1050. I added a
list-clean-ruv command as well.
rob
1) Patch 1031-9 needs to be squashed with 1031-8.

2) The patch needs a rebase (conflict in freeipa.spec.in).

3) The new list-clean-ruv man entry is not right:

list-clean-ruv [REPLICATION_ID]
  - List all running CLEANALLRUV and abort CLEANALLRUV tasks.

REPLICATION_ID is not its argument.
Fixed 1-3.
Btw. new list-clean-ruv command proved very useful for me.
4) I just found out we need to do a better job with the make_readonly() command. I got into trouble when disconnecting one link to a remote replica, as it was marked read-only and I was then unable to manage the disconnected replica properly (vm-072 is the replica made read-only):
Ok, I reset read-only after we delete the agreements. That fixed things up for me. I disconnected a replica and was able to modify entries on that replica afterwards.

This affected the --cleanup command too; it would otherwise have succeeded, I think.
I tested with an A - B - C - A agreement loop. I disconnected A and C and confirmed I could still update entries on C. Then I deleted C, then B, and made sure the output looked right, that I could still manage entries, etc.
rob
[root@vm-055 ~]# ipa-replica-manage disconnect vm-072.idm.lab.bos.redhat.com

[root@vm-072 ~]# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
Deleting a master is irreversible.
To reconnect to the remote master you will need to prepare a new replica file and re-install.
Continue to delete? [no]: yes
Deleting replication agreements between vm-055.idm.lab.bos.redhat.com and vm-072.idm.lab.bos.redhat.com
ipa: INFO: Setting agreement cn=meTovm-072.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config schedule to 2358-2359 0 to force synch
ipa: INFO: Deleting schedule 2358-2359 0 from agreement cn=meTovm-072.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired successfully: Incremental update succeeded: start: 0: end: 0
Deleted replication agreement from 'vm-072.idm.lab.bos.redhat.com' to 'vm-055.idm.lab.bos.redhat.com'
Unable to remove replication agreement for vm-055.idm.lab.bos.redhat.com from vm-072.idm.lab.bos.redhat.com.
Background task created to clean replication data. This may take a while.
This may be safely interrupted with Ctrl+C
^CWait for task interrupted. It will continue to run in the background
Failed to cleanup vm-055.idm.lab.bos.redhat.com entries: Server is unwilling to perform: database is read-only arguments: dn=krbprincipalname=ldap/vm-055.idm.lab.bos.redhat....@idm.lab.bos.redhat.com,cn=services,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
You may need to manually remove them from the tree
ipa: INFO: Unhandled LDAPError: {'info': 'database is read-only', 'desc': 'Server is unwilling to perform'}
Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: Server is unwilling to perform: database is read-only
You may need to manually remove them from the tree
--cleanup did not work for me either:

[root@vm-072 ~]# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force --cleanup
Cleaning a master is irreversible.
This should not normally be require, so use cautiously.
Continue to clean master? [no]: yes
unexpected error: Server is unwilling to perform: database is read-only arguments: dn=krbprincipalname=ldap/vm-055.idm.lab.bos.redhat....@idm.lab.bos.redhat.com,cn=services,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
Martin
I think you sent a wrong patch...
Martin
I hate Mondays.
rob
From a682c572b30b50fe05aebfb30cc57cb3eb8e925b Mon Sep 17 00:00:00 2001
From: Rob Crittenden <rcrit...@redhat.com>
Date: Wed, 27 Jun 2012 14:51:45 -0400
Subject: [PATCH] Run the CLEANALLRUV task when deleting a replication agreement.

This adds two new commands to ipa-replica-manage: list-ruv & clean-ruv

list-ruv can be used to list the update vectors the master has configured
clean-ruv can be used to fire off the CLEANALLRUV task to remove a replication vector. It should be used with caution.

https://fedorahosted.org/freeipa/ticket/2303
---
freeipa.spec.in | 12 +-
install/share/replica-acis.ldif | 5 +
install/tools/ipa-replica-manage | 260 ++++++++++++++++++++++++++++++---
install/tools/man/ipa-replica-manage.1 | 23 +++
install/updates/40-replication.update | 4 +
install/updates/Makefile.am | 1 +
ipaserver/install/replication.py | 68 +++++++++
7 files changed, 343 insertions(+), 30 deletions(-)
create mode 100644 install/updates/40-replication.update
diff --git a/freeipa.spec.in b/freeipa.spec.in
index 2173b55b9f4ff4e72d8c9f569c610025926a3a08..104dec6a819d7436dfe29649f17b42ad7e57f986 100644
--- a/freeipa.spec.in
+++ b/freeipa.spec.in
@@ -24,7 +24,7 @@ Source0: freeipa-%{version}.tar.gz
BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n)
%if ! %{ONLY_CLIENT}
-BuildRequires: 389-ds-base-devel >= 1.2.10-0.6.a6
+BuildRequires: 389-ds-base-devel >= 1.2.11.14
BuildRequires: svrcore-devel
BuildRequires: /usr/share/selinux/devel/Makefile
BuildRequires: policycoreutils >= %{POLICYCOREUTILSVER}
@@ -100,11 +100,7 @@ Requires: %{name}-python = %{version}-%{release}
Requires: %{name}-client = %{version}-%{release}
Requires: %{name}-admintools = %{version}-%{release}
Requires: %{name}-server-selinux = %{version}-%{release}
-%if 0%{?fedora} >= 17
-Requires(pre): 389-ds-base >= 1.2.11.8-1
-%else
-Requires(pre): 389-ds-base >= 1.2.10.10-1
-%endif
+Requires(pre): 389-ds-base >= 1.2.11.14-1
Requires: openldap-clients
Requires: nss
Requires: nss-tools
@@ -752,6 +748,10 @@ fi
%ghost %attr(0644,root,apache) %config(noreplace) %{_sysconfdir}/ipa/ca.crt
%changelog
+* Mon Sep 17 2012 Rob Crittenden <rcrit...@redhat.com> - 2.99.0-45
+- Set min for 389-ds-base to 1.2.11.14-1 on F17+ to pull in updated
+ RUV code and nsslapd-readonly schema.
+
* Fri Sep 14 2012 Sumit Bose <sb...@redhat.com> - 2.99.0-44
- Updated samba4-devel dependency due to API change
diff --git a/install/share/replica-acis.ldif b/install/share/replica-acis.ldif
index baa6216166eb3c661f771b8ef8346e7ee685f4f2..65dfb7a669965731dfd2c6ac1efd99209a2ea404 100644
--- a/install/share/replica-acis.ldif
+++ b/install/share/replica-acis.ldif
@@ -20,6 +20,11 @@ changetype: modify
add: aci
aci: (targetattr=*)(targetfilter="(|(objectclass=nsds5replicationagreement)(objectclass=nsDSWindowsReplicationAgreement))")(version 3.0;acl "permission:Remove Replication Agreements";allow (delete) groupdn = "ldap:///cn=Remove Replication Agreements,cn=permissions,cn=pbac,$SUFFIX";)
+dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
+changetype: modify
+add: aci
+aci: (targetattr=nsslapd-readonly)(version 3.0; acl "Allow marking the database readonly"; allow (write) groupdn = "ldap:///cn=Remove Replication Agreements,cn=permissions,cn=pbac,$SUFFIX";)
+
dn: cn=tasks,cn=config
changetype: modify
add: aci
diff --git a/install/tools/ipa-replica-manage b/install/tools/ipa-replica-manage
index 111042ad3f890e112f039cdee8fb0429340e1d04..dcd44f3c7d21cbf025fcce4bbc609c58b5a6e8f4 100755
--- a/install/tools/ipa-replica-manage
+++ b/install/tools/ipa-replica-manage
@@ -22,6 +22,7 @@ import os
import ldap, re, krbV
import traceback
+from urllib2 import urlparse
from ipapython import ipautil
from ipaserver.install import replication, dsinstance, installutils
@@ -38,6 +39,7 @@ CACERT = "/etc/ipa/ca.crt"
# dict of command name and tuples of min/max num of args needed
commands = {
"list":(0, 1, "[master fqdn]", ""),
+ "list-ruv":(0, 0, "", ""),
"connect":(1, 2, "<master fqdn> [other master fqdn]",
"must provide the name of the servers to connect"),
"disconnect":(1, 2, "<master fqdn> [other master fqdn]",
@@ -45,9 +47,23 @@ commands = {
"del":(1, 1, "<master fqdn>",
"must provide hostname of master to delete"),
"re-initialize":(0, 0, "", ""),
- "force-sync":(0, 0, "", "")
+ "force-sync":(0, 0, "", ""),
+ "clean-ruv":(1, 1, "Replica ID to clean", "must provide replica ID to clean"),
+ "abort-clean-ruv":(1, 1, "Replica ID to abort cleaning", "must provide replica ID to abort cleaning"),
+ "list-clean-ruv":(0, 0, "", ""),
}
+def convert_error(exc):
+ """
+ LDAP exceptions are a dictionary, make them prettier.
+ """
+ if isinstance(exc, ldap.LDAPError):
+ desc = exc.args[0]['desc'].strip()
+ info = exc.args[0].get('info', '').strip()
+ return '%s %s' % (desc, info)
+ else:
+ return str(exc)
+
def parse_options():
parser = IPAOptionParser(version=version.VERSION)
parser.add_option("-H", "--host", dest="host", help="starting host")
@@ -132,7 +148,7 @@ def list_replicas(realm, host, replica, dirman_passwd, verbose):
try:
entries = conn.getList(dn, ldap.SCOPE_ONELEVEL)
except:
- print "Failed read master data from '%s': %s" % (host, str(e))
+ print "Failed to read master data from '%s': %s" % (host, str(e))
return
else:
for ent in entries:
@@ -177,7 +193,7 @@ def list_replicas(realm, host, replica, dirman_passwd, verbose):
entries = repl.find_replication_agreements()
ent_type = 'replica'
except Exception, e:
- print "Failed to get data from '%s': %s" % (replica, str(e))
+ print "Failed to get data from '%s': %s" % (replica, convert_error(e))
return
for entry in entries:
@@ -190,6 +206,15 @@ def list_replicas(realm, host, replica, dirman_passwd, verbose):
print " last update ended: %s" % str(ipautil.parse_generalized_time(entry.getValue('nsds5replicalastupdateend')))
def del_link(realm, replica1, replica2, dirman_passwd, force=False):
+ """
+ Delete a replication agreement from host A to host B.
+
+ @realm: the Kerberos realm
+ @replica1: the hostname of master A
+ @replica2: the hostname of master B
+ @dirman_passwd: the Directory Manager password
+ @force: force deletion even if one server is down
+ """
repl2 = None
@@ -202,14 +227,14 @@ def del_link(realm, replica1, replica2, dirman_passwd, force=False):
if not force and len(repl_list) <= 1 and type1 == replication.IPA_REPLICA:
print "Cannot remove the last replication link of '%s'" % replica1
print "Please use the 'del' command to remove it from the domain"
- return
+ return False
except (ldap.NO_SUCH_OBJECT, errors.NotFound):
print "'%s' has no replication agreement for '%s'" % (replica1, replica2)
- return
+ return False
except Exception, e:
- print "Failed to get data from '%s': %s" % (replica1, str(e))
- return
+ print "Failed to determine agreement type for '%s': %s" % (replica1, convert_error(e))
+ return False
if type1 == replication.IPA_REPLICA:
try:
@@ -219,36 +244,41 @@ def del_link(realm, replica1, replica2, dirman_passwd, force=False):
if not force and len(repl_list) <= 1:
print "Cannot remove the last replication link of '%s'" % replica2
print "Please use the 'del' command to remove it from the domain"
- return
+ return False
except (ldap.NO_SUCH_OBJECT, errors.NotFound):
print "'%s' has no replication agreement for '%s'" % (replica2, replica1)
if not force:
- return
+ return False
except Exception, e:
- print "Failed to get data from '%s': %s" % (replica2, str(e))
+ print "Failed to get list of agreements from '%s': %s" % (replica2, convert_error(e))
if not force:
- return
+ return False
if repl2 and type1 == replication.IPA_REPLICA:
failed = False
try:
+ repl2.set_readonly(readonly=True)
+ repl2.force_sync(repl2.conn, replica1)
+ cn, dn = repl2.agreement_dn(repl1.conn.host)
+ repl2.wait_for_repl_update(repl2.conn, dn, 30)
repl2.delete_agreement(replica1)
repl2.delete_referral(replica1)
+ repl2.set_readonly(readonly=False)
except ldap.LDAPError, e:
desc = e.args[0]['desc'].strip()
info = e.args[0].get('info', '').strip()
print "Unable to remove agreement on %s: %s: %s" % (replica2, desc, info)
failed = True
except Exception, e:
- print "Unable to remove agreement on %s: %s" % (replica2, str(e))
+ print "Unable to remove agreement on %s: %s" % (replica2, convert_error(e))
failed = True
if failed:
if force:
print "Forcing removal on '%s'" % replica1
else:
- return
+ return False
if not repl2 and force:
print "Forcing removal on '%s'" % replica1
@@ -268,10 +298,171 @@ def del_link(realm, replica1, replica2, dirman_passwd, force=False):
for dn in dns:
repl1.conn.deleteEntry(dn)
except Exception, e:
- print "Error deleting winsync replica shared info: %s" % str(e)
+ print "Error deleting winsync replica shared info: %s" % convert_error(e)
print "Deleted replication agreement from '%s' to '%s'" % (replica1, replica2)
+ return True
+
+def get_ruv(realm, host, dirman_passwd):
+ """
+ Return the RUV entries as a list of tuples: (hostname, rid)
+ """
+ try:
+ thisrepl = replication.ReplicationManager(realm, host, dirman_passwd)
+ except Exception, e:
+ print "Failed to connect to server %s: %s" % (host, convert_error(e))
+ sys.exit(1)
+
+ search_filter = '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
+ try:
+ entries = thisrepl.conn.search_s(api.env.basedn, ldap.SCOPE_ONELEVEL,
+ search_filter, ['nsds50ruv'])
+ except ldap.NO_SUCH_OBJECT:
+ print "No RUV records found."
+ sys.exit(0)
+
+ servers = []
+ for ruv in entries[0][1]['nsds50ruv']:
+ if ruv.startswith('{replicageneration'):
+ continue
+ data = re.match('\{replica (\d+) (ldap://.*:\d+)\}(\s+\w+\s+\w*){0,1}', ruv)
+ if data:
+ rid = data.group(1)
+ (scheme, netloc, path, params, query, fragment) = urlparse.urlparse(data.group(2))
+ servers.append((netloc, rid))
+ else:
+ print "unable to decode: %s" % ruv
+
+ return servers
+
+def list_ruv(realm, host, dirman_passwd, verbose):
+ """
+ List the Replica Update Vectors on this host to get the available
+ replica IDs.
+ """
+ servers = get_ruv(realm, host, dirman_passwd)
+ for (netloc, rid) in servers:
+ print "%s: %s" % (netloc, rid)
+
+def get_rid_by_host(realm, sourcehost, host, dirman_passwd):
+ """
+ Try to determine the RID by host name.
+ """
+ servers = get_ruv(realm, sourcehost, dirman_passwd)
+ for (netloc, rid) in servers:
+ if '%s:389' % host == netloc:
+ return int(rid)
+
+def clean_ruv(realm, ruv, options):
+ """
+ Given an RID create a CLEANALLRUV task to clean it up.
+ """
+ try:
+ ruv = int(ruv)
+ except ValueError:
+ sys.exit("Replica ID must be an integer: %s" % ruv)
+
+ servers = get_ruv(realm, options.host, options.dirman_passwd)
+ found = False
+ for (netloc, rid) in servers:
+ if ruv == int(rid):
+ found = True
+ hostname = netloc
+ break
+
+ if not found:
+ sys.exit("Replica ID %s not found" % ruv)
+
+ print "Clean the Replication Update Vector for %s" % hostname
+ print
+ print "Cleaning the wrong replica ID will cause that server to no"
+ print "longer replicate so it may miss updates while the process"
+ print "is running. It would need to be re-initialized to maintain"
+ print "consistency. Be very careful."
+ if not options.force and not ipautil.user_input("Continue to clean?", False):
+ sys.exit("Aborted")
+ thisrepl = replication.ReplicationManager(realm, options.host,
+ options.dirman_passwd)
+ thisrepl.cleanallruv(ruv)
+ print "Cleanup task created"
+
+def abort_clean_ruv(realm, ruv, options):
+ """
+ Given an RID abort a CLEANALLRUV task.
+ """
+ try:
+ ruv = int(ruv)
+ except ValueError:
+ sys.exit("Replica ID must be an integer: %s" % ruv)
+
+ servers = get_ruv(realm, options.host, options.dirman_passwd)
+ found = False
+ for (netloc, rid) in servers:
+ if ruv == int(rid):
+ found = True
+ hostname = netloc
+ break
+
+ if not found:
+ sys.exit("Replica ID %s not found" % ruv)
+
+ print "Aborting the clean Replication Update Vector task for %s" % hostname
+ print
+ thisrepl = replication.ReplicationManager(realm, options.host,
+ options.dirman_passwd)
+ thisrepl.abortcleanallruv(ruv)
+
+ print "Cleanup task stopped"
+
+def list_clean_ruv(realm, host, dirman_passwd, verbose):
+ """
+ List all clean RUV tasks.
+ """
+ repl = replication.ReplicationManager(realm, host, dirman_passwd)
+ dn = DN(('cn', 'cleanallruv'),('cn', 'tasks'), ('cn', 'config'))
+ try:
+ entries = repl.conn.getList(dn, ldap.SCOPE_ONELEVEL)
+ except errors.NotFound:
+ print "No CLEANALLRUV tasks running"
+ else:
+ print "CLEANALLRUV tasks"
+ for entry in entries:
+ name = entry.getValue('cn').replace('clean ', '')
+ status = entry.getValue('nsTaskStatus')
+ print "RID %s: %s" % (name, status)
+ if verbose:
+ print str(dn)
+ print entry.getValue('nstasklog')
+
+ print
+
+ dn = DN(('cn', 'abort cleanallruv'),('cn', 'tasks'), ('cn', 'config'))
+ try:
+ entries = repl.conn.getList(dn, ldap.SCOPE_ONELEVEL)
+ except errors.NotFound:
+ print "No abort CLEANALLRUV tasks running"
+ else:
+ print "Abort CLEANALLRUV tasks"
+ for entry in entries:
+ name = entry.getValue('cn').replace('abort ', '')
+ status = entry.getValue('nsTaskStatus')
+ print "RID %s: %s" % (name, status)
+ if verbose:
+ print str(dn)
+ print entry.getValue('nstasklog')
+
def del_master(realm, hostname, options):
force_del = False
@@ -281,7 +472,7 @@ def del_master(realm, hostname, options):
thisrepl = replication.ReplicationManager(realm, options.host,
options.dirman_passwd)
except Exception, e:
- print "Failed to connect to server %s: %s" % (options.host, str(e))
+ print "Failed to connect to server %s: %s" % (options.host, convert_error(e))
sys.exit(1)
# 2. Ensure we have an agreement with the master
@@ -297,7 +488,7 @@ def del_master(realm, hostname, options):
delrepl = replication.ReplicationManager(realm, hostname, options.dirman_passwd)
except Exception, e:
if not options.force:
- print "Unable to delete replica %s: %s" % (hostname, str(e))
+ print "Unable to delete replica %s: %s" % (hostname, convert_error(e))
sys.exit(1)
else:
print "Unable to connect to replica %s, forcing removal" % hostname
@@ -325,21 +516,35 @@ def del_master(realm, hostname, options):
if not ipautil.user_input("Continue to delete?", False):
sys.exit("Deletion aborted")
+ # Save the RID value before we start deleting
+ if repltype == replication.IPA_REPLICA:
+ rid = get_rid_by_host(realm, options.host, hostname, options.dirman_passwd)
+
# 4. Remove each agreement
+
+ print "Deleting replication agreements between %s and %s" % (hostname, ', '.join(replica_names))
for r in replica_names:
try:
- del_link(realm, r, hostname, options.dirman_passwd, force=True)
+ if not del_link(realm, r, hostname, options.dirman_passwd, force=True):
+ print "Unable to remove replication agreement for %s from %s." % (hostname, r)
except Exception, e:
- print "There were issues removing a connection: %s" % str(e)
+ print "There were issues removing a connection: %s" % convert_error(e)
+
+ # 5. Clean RUV for the deleted master
+ if repltype == replication.IPA_REPLICA:
+ try:
+ thisrepl.cleanallruv(rid)
+ except KeyboardInterrupt:
+ print "Task wait interrupted. The task will continue to run in the background"
- # 5. Finally clean up the removed replica common entries.
+ # 6. Finally clean up the removed replica common entries.
try:
thisrepl.replica_cleanup(hostname, realm, force=True)
except Exception, e:
- print "Failed to cleanup %s entries: %s" % (hostname, str(e))
+ print "Failed to cleanup %s entries: %s" % (hostname, convert_error(e))
print "You may need to manually remove them from the tree"
- # 6. And clean up the removed replica DNS entries if any.
+ # 7. And clean up the removed replica DNS entries if any.
try:
if bindinstance.dns_container_exists(options.host, thisrepl.suffix,
dm_password=options.dirman_passwd):
@@ -352,7 +557,7 @@ def del_master(realm, hostname, options):
bind = bindinstance.BindInstance()
bind.remove_master_dns_records(hostname, realm, realm.lower())
except Exception, e:
- print "Failed to cleanup %s DNS entries: %s" % (hostname, str(e))
+ print "Failed to cleanup %s DNS entries: %s" % (hostname, convert_error(e))
print "You may need to manually remove them from the tree"
def add_link(realm, replica1, replica2, dirman_passwd, options):
@@ -391,12 +596,11 @@ def add_link(realm, replica1, replica2, dirman_passwd, options):
# the directory server and kill the connection
try:
repl1 = replication.ReplicationManager(realm, replica1, dirman_passwd)
-
except (ldap.NO_SUCH_OBJECT, errors.NotFound):
print "Cannot find replica '%s'" % replica1
return
except Exception, e:
- print "Failed to get data from '%s': %s" % (replica1, str(e))
+ print "Failed to connect to '%s': %s" % (replica1, convert_error(e))
return
if options.winsync:
@@ -513,6 +717,8 @@ def main():
if len(args) == 2:
replica = args[1]
list_replicas(realm, host, replica, dirman_passwd, options.verbose)
+ elif args[0] == "list-ruv":
+ list_ruv(realm, host, dirman_passwd, options.verbose)
elif args[0] == "del":
del_master(realm, args[1], options)
elif args[0] == "re-initialize":
@@ -541,6 +747,12 @@ def main():
replica1 = host
replica2 = args[1]
del_link(realm, replica1, replica2, dirman_passwd)
+ elif args[0] == "clean-ruv":
+ clean_ruv(realm, args[1], options)
+ elif args[0] == "abort-clean-ruv":
+ abort_clean_ruv(realm, args[1], options)
+ elif args[0] == "list-clean-ruv":
+ list_clean_ruv(realm, host, dirman_passwd, options.verbose)
try:
main()
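As an aside on the dispatch above: each new subcommand ultimately maps a replica ID onto a task entry under cn=tasks,cn=config. A minimal, server-free sketch of how those DNs are composed (the helper names here are illustrative only; the patch itself builds them with ipalib's DN class):

```python
# Sketch: compose CLEANALLRUV task DNs the way the patch does.
# Helper names are hypothetical; the real code uses DN(('cn', ...), ...).

def clean_task_dn(replica_id):
    # cn=clean <rid>,cn=cleanallruv,cn=tasks,cn=config
    return "cn=clean %d,cn=cleanallruv,cn=tasks,cn=config" % replica_id

def abort_task_dn(replica_id):
    # cn=abort <rid>,cn=abort cleanallruv,cn=tasks,cn=config
    return "cn=abort %d,cn=abort cleanallruv,cn=tasks,cn=config" % replica_id

print(clean_task_dn(7))
print(abort_task_dn(7))
```

Keeping the RID in the task cn is what lets list-clean-ruv recover it later by stripping the "clean " / "abort " prefix.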
diff --git a/install/tools/man/ipa-replica-manage.1 b/install/tools/man/ipa-replica-manage.1
index 98103ffdd416f11c44e147e6b4eb84c682da39e0..98d70c6fd09fc6267881a6bf64c30dbe8f0389e3 100644
--- a/install/tools/man/ipa-replica-manage.1
+++ b/install/tools/man/ipa-replica-manage.1
@@ -42,11 +42,29 @@ Manages the replication agreements of an IPA server.
\fBforce\-sync\fR
\- Immediately flush any data to be replicated from a server specified with the \-\-from option
.TP
+\fBlist\-ruv\fR
\- List the replication IDs of the masters known to this server.
+.TP
+\fBclean\-ruv\fR [REPLICATION_ID]
+\- Run the CLEANALLRUV task to remove a replication ID.
+.TP
+\fBabort\-clean\-ruv\fR [REPLICATION_ID]
+\- Abort a running CLEANALLRUV task.
+.TP
+\fBlist\-clean\-ruv\fR
+\- List all running CLEANALLRUV and abort CLEANALLRUV tasks.
+.TP
The connect and disconnect options are used to manage the replication topology. When a replica is created it is only connected with the master that created it. The connect option may be used to connect it to other existing replicas.
.TP
The disconnect option cannot be used to remove the last link of a replica. To remove a replica from the topology use the del option.
.TP
If a replica is deleted and then re\-added within a short time\-frame then the 389\-ds instance on the master that created it should be restarted before re\-installing the replica. The master will have the old service principals cached which will cause replication to fail.
+.TP
Each IPA master server has a unique replication ID. This ID is used by 389\-ds\-base when storing information about replication status. The output consists of the masters and their respective replication IDs. See \fBclean\-ruv\fR.
+.TP
When a master is removed, all other masters need to remove its replication ID from the list of masters. Normally this occurs automatically when a master is deleted with ipa\-replica\-manage. If one or more masters were down or unreachable when ipa\-replica\-manage was executed, this replication ID may still exist. The clean\-ruv command may be used to clean up an unused replication ID.
+.TP
+\fBNOTE\fR: clean\-ruv is \fBVERY DANGEROUS\fR. Execution against the wrong replication ID can result in inconsistent data on that master. The master should be re\-initialized from another if this happens.
.SH "OPTIONS"
.TP
\fB\-H\fR \fIHOST\fR, \fB\-\-host\fR=\fIHOST\fR
@@ -112,6 +130,11 @@ Completely remove a replica:
# ipa\-replica\-manage del srv4.example.com
.TP
Using connect/disconnect you can manage the replication topology.
+.TP
+List the replication IDs in use:
+ # ipa\-replica\-manage list\-ruv
+ srv1.example.com:389: 7
+ srv2.example.com:389: 4
.SH "WINSYNC"
Creating a Windows AD Synchronization agreement is similar to creating an IPA replication agreement, there are just a couple of extra steps.
diff --git a/install/updates/40-replication.update b/install/updates/40-replication.update
new file mode 100644
index 0000000000000000000000000000000000000000..f9e0496be336ec7653e6b1688ad28245014ce6a0
--- /dev/null
+++ b/install/updates/40-replication.update
@@ -0,0 +1,4 @@
+# Let a delegated user put the database into read-only mode when deleting
+# an agreement.
+dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
+add:aci: '(targetattr=nsslapd-readonly)(version 3.0; acl "Allow marking the database readonly"; allow (write) groupdn = "ldap:///cn=Remove Replication Agreements,cn=permissions,cn=pbac,$SUFFIX";)'
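For context, the update line above is equivalent to the following modify operation, shown here as a server-free sketch (the MOD_ADD constant mirrors python-ldap's ldap.MOD_ADD, the connection call is only indicated in a comment, and $SUFFIX is expanded by the IPA updater at apply time):

```python
# Sketch only: MOD_ADD mirrors python-ldap's ldap.MOD_ADD constant.
MOD_ADD = 0

dn = "cn=userRoot,cn=ldbm database,cn=plugins,cn=config"
aci = ('(targetattr=nsslapd-readonly)(version 3.0; '
       'acl "Allow marking the database readonly"; '
       'allow (write) groupdn = "ldap:///cn=Remove Replication Agreements,'
       'cn=permissions,cn=pbac,$SUFFIX";)')
modlist = [(MOD_ADD, "aci", [aci.encode()])]
# On a live connection this would be applied with: conn.modify_s(dn, modlist)
print(dn)
```

This is what allows a delegated user (rather than only Directory Manager) to flip nsslapd-readonly when deleting an agreement.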
diff --git a/install/updates/Makefile.am b/install/updates/Makefile.am
index bc7945d7a5cd77469f7fe7175ebd9da66b9119d1..434fe5bae085c0d2911a5a7592fe9b26f4bbb418 100644
--- a/install/updates/Makefile.am
+++ b/install/updates/Makefile.am
@@ -25,6 +25,7 @@ app_DATA = \
21-ca_renewal_container.update \
30-s4u2proxy.update \
40-delegation.update \
+ 40-replication.update \
40-dns.update \
40-automember.update \
45-roles.update \
diff --git a/ipaserver/install/replication.py b/ipaserver/install/replication.py
index f015c4efd25d05d395f049c3bb91a5553d3ed2c8..52cc1d52ec1ea787a7b918222138c6f50ccbfafe 100644
--- a/ipaserver/install/replication.py
+++ b/ipaserver/install/replication.py
@@ -1103,3 +1103,71 @@ class ReplicationManager(object):
if err:
raise err #pylint: disable=E0702
+
+ def set_readonly(self, readonly, critical=False):
+ """
+ Set the database readonly status.
+
+ @readonly: boolean for read-only status
+ @critical: boolean to raise an exception on failure, default False.
+ """
+ dn = DN(('cn', 'userRoot'), ('cn', 'ldbm database'),
+ ('cn', 'plugins'), ('cn', 'config'))
+
+ mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'on' if readonly else 'off')]
+ try:
+ self.conn.modify_s(dn, mod)
+ except ldap.INSUFFICIENT_ACCESS, e:
+ # We can't modify the read-only status on the remote server.
+ # This usually isn't a show-stopper.
+ if critical:
+ raise e
+ root_logger.debug("No permission to modify replica read-only status, continuing anyway")
+
+ def cleanallruv(self, replicaId):
+ """
+ Create a CLEANALLRUV task and monitor it until it has
+ completed.
+ """
+ root_logger.debug("Creating CLEANALLRUV task for replica id %d" % replicaId)
+
+ dn = DN(('cn', 'clean %d' % replicaId), ('cn', 'cleanallruv'),('cn', 'tasks'), ('cn', 'config'))
+ e = ipaldap.Entry(dn)
+ e.setValues('objectclass', ['top', 'extensibleObject'])
+ e.setValue('replica-base-dn', api.env.basedn)
+ e.setValue('replica-id', replicaId)
+ e.setValue('cn', 'clean %d' % replicaId)
+ try:
+ self.conn.addEntry(e)
+ except errors.DuplicateEntry:
+ print "CLEANALLRUV task for replica id %d already exists." % replicaId
+ else:
+ print "Background task created to clean replication data. This may take a while."
+
+ print "This may be safely interrupted with Ctrl+C"
+
+ self.conn.checkTask(dn, dowait=True)
+
+ def abortcleanallruv(self, replicaId):
+ """
+ Create a task to abort a CLEANALLRUV operation.
+ """
+ root_logger.debug("Creating task to abort a CLEANALLRUV operation for replica id %d" % replicaId)
+
+ dn = DN(('cn', 'abort %d' % replicaId), ('cn', 'abort cleanallruv'),('cn', 'tasks'), ('cn', 'config'))
+ e = ipaldap.Entry(dn)
+ e.setValues('objectclass', ['top', 'extensibleObject'])
+ e.setValue('replica-base-dn', api.env.basedn)
+ e.setValue('replica-id', replicaId)
+ e.setValue('cn', 'abort %d' % replicaId)
+ try:
+ self.conn.addEntry(e)
+ except errors.DuplicateEntry:
+ print "An abort CLEANALLRUV task for replica id %d already exists." % replicaId
+ else:
+ print "Background task created. This may take a while."
+
+ print "This may be safely interrupted with Ctrl+C"
+
+ self.conn.checkTask(dn, dowait=True)
+
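Both cleanallruv() and abortcleanallruv() follow the same shape: add an extensibleObject task entry, then block in checkTask(dn, dowait=True) until the task reports completion. A rough standalone sketch of that polling pattern (fetch_status stands in for the LDAP read of nsTaskStatus; the function name and completion test are assumptions, not the actual checkTask internals):

```python
import time

def wait_for_task(fetch_status, poll=1.0, max_polls=60):
    """Poll a task's nsTaskStatus until it reports completion.

    fetch_status: callable returning the current status string,
    standing in for an LDAP read of the task entry.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status and "completed" in status.lower():
            return status
        time.sleep(poll)
    raise RuntimeError("task did not complete in time")

# Simulated status sequence, as a live server might report it.
statuses = iter(["Task running", "Task running", "Successfully completed"])
print(wait_for_task(lambda: next(statuses), poll=0))
```

Because the wait is a plain poll, interrupting it with Ctrl+C (the KeyboardInterrupt handled in del_master) only stops the client-side wait; the server-side task keeps running, which is exactly what the "continue to run in the background" message tells the user.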
--
1.7.11.4
_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel