Martin Kosek wrote:
On 09/14/2012 09:17 PM, Rob Crittenden wrote:
Martin Kosek wrote:
On 09/06/2012 11:17 PM, Rob Crittenden wrote:
Martin Kosek wrote:
On 09/06/2012 05:55 PM, Rob Crittenden wrote:
Rob Crittenden wrote:
Rob Crittenden wrote:
Martin Kosek wrote:
On 09/05/2012 08:06 PM, Rob Crittenden wrote:
Rob Crittenden wrote:
Martin Kosek wrote:
On 07/05/2012 08:39 PM, Rob Crittenden wrote:
Martin Kosek wrote:
On 07/03/2012 04:41 PM, Rob Crittenden wrote:
Deleting a replica can leave a replication update vector (RUV) on the
other servers. This can confuse things if the replica is re-added, and
it also causes the server to calculate changes against a server that
may no longer exist.

389-ds-base provides a new task that propagates itself to all
available replicas to clean this RUV data.

This patch will create this task at deletion time to hopefully
clean things up.

It isn't perfect. If any replica is down or unavailable at the time
the cleanruv task fires, and then comes back up, the old RUV data
may be re-propagated around.

To make things easier in this case I've added two new commands to
ipa-replica-manage. The first lists the replication ids of all the
servers we have a RUV for. Using this you can call clean_ruv with the
replication id of a server that no longer exists to try the
cleanallruv step again.

This is quite dangerous though. If you run cleanruv against a replica
id that does exist it can cause a loss of data. I believe I've put in
enough scary warnings about this.
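For reference, the CLEANALLRUV task is driven by adding an entry under cn=cleanallruv,cn=tasks,cn=config on one master. A minimal sketch of building that add record, assuming the standard 389-ds-base task attribute names (the suffix and replica id below are illustrative, not values from the patch):

```python
# Sketch: build the LDAP add record for a CLEANALLRUV task.
# Attribute names follow 389-ds-base's cleanAllRUV task convention;
# the suffix and replica id here are illustrative only.

def cleanallruv_entry(replica_id, suffix):
    """Return (dn, attrs) for a CLEANALLRUV task entry."""
    cn = "clean %d" % replica_id   # %d requires an int, hence the type matters
    dn = "cn=%s,cn=cleanallruv,cn=tasks,cn=config" % cn
    attrs = {
        "objectclass": ["top", "extensibleObject"],
        "cn": [cn],
        "replica-base-dn": [suffix],
        "replica-id": [str(replica_id)],
    }
    return dn, attrs

dn, attrs = cleanallruv_entry(4, "dc=example,dc=com")
```

Once added, the task self-propagates to every reachable replica, which is why only one master needs the entry.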

rob


Good work there, this should make cleaning RUVs much easier than with
the previous version.

This is what I found during review:

1) The man page help for the list_ruv and clean_ruv commands is easy
to miss. I think it would help if, for example, all the info for each
command were indented; as it stands, users could easily overlook the
new commands in the man page.


2) I would rename the new commands to clean-ruv and list-ruv to make
them consistent with the rest of the commands (re-initialize,
force-sync).


3) It would be nice to be able to run the clean_ruv command in an
unattended way (for better testing), i.e. respect the --force option
as we already do for ipa-replica-manage del. This fix would aid test
automation in the future.


4) (minor) The new question (and the del one too) does not react well
to CTRL+D:

# ipa-replica-manage clean_ruv 3 --force
Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389

Cleaning the wrong replica ID will cause that server to no
longer replicate so it may miss updates while the process
is running. It would need to be re-initialized to maintain
consistency. Be very careful.
Continue to clean? [no]: unexpected error:


5) The help for the clean_ruv command without its required parameter
is confusing, as it reports that the command is wrong rather than the
parameter:

# ipa-replica-manage clean_ruv
Usage: ipa-replica-manage [options]

ipa-replica-manage: error: must provide a command [clean_ruv | force-sync | disconnect | connect | del | re-initialize | list | list_ruv]

It seems you just forgot to specify the error message in the command
definition.
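A sketch of per-command argument validation with optparse, which the tool uses; the command table and the validate() helper below are illustrative, not code from the patch:

```python
# Sketch: report a missing required argument for a specific subcommand
# instead of the generic "must provide a command" usage error.
# COMMANDS and validate() are hypothetical stand-ins.
import optparse

COMMANDS = {"clean_ruv": 2, "list_ruv": 1}   # command -> required arg count

def validate(parser, args):
    if not args or args[0] not in COMMANDS:
        parser.error("must provide a command [%s]" % " | ".join(COMMANDS))
    if len(args) < COMMANDS[args[0]]:
        # Name the missing parameter rather than reprinting the usage line.
        parser.error("must provide the replication ID to %s" % args[0])
    return args

parser = optparse.OptionParser()
```

parser.error() prints the message and exits, so the user sees which parameter was missing rather than a list of all commands.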


6) When the remote replica is down, the clean_ruv command fails with
an unexpected error:

[root@vm-086 ~]# ipa-replica-manage clean_ruv 5
Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389

Cleaning the wrong replica ID will cause that server to no
longer replicate so it may miss updates while the process
is running. It would need to be re-initialized to maintain
consistency. Be very careful.
Continue to clean? [no]: y
unexpected error: {'desc': 'Operations error'}


/var/log/dirsrv/slapd-IDM-LAB-BOS-REDHAT-COM/errors:
[04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: failed to connect to repl agreement connection (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping tree,cn=config), error 105
[04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: replica (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping tree,cn=config) has not been cleaned.  You will need to rerun the CLEANALLRUV task on this replica.
[04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: Task failed (1)

In this case I think we should inform the user that the command
failed, possibly because of disconnected replicas, and that they
could bring the replicas back up and try again.


7) (minor) "pass" is now redundant in replication.py:
+        except ldap.INSUFFICIENT_ACCESS:
+            # We can't make the server we're removing read-only but
+            # this isn't a show-stopper
+            root_logger.debug("No permission to switch replica to read-only, continuing anyway")
+            pass


I think this addresses everything.

rob

Thanks, almost there! I just found one more issue which needs to be
fixed before we push:

# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
Directory Manager password:

Unable to connect to replica vm-055.idm.lab.bos.redhat.com, forcing removal
Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc': "Can't contact LDAP server"}
Forcing removal on 'vm-086.idm.lab.bos.redhat.com'

There were issues removing a connection: %d format: a number is required, not str

Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc': "Can't contact LDAP server"}

This is a traceback I retrieved:
Traceback (most recent call last):
       File "/sbin/ipa-replica-manage", line 425, in del_master
         del_link(realm, r, hostname, options.dirman_passwd, force=True)
       File "/sbin/ipa-replica-manage", line 271, in del_link
         repl1.cleanallruv(replica_id)
       File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", line 1094, in cleanallruv
         root_logger.debug("Creating CLEANALLRUV task for replica id %d" % replicaId)


The problem here is that you don't convert replica_id to int in this part:
+    replica_id = None
+    if repl2:
+        replica_id = repl2._get_replica_id(repl2.conn, None)
+    else:
+        servers = get_ruv(realm, replica1, dirman_passwd)
+        for (netloc, rid) in servers:
+            if netloc.startswith(replica2):
+                replica_id = rid
+                break
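The failure mode is plain string formatting: the rid read back from the RUV entry is a string, and %d rejects strings. A minimal reproduction of the bug and the fix:

```python
# Minimal reproduction of the reported failure: %d with a string raises
# TypeError, which surfaced above as "There were issues removing a
# connection: %d format: a number is required, not str".
rid = "7"                       # replica id as read back from the RUV entry
try:
    "Creating CLEANALLRUV task for replica id %d" % rid
except TypeError as e:
    error = str(e)              # the message the user saw

# Casting at the lookup site fixes it:
msg = "Creating CLEANALLRUV task for replica id %d" % int(rid)
```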

Martin


Updated patch using new mechanism in 389-ds-base. This should more
thoroughly clean out RUV data when a replica is being deleted, and
provide for a way to delete RUV data afterwards too if necessary.

rob

Rebased patch

rob


0) As I wrote in the review for your patch 1041, the changelog entry
slipped elsewhere.

1) The following KeyboardInterrupt except clause looks suspicious. I
know why you have it there, but since it is generally a bad thing to
do, some comment on why it is needed would be useful.

@@ -256,6 +263,17 @@ def del_link(realm, replica1, replica2, dirman_passwd, force=False):
         repl1.delete_agreement(replica2)
         repl1.delete_referral(replica2)

+    if type1 == replication.IPA_REPLICA:
+        if repl2:
+            ruv = repl2._get_replica_id(repl2.conn, None)
+        else:
+            ruv = get_ruv_by_host(realm, replica1, replica2, dirman_passwd)
+
+        try:
+            repl1.cleanallruv(ruv)
+        except KeyboardInterrupt:
+            pass
+

Maybe you just wanted to do some cleanup and then "raise" again?

No, it is there because it is safe to break out of it. The task will
continue to run. I added some verbiage.
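The safe-interrupt pattern Rob describes can be sketched as follows; wait_for_task is a stand-in for the real dowait loop, not the patch's code:

```python
# Sketch: Ctrl+C (KeyboardInterrupt) only abandons the local wait;
# the CLEANALLRUV task keeps running on the server, so swallowing the
# interrupt here is safe. wait_for_task is a hypothetical stand-in.
def wait_with_interrupt(wait_for_task):
    print("Background task created to clean replication data")
    try:
        wait_for_task()
    except KeyboardInterrupt:
        # Safe to swallow: the server-side task continues regardless.
        print("Wait for task interrupted. It will continue to run in the background")
```

The comment in the handler is exactly the "verbiage" the review asked for: it records why the interrupt is deliberately not re-raised.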


2) This is related to 1): when some remote replica is down,
"ipa-replica-manage del" may wait indefinitely, right?

# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
Deleting a master is irreversible.
To reconnect to the remote master you will need to prepare a new
replica file
and re-install.
Continue to delete? [no]: y
ipa: INFO: Setting agreement cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config schedule to 2358-2359 0 to force synch
ipa: INFO: Deleting schedule 2358-2359 0 from agreement cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired successfully: Incremental update succeeded: start: 0: end: 0
Background task created to clean replication data

... after about a minute I hit CTRL+C

^CDeleted replication agreement from 'vm-086.idm.lab.bos.redhat.com' to 'vm-055.idm.lab.bos.redhat.com'
Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: NS record does not contain 'vm-055.idm.lab.bos.redhat.com.'
You may need to manually remove them from the tree

I think it would be better to inform the user that some remote
replica is down, or at least that we are waiting for the task to
complete. Something like this:

# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
...
Background task created to clean replication data
Replication data cleanup may take a very long time if some replica is unreachable
Hit CTRL+C to interrupt the wait
^C Clean up wait interrupted
....
[continue with del]

Yup, did this in #1.


3) (minor) When there is a cleanruv task running and you run
"ipa-replica-manage del", there is an unexpected error message about
a duplicate task object in LDAP:

# ipa-replica-manage del vm-072.idm.lab.bos.redhat.com --force
Unable to connect to replica vm-072.idm.lab.bos.redhat.com, forcing removal
FAIL
Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc': "Can't contact LDAP server"}
Forcing removal on 'vm-086.idm.lab.bos.redhat.com'

There were issues removing a connection: This entry already exists    <<<<<<<<<

Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc': "Can't contact LDAP server"}
Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS record does not contain 'vm-072.idm.lab.bos.redhat.com.'
You may need to manually remove them from the tree


I think it should be enough to just catch "entry already exists" in
the cleanallruv function, and in such a case print a relevant error
message and bail out. Then self.conn.checkTask(dn, dowait=True) would
not be called either.
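That suggestion can be sketched like this; DuplicateEntry stands in for ipalib.errors.DuplicateEntry and add_task for the real addEntry call, neither is code from the patch:

```python
# Sketch: catch the duplicate-task case up front and bail with a clear
# message, so the checkTask(dn, dowait=True) wait is never entered.
class DuplicateEntry(Exception):
    """Stand-in for ipalib.errors.DuplicateEntry."""

def create_clean_task(add_task, replica_id):
    try:
        add_task(replica_id)
    except DuplicateEntry:
        print("A CLEANALLRUV task for replica id %d already exists." % replica_id)
        return False            # caller skips the dowait loop
    return True
```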

Good catch, fixed.



4) (minor) In the make_readonly function, there is a redundant "pass"
statement:

+    def make_readonly(self):
+        """
+        Make the current replication agreement read-only.
+        """
+        dn = DN(('cn', 'userRoot'), ('cn', 'ldbm database'),
+                ('cn', 'plugins'), ('cn', 'config'))
+
+        mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'on')]
+        try:
+            self.conn.modify_s(dn, mod)
+        except ldap.INSUFFICIENT_ACCESS:
+            # We can't make the server we're removing read-only but
+            # this isn't a show-stopper
+            root_logger.debug("No permission to switch replica to read-only, continuing anyway")
+            pass         <<<<<<<<<<<<<<<

Yeah, this is one of my common mistakes. I put in a pass initially, then
add logging in front of it and forget to delete the pass. It's gone now.



5) In clean_ruv, I think allowing a --force option to bypass the
user_input would be helpful (at least for test automation):

+    if not ipautil.user_input("Continue to clean?", False):
+        sys.exit("Aborted")

Yup, added.
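The --force bypass is a one-line guard; a sketch, with user_input standing in for ipautil.user_input:

```python
# Sketch: honour --force to run unattended, mirroring what
# ipa-replica-manage del already does. user_input is a stand-in
# for ipautil.user_input, not the real helper.
import sys

def confirm_clean(force, user_input):
    if not force and not user_input("Continue to clean?", False):
        sys.exit("Aborted")
```

With force=True the prompt callable is never invoked, so the command can run from a script with no TTY.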

rob

Slightly revised patch. I still had a window open with one unsaved change.

rob


Apparently there were two unsaved changes, one of which was lost. This
adds in
the 'entry already exists' fix.

rob


Just one last thing (otherwise the patch is OK): I don't think this is
what we want :-)

# ipa-replica-manage clean-ruv 8
Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389

Cleaning the wrong replica ID will cause that server to no
longer replicate so it may miss updates while the process
is running. It would need to be re-initialized to maintain
consistency. Be very careful.
Continue to clean? [no]: y   <<<<<<
Aborted


Nor this exception (you are checking for the wrong exception):

# ipa-replica-manage clean-ruv 8
Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389

Cleaning the wrong replica ID will cause that server to no
longer replicate so it may miss updates while the process
is running. It would need to be re-initialized to maintain
consistency. Be very careful.
Continue to clean? [no]:
unexpected error: This entry already exists

This is the exception:

Traceback (most recent call last):
     File "/sbin/ipa-replica-manage", line 651, in <module>
       main()
     File "/sbin/ipa-replica-manage", line 648, in main
       clean_ruv(realm, args[1], options)
     File "/sbin/ipa-replica-manage", line 373, in clean_ruv
       thisrepl.cleanallruv(ruv)
     File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", line 1136, in cleanallruv
       self.conn.addEntry(e)
     File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", line 503, in addEntry
       self.__handle_errors(e, arg_desc=arg_desc)
     File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", line 321, in __handle_errors
       raise errors.DuplicateEntry()
ipalib.errors.DuplicateEntry: This entry already exists

Martin


Fixed that and a couple of other problems. When doing a disconnect we should
not also call clean-ruv.

Ah, good self-catch.


I also got tired of seeing crappy error messages so I added a little convert
utility.

rob

1) There is CLEANALLRUV stuff included in 1050-3 and not here. There
are also some findings for this new code.


2) We may want to bump the Requires to a higher version of 389-ds-base
(389-ds-base-1.2.11.14-1); it contains a fix for the
CLEANALLRUV+winsync bug I found earlier.


3) I just discovered another suspicious behavior. When we are deleting
a master that also has links to other master(s), we delete those too.
But we also automatically run CLEANALLRUV in these cases, so we may
end up with multiple tasks being started on different masters, which
does not look right.

I think we may rather want to first delete all the links and then run
the CLEANALLRUV task just once. This is what I get with the current code:

# ipa-replica-manage del vm-072.idm.lab.bos.redhat.com
Directory Manager password:

Deleting a master is irreversible.
To reconnect to the remote master you will need to prepare a new replica file
and re-install.
Continue to delete? [no]: yes
ipa: INFO: Setting agreement cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config schedule to 2358-2359 0 to force synch
ipa: INFO: Deleting schedule 2358-2359 0 from agreement cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired successfully: Incremental update succeeded: start: 0: end: 0
Background task created to clean replication data. This may take a while.
This may be safely interrupted with Ctrl+C

^CWait for task interrupted. It will continue to run in the background

Deleted replication agreement from 'vm-055.idm.lab.bos.redhat.com' to 'vm-072.idm.lab.bos.redhat.com'
ipa: INFO: Setting agreement cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config schedule to 2358-2359 0 to force synch
ipa: INFO: Deleting schedule 2358-2359 0 from agreement cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired successfully: Incremental update succeeded: start: 0: end: 0
Background task created to clean replication data. This may take a while.
This may be safely interrupted with Ctrl+C

^CWait for task interrupted. It will continue to run in the background

Deleted replication agreement from 'vm-086.idm.lab.bos.redhat.com' to 'vm-072.idm.lab.bos.redhat.com'
Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS record does not contain 'vm-072.idm.lab.bos.redhat.com.'
You may need to manually remove them from the tree

Martin
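The reordering suggested above (tear down every agreement first, then a single CLEANALLRUV) can be sketched as follows; all the callables are stand-ins for the real ipa-replica-manage helpers:

```python
# Sketch: delete each agreement to the doomed master first, then
# create one CLEANALLRUV task instead of one per link. delete_link
# and cleanallruv are hypothetical stand-ins.
def del_master(links, delete_link, cleanallruv, replica_id):
    for link in links:
        delete_link(link)       # remove each agreement individually
    cleanallruv(replica_id)     # single task; it self-propagates
```

Since the task self-propagates, one task after the last link is removed covers every surviving master.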


All issues addressed and I pulled in abort-clean-ruv from 1050. I added a
list-clean-ruv command as well.

rob

1) Patch 1031-9 needs to get squashed with 1031-8


2) Patch needs a rebase (conflict in freeipa.spec.in)


3) New list-clean-ruv man entry is not right:

        list-clean-ruv [REPLICATION_ID]
               - List all running CLEANALLRUV and abort CLEANALLRUV tasks.

REPLICATION_ID is not its argument.

Fixed 1-3.

Btw. the new list-clean-ruv command proved very useful for me.

4) I just found out we need to do a better job with the
make_readonly() command. I got into trouble when disconnecting one
link to a remote replica: it was marked read-only and I was then
unable to manage the disconnected replica properly (vm-072 is the
replica made read-only):

Ok, I reset read-only after we delete the agreements. That fixed things up for me. I disconnected a replica and was able to modify entries on that replica afterwards.

This affected the --cleanup command too; it would otherwise have succeeded, I think.

I tested with an A - B - C - A agreement loop. I disconnected A and C and confirmed I could still update entries on C. Then I deleted C, then B, and made sure the output looked right, that I could still manage entries, etc.
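The read-only reset is the same single-attribute modify as the quoted make_readonly(), flipped back once the agreements are gone. A self-contained sketch (MOD_REPLACE is hardcoded to 2, its value in python-ldap, to avoid the dependency):

```python
# Sketch: build the modify list that flips nsslapd-readonly back off
# after the agreements are deleted. MOD_REPLACE is hardcoded here
# (it is 2 in python-ldap) so the sketch runs without the module.
MOD_REPLACE = 2

def readonly_mod(enabled):
    value = 'on' if enabled else 'off'
    return [(MOD_REPLACE, 'nsslapd-readonly', value)]

mod = readonly_mod(False)       # reset the replica to read-write
```

Applying this with conn.modify_s against cn=userRoot,cn=ldbm database,cn=plugins,cn=config undoes the earlier read-only switch.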

rob


[root@vm-055 ~]# ipa-replica-manage disconnect vm-072.idm.lab.bos.redhat.com

[root@vm-072 ~]# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
Deleting a master is irreversible.
To reconnect to the remote master you will need to prepare a new replica file
and re-install.
Continue to delete? [no]: yes
Deleting replication agreements between vm-055.idm.lab.bos.redhat.com and vm-072.idm.lab.bos.redhat.com
ipa: INFO: Setting agreement cn=meTovm-072.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config schedule to 2358-2359 0 to force synch
ipa: INFO: Deleting schedule 2358-2359 0 from agreement cn=meTovm-072.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired successfully: Incremental update succeeded: start: 0: end: 0
Deleted replication agreement from 'vm-072.idm.lab.bos.redhat.com' to 'vm-055.idm.lab.bos.redhat.com'
Unable to remove replication agreement for vm-055.idm.lab.bos.redhat.com from vm-072.idm.lab.bos.redhat.com.
Background task created to clean replication data. This may take a while.
This may be safely interrupted with Ctrl+C
^CWait for task interrupted. It will continue to run in the background

Failed to cleanup vm-055.idm.lab.bos.redhat.com entries: Server is unwilling to perform: database is read-only arguments: dn=krbprincipalname=ldap/vm-055.idm.lab.bos.redhat....@idm.lab.bos.redhat.com,cn=services,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com

You may need to manually remove them from the tree
ipa: INFO: Unhandled LDAPError: {'info': 'database is read-only', 'desc': 'Server is unwilling to perform'}

Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: Server is unwilling to perform: database is read-only

You may need to manually remove them from the tree


--cleanup did not work for me either:
[root@vm-072 ~]# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force --cleanup
Cleaning a master is irreversible.
This should not normally be required, so use cautiously.
Continue to clean master? [no]: yes
unexpected error: Server is unwilling to perform: database is read-only arguments: dn=krbprincipalname=ldap/vm-055.idm.lab.bos.redhat....@idm.lab.bos.redhat.com,cn=services,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com

Martin


From a4f24aea067144ca8f5158579ce875fe06825a0f Mon Sep 17 00:00:00 2001
From: Rob Crittenden <rcrit...@redhat.com>
Date: Fri, 14 Sep 2012 15:03:12 -0400
Subject: [PATCH] When deleting a master, try to prevent orphaning other
 servers.

If you have a replication topology like A <-> B <-> C and you try
to delete server B that will leave A and C orphaned. It may also
prevent re-installation of a new master on B because the cn=masters
entry for it probably still exists on at least one of the other masters.

Check on each master that it connects to to ensure that it isn't the
last link, and fail if it is. If any of the masters are not up then
warn that this could be a bad thing but let the user continue if
they want.

Add a new option to the del command, --cleanup, which runs the
replica_cleanup() routine to completely clean up references to a master.

https://fedorahosted.org/freeipa/ticket/2797
---
 install/tools/ipa-replica-manage       | 85 +++++++++++++++++++++++++++++++++-
 install/tools/man/ipa-replica-manage.1 | 14 ++++++
 2 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/install/tools/ipa-replica-manage b/install/tools/ipa-replica-manage
index dcd44f3c7d21cbf025fcce4bbc609c58b5a6e8f4..897d117681d3e1559d5710366101b50540b705c8 100755
--- a/install/tools/ipa-replica-manage
+++ b/install/tools/ipa-replica-manage
@@ -72,6 +72,8 @@ def parse_options():
                       help="provide additional information")
     parser.add_option("-f", "--force", dest="force", action="store_true", default=False,
                       help="ignore some types of errors")
+    parser.add_option("-c", "--cleanup", dest="cleanup", action="store_true", default=False,
+                      help="DANGER: clean up references to a ghost master")
     parser.add_option("--binddn", dest="binddn", default=None, type="dn",
                       help="Bind DN to use with remote server")
     parser.add_option("--bindpw", dest="bindpw", default=None,
@@ -463,9 +465,53 @@ def list_clean_ruv(realm, host, dirman_passwd, verbose):
                 print str(dn)
                 print entry.getValue('nstasklog')
 
+def check_last_link(delrepl, realm, dirman_passwd, force):
+    """
+    We don't want to orphan a server when deleting another one. If you have
+    a topology that looks like this:
+
+             A     B
+             |     |
+             |     |
+             |     |
+             C---- D
+
+    If we try to delete host D it will orphan host B.
+
+    What we need to do is if the master being deleted has only a single
+    agreement, connect to that master and make sure it has agreements with
+    more than just this master.
+
+    @delrepl: a ReplicationManager object of the master being deleted
+
+    returns: hostname of orphaned server or None
+    """
+    replica_names = delrepl.find_ipa_replication_agreements()
+
+    orphaned = []
+    # Connect to each remote server and see what agreements it has
+    for replica in replica_names:
+        try:
+            repl = replication.ReplicationManager(realm, replica, dirman_passwd)
+        except ldap.SERVER_DOWN, e:
+            print "Unable to validate that '%s' will not be orphaned." % replica
+
+            if not force and not ipautil.user_input("Continue to delete?", False):
+                sys.exit("Aborted")
+            continue
+        names = repl.find_ipa_replication_agreements()
+        if len(names) == 1 and names[0] == delrepl.hostname:
+            orphaned.append(replica)
+
+    if len(orphaned):
+        return ', '.join(orphaned)
+    else:
+        return None
+
 def del_master(realm, hostname, options):
 
     force_del = False
+    delrepl = None
 
     # 1. Connect to the local server
     try:
@@ -478,7 +524,21 @@ def del_master(realm, hostname, options):
     # 2. Ensure we have an agreement with the master
     agreement = thisrepl.get_replication_agreement(hostname)
     if agreement is None:
-        sys.exit("'%s' has no replication agreement for '%s'" % (options.host, hostname))
+        if options.cleanup:
+            """
+            We have no agreement with the current master, so this is a
+            candidate for cleanup. This is VERY dangerous to do because it
+            removes that master from the list of masters. If the master
+            were to try to come back online it wouldn't work at all.
+            """
+            print "Cleaning a master is irreversible."
+            print "This should not normally be required, so use cautiously."
+            if not ipautil.user_input("Continue to clean master?", False):
+                sys.exit("Cleanup aborted")
+            thisrepl.replica_cleanup(hostname, realm, force=True)
+            sys.exit(0)
+        else:
+            sys.exit("'%s' has no replication agreement for '%s'" % (options.host, hostname))
 
     # 3. If an IPA agreement connect to the master to be removed.
     repltype = thisrepl.get_agreement_type(hostname)
@@ -516,6 +576,29 @@ def del_master(realm, hostname, options):
         if not ipautil.user_input("Continue to delete?", False):
             sys.exit("Deletion aborted")
 
+    # Check for orphans if the remote server is up.
+    if delrepl and not winsync:
+        masters_dn = DN(('cn', 'masters'), ('cn', 'ipa'), ('cn', 'etc'), ipautil.realm_to_suffix(realm))
+        try:
+            masters = delrepl.conn.getList(masters_dn, ldap.SCOPE_ONELEVEL)
+        except Exception, e:
+            masters = []
+            print "Failed to read masters data from '%s': %s" % (delrepl.hostname, convert_error(e))
+            print "Skipping calculation to determine if one or more masters would be orphaned."
+            if not options.force:
+                sys.exit(1)
+
+        # This only applies if we have more than 2 IPA servers, otherwise
+        # there is no chance of an orphan.
+        if len(masters) > 2:
+            orphaned_server = check_last_link(delrepl, realm, options.dirman_passwd, options.force)
+            if orphaned_server is not None:
+                print "Deleting this server will orphan '%s'. " % orphaned_server
+                print "You will need to reconfigure your replication topology to delete this server."
+                sys.exit(1)
+    else:
+        print "Skipping calculation to determine if one or more masters would be orphaned."
+
     # Save the RID value before we start deleting
     if repltype == replication.IPA_REPLICA:
         rid = get_rid_by_host(realm, options.host, hostname, options.dirman_passwd)
diff --git a/install/tools/man/ipa-replica-manage.1 b/install/tools/man/ipa-replica-manage.1
index 98d70c6fd09fc6267881a6bf64c30dbe8f0389e3..b750f8fc9cdfcfaa3668468db4682a018a29794d 100644
--- a/install/tools/man/ipa-replica-manage.1
+++ b/install/tools/man/ipa-replica-manage.1
@@ -65,6 +65,14 @@ Each IPA master server has a unique replication ID. This ID is used by 389\-ds\-
 When a master is removed, all other masters need to remove its replication ID from the list of masters. Normally this occurs automatically when a master is deleted with ipa\-replica\-manage. If one or more masters was down or unreachable when ipa\-replica\-manage was executed then this replica ID may still exist. The clean\-ruv command may be used to clean up an unused replication ID.
 .TP
 \fBNOTE\fR: clean\-ruv is \fBVERY DANGEROUS\fR. Execution against the wrong replication ID can result in inconsistent data on that master. The master should be re\-initialized from another if this happens.
+.TP
+The replication topology is examined when a master is deleted and will attempt to prevent a master from being orphaned. For example, if your topology is A <\-> B <\-> C and you attempt to delete master B it will fail because that would leave masters A and C orphaned.
+.TP
+The list of masters is stored in cn=masters,cn=ipa,cn=etc,dc=example,dc=com. This should be cleaned up automatically when a master is deleted. If it occurs that you have deleted the master and all the agreements but these entries still exist, then you will not be able to re\-install IPA on it; the installation will fail with:
+.TP
+An IPA master host cannot be deleted or disabled using standard commands (host\-del, for example).
+.TP
+An orphaned master may be cleaned up using the del directive with the \-\-cleanup option. This will remove the entries from cn=masters,cn=ipa,cn=etc that otherwise prevent host\-del from working, its dna profile, s4u2proxy configuration, service principals and remove it from the default DUA profile defaultServerList.
 .SH "OPTIONS"
 .TP
 \fB\-H\fR \fIHOST\fR, \fB\-\-host\fR=\fIHOST\fR
@@ -81,6 +89,9 @@ Provide additional information
 \fB\-f\fR, \fB\-\-force\fR
 Ignore some types of errors, don't prompt when deleting a master
 .TP
+\fB\-c\fR, \fB\-\-cleanup\fR
+When deleting a master with the \-\-force flag, remove leftover references to an already deleted master.
+.TP
 \fB\-\-binddn\fR=\fIADMIN_DN\fR
 Bind DN to use with remote server (default is cn=Directory Manager) \- Be careful to quote this value on the command line
 .TP
@@ -135,6 +146,9 @@ List the replication IDs in use:
  # ipa\-replica\-manage list\-ruv
  srv1.example.com:389: 7
  srv2.example.com:389: 4
+.TP
+Remove references to an orphaned and deleted master:
+ # ipa\-replica\-manage del \-\-force \-\-cleanup master.example.com
 .SH "WINSYNC"
 Creating a Windows AD Synchronization agreement is similar to creating an IPA replication agreement, there are just a couple of extra steps.
 
-- 
1.7.11.4

_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel
