On 09/17/2012 04:04 PM, Rob Crittenden wrote: > Martin Kosek wrote: >> On 09/14/2012 09:17 PM, Rob Crittenden wrote: >>> Martin Kosek wrote: >>>> On 09/06/2012 11:17 PM, Rob Crittenden wrote: >>>>> Martin Kosek wrote: >>>>>> On 09/06/2012 05:55 PM, Rob Crittenden wrote: >>>>>>> Rob Crittenden wrote: >>>>>>>> Rob Crittenden wrote: >>>>>>>>> Martin Kosek wrote: >>>>>>>>>> On 09/05/2012 08:06 PM, Rob Crittenden wrote: >>>>>>>>>>> Rob Crittenden wrote: >>>>>>>>>>>> Martin Kosek wrote: >>>>>>>>>>>>> On 07/05/2012 08:39 PM, Rob Crittenden wrote: >>>>>>>>>>>>>> Martin Kosek wrote: >>>>>>>>>>>>>>> On 07/03/2012 04:41 PM, Rob Crittenden wrote: >>>>>>>>>>>>>>>> Deleting a replica can leave a replication update vector (RUV) on the >>>>>>>>>>>>>>>> other servers. >>>>>>>>>>>>>>>> This can confuse things if the replica is re-added, and it also >>>>>>>>>>>>>>>> causes the >>>>>>>>>>>>>>>> server to calculate changes against a server that may no longer >>>>>>>>>>>>>>>> exist. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 389-ds-base provides a new task that propagates itself to >>>>>>>>>>>>>>>> all >>>>>>>>>>>>>>>> available >>>>>>>>>>>>>>>> replicas to clean this RUV data. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This patch will create this task at deletion time to hopefully >>>>>>>>>>>>>>>> clean things up. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It isn't perfect. If any replica is down or unavailable at the >>>>>>>>>>>>>>>> time >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> cleanruv task fires, and then comes back up, the old RUV data >>>>>>>>>>>>>>>> may be >>>>>>>>>>>>>>>> re-propagated around. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> To make things easier in this case I've added two new commands >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> ipa-replica-manage. The first lists the replication ids of all >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> servers we >>>>>>>>>>>>>>>> have a RUV for. Using this you can call clean_ruv with the >>>>>>>>>>>>>>>> replication id of a >>>>>>>>>>>>>>>> server that no longer exists to try the cleanallruv step again. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This is quite dangerous though. If you run cleanruv against a >>>>>>>>>>>>>>>> replica id that >>>>>>>>>>>>>>>> does exist, it can cause a loss of data. I believe I've put in >>>>>>>>>>>>>>>> enough scary >>>>>>>>>>>>>>>> warnings about this. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> rob >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Good work there, this should make cleaning RUVs much easier than >>>>>>>>>>>>>>> with the >>>>>>>>>>>>>>> previous version. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This is what I found during review: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1) The list_ruv and clean_ruv command help in the man page is quite lost. I >>>>>>>>>>>>>>> think >>>>>>>>>>>>>>> it would >>>>>>>>>>>>>>> help if we, for example, had all the info for commands indented. As it is, >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> user could >>>>>>>>>>>>>>> simply overlook the new commands in the man page. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2) I would rename the new commands to clean-ruv and list-ruv to make >>>>>>>>>>>>>>> them >>>>>>>>>>>>>>> consistent with the rest of the commands (re-initialize, >>>>>>>>>>>>>>> force-sync). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 3) It would be nice to be able to run the clean_ruv command in an >>>>>>>>>>>>>>> unattended way >>>>>>>>>>>>>>> (for better testing), i.e. respect the --force option as we already >>>>>>>>>>>>>>> do for >>>>>>>>>>>>>>> ipa-replica-manage del. This fix would aid test automation in >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> future.
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 4) (minor) The new question (and the del too) does not react too >>>>>>>>>>>>>>> well for >>>>>>>>>>>>>>> CTRL+D: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # ipa-replica-manage clean_ruv 3 --force >>>>>>>>>>>>>>> Clean the Replication Update Vector for >>>>>>>>>>>>>>> vm-055.idm.lab.bos.redhat.com:389 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cleaning the wrong replica ID will cause that server to no >>>>>>>>>>>>>>> longer replicate so it may miss updates while the process >>>>>>>>>>>>>>> is running. It would need to be re-initialized to maintain >>>>>>>>>>>>>>> consistency. Be very careful. >>>>>>>>>>>>>>> Continue to clean? [no]: unexpected error: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 5) Help for clean_ruv command without a required parameter is >>>>>>>>>>>>>>> quite >>>>>>>>>>>>>>> confusing >>>>>>>>>>>>>>> as it reports that command is wrong and not the parameter: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # ipa-replica-manage clean_ruv >>>>>>>>>>>>>>> Usage: ipa-replica-manage [options] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ipa-replica-manage: error: must provide a command [clean_ruv | >>>>>>>>>>>>>>> force-sync | >>>>>>>>>>>>>>> disconnect | connect | del | re-initialize | list | list_ruv] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It seems you just forgot to specify the error message in the >>>>>>>>>>>>>>> command >>>>>>>>>>>>>>> definition >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 6) When the remote replica is down, the clean_ruv command fails >>>>>>>>>>>>>>> with an >>>>>>>>>>>>>>> unexpected error: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [root@vm-086 ~]# ipa-replica-manage clean_ruv 5 >>>>>>>>>>>>>>> Clean the Replication Update Vector for >>>>>>>>>>>>>>> vm-055.idm.lab.bos.redhat.com:389 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cleaning the wrong replica ID will cause that server to no >>>>>>>>>>>>>>> longer replicate so it may miss updates while the process >>>>>>>>>>>>>>> is running. It would need to be re-initialized to maintain >>>>>>>>>>>>>>> consistency. Be very careful. >>>>>>>>>>>>>>> Continue to clean? [no]: y >>>>>>>>>>>>>>> unexpected error: {'desc': 'Operations error'} >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /var/log/dirsrv/slapd-IDM-LAB-BOS-REDHAT-COM/errors: >>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - >>>>>>>>>>>>>>> cleanAllRUV_task: failed >>>>>>>>>>>>>>> to connect to repl agreement connection >>>>>>>>>>>>>>> (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> tree,cn=config), error 105 >>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - >>>>>>>>>>>>>>> cleanAllRUV_task: replica >>>>>>>>>>>>>>> (cn=meTovm-055.idm.lab. >>>>>>>>>>>>>>> bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> tree, cn=config) has not been cleaned. You will need to rerun >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> CLEANALLRUV task on this replica. 
>>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - >>>>>>>>>>>>>>> cleanAllRUV_task: Task >>>>>>>>>>>>>>> failed (1) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In this case I think we should inform user that the command >>>>>>>>>>>>>>> failed, >>>>>>>>>>>>>>> possibly >>>>>>>>>>>>>>> because of disconnected replicas and that they could enable the >>>>>>>>>>>>>>> replicas and >>>>>>>>>>>>>>> try again. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 7) (minor) "pass" is now redundant in replication.py: >>>>>>>>>>>>>>> + except ldap.INSUFFICIENT_ACCESS: >>>>>>>>>>>>>>> + # We can't make the server we're removing read-only >>>>>>>>>>>>>>> but >>>>>>>>>>>>>>> + # this isn't a show-stopper >>>>>>>>>>>>>>> + root_logger.debug("No permission to switch replica >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> read-only, >>>>>>>>>>>>>>> continuing anyway") >>>>>>>>>>>>>>> + pass >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think this addresses everything. >>>>>>>>>>>>>> >>>>>>>>>>>>>> rob >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, almost there! I just found one more issue which needs to >>>>>>>>>>>>> be >>>>>>>>>>>>> fixed >>>>>>>>>>>>> before we push: >>>>>>>>>>>>> >>>>>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force >>>>>>>>>>>>> Directory Manager password: >>>>>>>>>>>>> >>>>>>>>>>>>> Unable to connect to replica vm-055.idm.lab.bos.redhat.com, >>>>>>>>>>>>> forcing >>>>>>>>>>>>> removal >>>>>>>>>>>>> Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc': >>>>>>>>>>>>> "Can't >>>>>>>>>>>>> contact LDAP server"} >>>>>>>>>>>>> Forcing removal on 'vm-086.idm.lab.bos.redhat.com' >>>>>>>>>>>>> >>>>>>>>>>>>> There were issues removing a connection: %d format: a number is >>>>>>>>>>>>> required, not str >>>>>>>>>>>>> >>>>>>>>>>>>> Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc': >>>>>>>>>>>>> "Can't >>>>>>>>>>>>> contact LDAP server"} >>>>>>>>>>>>> >>>>>>>>>>>>> This is a traceback I retrieved: >>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>> File "/sbin/ipa-replica-manage", line 425, in del_master >>>>>>>>>>>>> del_link(realm, r, hostname, options.dirman_passwd, >>>>>>>>>>>>> force=True) >>>>>>>>>>>>> File "/sbin/ipa-replica-manage", line 271, in del_link >>>>>>>>>>>>> repl1.cleanallruv(replica_id) >>>>>>>>>>>>> File >>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", >>>>>>>>>>>>> line 1094, in cleanallruv >>>>>>>>>>>>> root_logger.debug("Creating CLEANALLRUV task for replica >>>>>>>>>>>>> id >>>>>>>>>>>>> %d" % >>>>>>>>>>>>> replicaId) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> The problem here is that you don't convert replica_id to int in >>>>>>>>>>>>> this >>>>>>>>>>>>> part: >>>>>>>>>>>>> + replica_id = None >>>>>>>>>>>>> + if repl2: >>>>>>>>>>>>> + replica_id = repl2._get_replica_id(repl2.conn, None) >>>>>>>>>>>>> + else: >>>>>>>>>>>>> + servers = get_ruv(realm, replica1, dirman_passwd) >>>>>>>>>>>>> + for (netloc, rid) in servers: >>>>>>>>>>>>> + if netloc.startswith(replica2): >>>>>>>>>>>>> + replica_id = rid >>>>>>>>>>>>> + break >>>>>>>>>>>>> >>>>>>>>>>>>> Martin >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Updated patch using new mechanism in 389-ds-base. This should more >>>>>>>>>>>> thoroughly clean out RUV data when a replica is being deleted, and >>>>>>>>>>>> provide for a way to delete RUV data afterwards too if necessary. 
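For context on the mechanism mentioned above: as I understand the 389-ds-base task interface, CLEANALLRUV is kicked off by adding a task entry under cn=cleanallruv,cn=tasks,cn=config on one master, and the directory server then propagates the cleanup to the other replicas on its own. A rough python-ldap sketch of that idea follows; it is not the code from the patch, the helper name start_cleanallruv is made up, and the replica-base-dn/replica-id attribute names reflect my reading of the task interface. It also shows the string-to-int conversion of the replica id that the "%d format" traceback above comes down to.

    import ldap
    import ldap.modlist

    def start_cleanallruv(conn, suffix, rid):
        # conn: an already-bound python-ldap connection to one master
        # rid: the replica id as read from the RUV; it typically arrives as a
        # string, so convert it before any "%d" formatting or LDAP use
        rid = int(rid)
        dn = "cn=clean %d,cn=cleanallruv,cn=tasks,cn=config" % rid
        attrs = {
            'objectclass': ['top', 'extensibleObject'],
            'cn': ['clean %d' % rid],
            'replica-base-dn': [suffix],
            'replica-id': [str(rid)],
        }
        # The server picks this entry up as a task and pushes the RUV cleanup
        # to all reachable replicas itself.
        conn.add_s(dn, ldap.modlist.addModlist(attrs))
        return dn

Checking on or aborting the cleanup is then a matter of looking at that task entry again, which is roughly what the list-clean-ruv and abort-clean-ruv commands discussed later expose.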
>>>>>>>>>>>> >>>>>>>>>>>> rob >>>>>>>>>>> >>>>>>>>>>> Rebased patch >>>>>>>>>>> >>>>>>>>>>> rob >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 0) As I wrote in a review for your patch 1041, changelog entry >>>>>>>>>> slipped >>>>>>>>>> elsewhere. >>>>>>>>>> >>>>>>>>>> 1) The following KeyboardInterrupt except class looks suspicious. I >>>>>>>>>> know why >>>>>>>>>> you have it there, but since it is generally a bad thing to do, some >>>>>>>>>> comment >>>>>>>>>> why it is needed would be useful. >>>>>>>>>> >>>>>>>>>> @@ -256,6 +263,17 @@ def del_link(realm, replica1, replica2, >>>>>>>>>> dirman_passwd, >>>>>>>>>> force=False): >>>>>>>>>> repl1.delete_agreement(replica2) >>>>>>>>>> repl1.delete_referral(replica2) >>>>>>>>>> >>>>>>>>>> + if type1 == replication.IPA_REPLICA: >>>>>>>>>> + if repl2: >>>>>>>>>> + ruv = repl2._get_replica_id(repl2.conn, None) >>>>>>>>>> + else: >>>>>>>>>> + ruv = get_ruv_by_host(realm, replica1, replica2, >>>>>>>>>> dirman_passwd) >>>>>>>>>> + >>>>>>>>>> + try: >>>>>>>>>> + repl1.cleanallruv(ruv) >>>>>>>>>> + except KeyboardInterrupt: >>>>>>>>>> + pass >>>>>>>>>> + >>>>>>>>>> >>>>>>>>>> Maybe you just wanted to do some cleanup and then "raise" again? >>>>>>>>> >>>>>>>>> No, it is there because it is safe to break out of it. The task will >>>>>>>>> continue to run. I added some verbiage. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2) This is related to 1), but when some replica is down, >>>>>>>>>> "ipa-replica-manage >>>>>>>>>> del" may wait indefinitely when some remote replica is down, right? >>>>>>>>>> >>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com >>>>>>>>>> Deleting a master is irreversible. >>>>>>>>>> To reconnect to the remote master you will need to prepare a new >>>>>>>>>> replica file >>>>>>>>>> and re-install. >>>>>>>>>> Continue to delete? [no]: y >>>>>>>>>> ipa: INFO: Setting agreement >>>>>>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> tree,cn=config schedule to 2358-2359 0 to force synch >>>>>>>>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement >>>>>>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> tree,cn=config >>>>>>>>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica >>>>>>>>>> acquired >>>>>>>>>> successfully: Incremental update succeeded: start: 0: end: 0 >>>>>>>>>> Background task created to clean replication data >>>>>>>>>> >>>>>>>>>> ... after about a minute I hit CTRL+C >>>>>>>>>> >>>>>>>>>> ^CDeleted replication agreement from 'vm-086.idm.lab.bos.redhat.com' >>>>>>>>>> to >>>>>>>>>> 'vm-055.idm.lab.bos.redhat.com' >>>>>>>>>> Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: NS >>>>>>>>>> record >>>>>>>>>> does not >>>>>>>>>> contain 'vm-055.idm.lab.bos.redhat.com.' >>>>>>>>>> You may need to manually remove them from the tree >>>>>>>>>> >>>>>>>>>> I think it would be better to inform user that some remote replica is >>>>>>>>>> down or >>>>>>>>>> at least that we are waiting for the task to complete. Something like >>>>>>>>>> that: >>>>>>>>>> >>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com >>>>>>>>>> ... 
>>>>>>>>>> Background task created to clean replication data >>>>>>>>>> Replication data cleanup may take a very long time if some replica is >>>>>>>>>> unreachable >>>>>>>>>> Hit CTRL+C to interrupt the wait >>>>>>>>>> ^C Clean up wait interrupted >>>>>>>>>> .... >>>>>>>>>> [continue with del] >>>>>>>>> >>>>>>>>> Yup, did this in #1. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> 3) (minor) When there is a cleanruv task running and you run >>>>>>>>>> "ipa-replica-manage del", there is an unexpected error message caused by a >>>>>>>>>> duplicate >>>>>>>>>> task object in LDAP: >>>>>>>>>> >>>>>>>>>> # ipa-replica-manage del vm-072.idm.lab.bos.redhat.com --force >>>>>>>>>> Unable to connect to replica vm-072.idm.lab.bos.redhat.com, forcing >>>>>>>>>> removal >>>>>>>>>> FAIL >>>>>>>>>> Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc': >>>>>>>>>> "Can't >>>>>>>>>> contact LDAP server"} >>>>>>>>>> Forcing removal on 'vm-086.idm.lab.bos.redhat.com' >>>>>>>>>> >>>>>>>>>> There were issues removing a connection: This entry already exists >>>>>>>>>> <<<<<<<<< >>>>>>>>>> >>>>>>>>>> Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc': >>>>>>>>>> "Can't >>>>>>>>>> contact LDAP server"} >>>>>>>>>> Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS >>>>>>>>>> record >>>>>>>>>> does not >>>>>>>>>> contain 'vm-072.idm.lab.bos.redhat.com.' >>>>>>>>>> You may need to manually remove them from the tree >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I think it should be enough to just catch the "entry already exists" error >>>>>>>>>> in the >>>>>>>>>> cleanallruv function, and in such a case print a relevant error message and >>>>>>>>>> bail out. >>>>>>>>>> Thus, self.conn.checkTask(dn, dowait=True) would not be called either. >>>>>>>>> >>>>>>>>> Good catch, fixed. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 4) (minor): In the make_readonly function, there is a redundant "pass" >>>>>>>>>> statement: >>>>>>>>>> >>>>>>>>>> + def make_readonly(self): >>>>>>>>>> + """ >>>>>>>>>> + Make the current replication agreement read-only. >>>>>>>>>> + """ >>>>>>>>>> + dn = DN(('cn', 'userRoot'), ('cn', 'ldbm database'), >>>>>>>>>> + ('cn', 'plugins'), ('cn', 'config')) >>>>>>>>>> + >>>>>>>>>> + mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'on')] >>>>>>>>>> + try: >>>>>>>>>> + self.conn.modify_s(dn, mod) >>>>>>>>>> + except ldap.INSUFFICIENT_ACCESS: >>>>>>>>>> + # We can't make the server we're removing read-only but >>>>>>>>>> + # this isn't a show-stopper >>>>>>>>>> + root_logger.debug("No permission to switch replica to >>>>>>>>>> read-only, >>>>>>>>>> continuing anyway") >>>>>>>>>> + pass <<<<<<<<<<<<<<< >>>>>>>>> >>>>>>>>> Yeah, this is one of my common mistakes. I put in a pass initially, >>>>>>>>> then >>>>>>>>> add logging in front of it and forget to delete the pass. It's gone >>>>>>>>> now. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 5) In clean_ruv, I think allowing a --force option to bypass the >>>>>>>>>> user_input >>>>>>>>>> would be helpful (at least for test automation): >>>>>>>>>> >>>>>>>>>> + if not ipautil.user_input("Continue to clean?", False): >>>>>>>>>> + sys.exit("Aborted") >>>>>>>>> >>>>>>>>> Yup, added. >>>>>>>>> >>>>>>>>> rob >>>>>>>> >>>>>>>> Slightly revised patch. I still had a window open with one unsaved >>>>>>>> change. >>>>>>>> >>>>>>>> rob >>>>>>>> >>>>>>> >>>>>>> Apparently there were two unsaved changes, one of which was lost. This >>>>>>> adds in >>>>>>> the 'entry already exists' fix.
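On the 'entry already exists' fix mentioned above: the idea is simply to tolerate a CLEANALLRUV task that is already running for the same replica id, report it and skip the checkTask() wait. A minimal sketch of that shape (illustrative names, not the exact patch code; as the follow-up below shows, the exception type matters here because ipaldap re-raises the LDAP 'already exists' error as ipalib's DuplicateEntry):

    from ipalib import errors

    def add_clean_task(conn, task_entry):
        # conn is an ipaldap connection and task_entry a prepared CLEANALLRUV
        # task entry (illustrative names). Returns True when a new task was
        # created and is worth waiting on with checkTask(dn, dowait=True).
        try:
            conn.addEntry(task_entry)
        except errors.DuplicateEntry:
            # A clean task for this replica id already exists; report it and
            # bail out instead of treating it as an unexpected error.
            print "CLEANALLRUV task already running, not creating another one"
            return False
        return True

The --force bypass from point 5 is then just a guard in front of the prompt, i.e. only calling ipautil.user_input() when --force was not given.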
>>>>>>> >>>>>>> rob >>>>>>> >>>>>> >>>>>> Just one last thing (otherwise the patch is OK) - I don't think this is >>>>>> what we >>>>>> want :-) >>>>>> >>>>>> # ipa-replica-manage clean-ruv 8 >>>>>> Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389 >>>>>> >>>>>> Cleaning the wrong replica ID will cause that server to no >>>>>> longer replicate so it may miss updates while the process >>>>>> is running. It would need to be re-initialized to maintain >>>>>> consistency. Be very careful. >>>>>> Continue to clean? [no]: y <<<<<< >>>>>> Aborted >>>>>> >>>>>> >>>>>> Nor this exception (you are checking for the wrong exception): >>>>>> >>>>>> # ipa-replica-manage clean-ruv 8 >>>>>> Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389 >>>>>> >>>>>> Cleaning the wrong replica ID will cause that server to no >>>>>> longer replicate so it may miss updates while the process >>>>>> is running. It would need to be re-initialized to maintain >>>>>> consistency. Be very careful. >>>>>> Continue to clean? [no]: >>>>>> unexpected error: This entry already exists >>>>>> >>>>>> This is the exception: >>>>>> >>>>>> Traceback (most recent call last): >>>>>> File "/sbin/ipa-replica-manage", line 651, in <module> >>>>>> main() >>>>>> File "/sbin/ipa-replica-manage", line 648, in main >>>>>> clean_ruv(realm, args[1], options) >>>>>> File "/sbin/ipa-replica-manage", line 373, in clean_ruv >>>>>> thisrepl.cleanallruv(ruv) >>>>>> File >>>>>> "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", >>>>>> line 1136, in cleanallruv >>>>>> self.conn.addEntry(e) >>>>>> File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", line >>>>>> 503, in >>>>>> addEntry >>>>>> self.__handle_errors(e, arg_desc=arg_desc) >>>>>> File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", line >>>>>> 321, in >>>>>> __handle_errors >>>>>> raise errors.DuplicateEntry() >>>>>> ipalib.errors.DuplicateEntry: This entry already exists >>>>>> >>>>>> Martin >>>>>> >>>>> >>>>> Fixed that and a couple of other problems. When doing a disconnect we >>>>> should >>>>> not also call clean-ruv. >>>> >>>> Ah, good self-catch. >>>> >>>>> >>>>> I also got tired of seeing crappy error messages so I added a little >>>>> convert >>>>> utility. >>>>> >>>>> rob >>>> >>>> 1) There is CLEANALLRUV stuff included in 1050-3 and not here. There are >>>> also >>>> some findings for this new code. >>>> >>>> >>>> 2) We may want to bump Requires to a higher version of 389-ds-base >>>> (389-ds-base-1.2.11.14-1) - it contains a fix for the CLEANALLRUV+winsync bug I >>>> found earlier. >>>> >>>> >>>> 3) I just discovered another suspicious behavior. When we are deleting a >>>> master >>>> that also has links to other master(s), we delete those too. But we also >>>> automatically run CLEANALLRUV in these cases, so we may end up with multiple >>>> tasks being started on different masters - this does not look right. >>>> >>>> I think we may rather want to first delete all the links and then run the >>>> CLEANALLRUV task, just once. This is what I get with the current code: >>>> >>>> # ipa-replica-manage del vm-072.idm.lab.bos.redhat.com >>>> Directory Manager password: >>>> >>>> Deleting a master is irreversible. >>>> To reconnect to the remote master you will need to prepare a new replica >>>> file >>>> and re-install. >>>> Continue to delete?
[no]: yes >>>> ipa: INFO: Setting agreement >>>> cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping >>>> >>>> >>>> tree,cn=config schedule to 2358-2359 0 to force synch >>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement >>>> cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping >>>> >>>> >>>> tree,cn=config >>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica >>>> acquired >>>> successfully: Incremental update succeeded: start: 0: end: 0 >>>> Background task created to clean replication data. This may take a while. >>>> This may be safely interrupted with Ctrl+C >>>> >>>> ^CWait for task interrupted. It will continue to run in the background >>>> >>>> Deleted replication agreement from 'vm-055.idm.lab.bos.redhat.com' to >>>> 'vm-072.idm.lab.bos.redhat.com' >>>> ipa: INFO: Setting agreement >>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping >>>> >>>> >>>> tree,cn=config schedule to 2358-2359 0 to force synch >>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement >>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping >>>> >>>> >>>> tree,cn=config >>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica >>>> acquired >>>> successfully: Incremental update succeeded: start: 0: end: 0 >>>> Background task created to clean replication data. This may take a while. >>>> This may be safely interrupted with Ctrl+C >>>> >>>> ^CWait for task interrupted. It will continue to run in the background >>>> >>>> Deleted replication agreement from 'vm-086.idm.lab.bos.redhat.com' to >>>> 'vm-072.idm.lab.bos.redhat.com' >>>> Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS record does >>>> not >>>> contain 'vm-072.idm.lab.bos.redhat.com.' >>>> You may need to manually remove them from the tree >>>> >>>> Martin >>>> >>> >>> All issues addressed and I pulled in abort-clean-ruv from 1050. I added a >>> list-clean-ruv command as well. >>> >>> rob >> >> 1) Patch 1031-9 needs to get squashed with 1031-8 >> >> >> 2) Patch needs a rebase (conflict in freeipa.spec.in) >> >> >> 3) New list-clean-ruv man entry is not right: >> >> list-clean-ruv [REPLICATION_ID] >> - List all running CLEANALLRUV and abort CLEANALLRUV tasks. >> >> REPLICATION_ID is not its argument. > > Fixed 1-3. > >> Btw. new list-clean-ruv command proved very useful for me. >> >> 4) I just found out we need to do a better job with make_readonly() command. >> I >> get into trouble when disconnecting one link to a remote replica as it was >> marked readonly and then I was then unable to manage the disconnected replica >> properly (vm-072 is the replica made readonly): > > Ok, I reset read-only after we delete the agreements. That fixed things up for > me. I disconnected a replica and was able to modify entries on that replica > afterwards. > > This affected the --cleanup command too, it would otherwise have succeeded I > think. > > I tested with an A - B - C - A agreement loop. I disconnected A and C and > confirmed I could still update entries on C. Then I deleted C, then B, and > made > sure output looked right, I could still manage entries, etc. > > rob > >> >> [root@vm-055 ~]# ipa-replica-manage disconnect vm-072.idm.lab.bos.redhat.com >> >> [root@vm-072 ~]# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com >> Deleting a master is irreversible. 
>> To reconnect to the remote master you will need to prepare a new replica file >> and re-install. >> Continue to delete? [no]: yes >> Deleting replication agreements between vm-055.idm.lab.bos.redhat.com and >> vm-072.idm.lab.bos.redhat.com >> ipa: INFO: Setting agreement >> cn=meTovm-072.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping >> >> tree,cn=config schedule to 2358-2359 0 to force synch >> ipa: INFO: Deleting schedule 2358-2359 0 from agreement >> cn=meTovm-072.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping >> >> tree,cn=config >> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired >> successfully: Incremental update succeeded: start: 0: end: 0 >> Deleted replication agreement from 'vm-072.idm.lab.bos.redhat.com' to >> 'vm-055.idm.lab.bos.redhat.com' >> Unable to remove replication agreement for vm-055.idm.lab.bos.redhat.com from >> vm-072.idm.lab.bos.redhat.com. >> Background task created to clean replication data. This may take a while. >> This may be safely interrupted with Ctrl+C >> ^CWait for task interrupted. It will continue to run in the background >> >> Failed to cleanup vm-055.idm.lab.bos.redhat.com entries: Server is unwilling >> to >> perform: database is read-only arguments: >> dn=krbprincipalname=ldap/vm-055.idm.lab.bos.redhat....@idm.lab.bos.redhat.com,cn=services,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com >> >> >> You may need to manually remove them from the tree >> ipa: INFO: Unhandled LDAPError: {'info': 'database is read-only', 'desc': >> 'Server is unwilling to perform'} >> >> Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: Server is >> unwilling to perform: database is read-only >> >> You may need to manually remove them from the tree >> >> >> --cleanup did not work for me as well: >> [root@vm-072 ~]# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force >> --cleanup >> Cleaning a master is irreversible. >> This should not normally be require, so use cautiously. >> Continue to clean master? [no]: yes >> unexpected error: Server is unwilling to perform: database is read-only >> arguments: >> dn=krbprincipalname=ldap/vm-055.idm.lab.bos.redhat....@idm.lab.bos.redhat.com,cn=services,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com >> >> >> Martin >> >
I think you sent a wrong patch... Martin
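For reference, the read-only switch involved in issue 4 is the nsslapd-readonly flag on the userRoot backend, as the quoted make_readonly() code shows, and the reset Rob describes ("I reset read-only after we delete the agreements") is its mirror image. Something along these lines, assuming a plain python-ldap connection and the DN helper used in the quoted code (import path assumed); the function name is made up:

    import ldap
    from ipapython.dn import DN

    def make_writable(conn):
        # Mirror of make_readonly(): flip the userRoot backend back to
        # read-write so the disconnected replica can be managed again.
        dn = DN(('cn', 'userRoot'), ('cn', 'ldbm database'),
                ('cn', 'plugins'), ('cn', 'config'))
        mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'off')]
        try:
            conn.modify_s(str(dn), mod)
        except ldap.INSUFFICIENT_ACCESS:
            # Same reasoning as make_readonly(): lacking permission to flip
            # the flag is not a show-stopper.
            pass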