On 09/17/2012 04:15 PM, Rob Crittenden wrote: > Martin Kosek wrote: >> On 09/17/2012 04:04 PM, Rob Crittenden wrote: >>> Martin Kosek wrote: >>>> On 09/14/2012 09:17 PM, Rob Crittenden wrote: >>>>> Martin Kosek wrote: >>>>>> On 09/06/2012 11:17 PM, Rob Crittenden wrote: >>>>>>> Martin Kosek wrote: >>>>>>>> On 09/06/2012 05:55 PM, Rob Crittenden wrote: >>>>>>>>> Rob Crittenden wrote: >>>>>>>>>> Rob Crittenden wrote: >>>>>>>>>>> Martin Kosek wrote: >>>>>>>>>>>> On 09/05/2012 08:06 PM, Rob Crittenden wrote: >>>>>>>>>>>>> Rob Crittenden wrote: >>>>>>>>>>>>>> Martin Kosek wrote: >>>>>>>>>>>>>>> On 07/05/2012 08:39 PM, Rob Crittenden wrote: >>>>>>>>>>>>>>>> Martin Kosek wrote: >>>>>>>>>>>>>>>>> On 07/03/2012 04:41 PM, Rob Crittenden wrote: >>>>>>>>>>>>>>>>>> Deleting a replica can leave a replication vector (RUV) on >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> other servers. >>>>>>>>>>>>>>>>>> This can confuse things if the replica is re-added, and it >>>>>>>>>>>>>>>>>> also >>>>>>>>>>>>>>>>>> causes the >>>>>>>>>>>>>>>>>> server to calculate changes against a server that may no >>>>>>>>>>>>>>>>>> longer >>>>>>>>>>>>>>>>>> exist. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 389-ds-base provides a new task that self-propogates itself >>>>>>>>>>>>>>>>>> to all >>>>>>>>>>>>>>>>>> available >>>>>>>>>>>>>>>>>> replicas to clean this RUV data. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> This patch will create this task at deletion time to >>>>>>>>>>>>>>>>>> hopefully >>>>>>>>>>>>>>>>>> clean things up. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> It isn't perfect. If any replica is down or unavailable at >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> time >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> cleanruv task fires, and then comes back up, the old RUV data >>>>>>>>>>>>>>>>>> may be >>>>>>>>>>>>>>>>>> re-propogated around. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> To make things easier in this case I've added two new >>>>>>>>>>>>>>>>>> commands to >>>>>>>>>>>>>>>>>> ipa-replica-manage. The first lists the replication ids of >>>>>>>>>>>>>>>>>> all the >>>>>>>>>>>>>>>>>> servers we >>>>>>>>>>>>>>>>>> have a RUV for. Using this you can call clean_ruv with the >>>>>>>>>>>>>>>>>> replication id of a >>>>>>>>>>>>>>>>>> server that no longer exists to try the cleanallruv step >>>>>>>>>>>>>>>>>> again. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> This is quite dangerous though. If you run cleanruv against a >>>>>>>>>>>>>>>>>> replica id that >>>>>>>>>>>>>>>>>> does exist it can cause a loss of data. I believe I've put in >>>>>>>>>>>>>>>>>> enough scary >>>>>>>>>>>>>>>>>> warnings about this. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> rob >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Good work there, this should make cleaning RUVs much easier >>>>>>>>>>>>>>>>> than >>>>>>>>>>>>>>>>> with the >>>>>>>>>>>>>>>>> previous version. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> This is what I found during review: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1) list_ruv and clean_ruv command help in man is quite lost. I >>>>>>>>>>>>>>>>> think >>>>>>>>>>>>>>>>> it would >>>>>>>>>>>>>>>>> help if we for example have all info for commands indented. >>>>>>>>>>>>>>>>> This >>>>>>>>>>>>>>>>> way >>>>>>>>>>>>>>>>> user could >>>>>>>>>>>>>>>>> simply over-look the new commands in the man page. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2) I would rename new commands to clean-ruv and list-ruv to >>>>>>>>>>>>>>>>> make >>>>>>>>>>>>>>>>> them >>>>>>>>>>>>>>>>> consistent with the rest of the commands (re-initialize, >>>>>>>>>>>>>>>>> force-sync). 
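(For context on the mechanism under review: the CLEANALLRUV operation described above is driven by a task entry added to the directory server's cn=tasks tree, which 389-ds-base then propagates to every reachable replica. Below is a rough sketch of how such a task entry can be created with python-ldap; the server address, bind credentials, suffix and replica id are placeholders, and the attribute names are the ones documented for the 389-ds-base cleanallruv task, so they should be checked against the deployed version rather than taken as the patch's actual code.)

    import ldap
    import ldap.modlist

    # Placeholder connection details; in the patch this happens over an
    # existing IPA LDAP connection bound as Directory Manager.
    conn = ldap.initialize('ldap://replica.example.com:389')
    conn.simple_bind_s('cn=Directory Manager', 'password')

    replica_id = 5                    # RID of the replica that no longer exists
    suffix = 'dc=example,dc=com'      # replicated suffix

    # The task self-propagates: adding this one entry asks every reachable
    # replica to remove RUV data for the given replica id.
    dn = 'cn=clean %d,cn=cleanallruv,cn=tasks,cn=config' % replica_id
    entry = {
        'objectclass': ['top', 'extensibleObject'],
        'cn': ['clean %d' % replica_id],
        'replica-base-dn': [suffix],
        'replica-id': [str(replica_id)],
    }
    conn.add_s(dn, ldap.modlist.addModlist(entry))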
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 3) It would be nice to be able to run clean_ruv command in an >>>>>>>>>>>>>>>>> unattended way >>>>>>>>>>>>>>>>> (for better testing), i.e. respect --force option as we >>>>>>>>>>>>>>>>> already >>>>>>>>>>>>>>>>> do for >>>>>>>>>>>>>>>>> ipa-replica-manage del. This fix would aid test automation in >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> future. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 4) (minor) The new question (and the del too) does not react >>>>>>>>>>>>>>>>> too >>>>>>>>>>>>>>>>> well for >>>>>>>>>>>>>>>>> CTRL+D: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> # ipa-replica-manage clean_ruv 3 --force >>>>>>>>>>>>>>>>> Clean the Replication Update Vector for >>>>>>>>>>>>>>>>> vm-055.idm.lab.bos.redhat.com:389 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Cleaning the wrong replica ID will cause that server to no >>>>>>>>>>>>>>>>> longer replicate so it may miss updates while the process >>>>>>>>>>>>>>>>> is running. It would need to be re-initialized to maintain >>>>>>>>>>>>>>>>> consistency. Be very careful. >>>>>>>>>>>>>>>>> Continue to clean? [no]: unexpected error: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 5) Help for clean_ruv command without a required parameter is >>>>>>>>>>>>>>>>> quite >>>>>>>>>>>>>>>>> confusing >>>>>>>>>>>>>>>>> as it reports that command is wrong and not the parameter: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> # ipa-replica-manage clean_ruv >>>>>>>>>>>>>>>>> Usage: ipa-replica-manage [options] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ipa-replica-manage: error: must provide a command [clean_ruv | >>>>>>>>>>>>>>>>> force-sync | >>>>>>>>>>>>>>>>> disconnect | connect | del | re-initialize | list | list_ruv] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It seems you just forgot to specify the error message in the >>>>>>>>>>>>>>>>> command >>>>>>>>>>>>>>>>> definition >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 6) When the remote replica is down, the clean_ruv command >>>>>>>>>>>>>>>>> fails >>>>>>>>>>>>>>>>> with an >>>>>>>>>>>>>>>>> unexpected error: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [root@vm-086 ~]# ipa-replica-manage clean_ruv 5 >>>>>>>>>>>>>>>>> Clean the Replication Update Vector for >>>>>>>>>>>>>>>>> vm-055.idm.lab.bos.redhat.com:389 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Cleaning the wrong replica ID will cause that server to no >>>>>>>>>>>>>>>>> longer replicate so it may miss updates while the process >>>>>>>>>>>>>>>>> is running. It would need to be re-initialized to maintain >>>>>>>>>>>>>>>>> consistency. Be very careful. >>>>>>>>>>>>>>>>> Continue to clean? [no]: y >>>>>>>>>>>>>>>>> unexpected error: {'desc': 'Operations error'} >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> /var/log/dirsrv/slapd-IDM-LAB-BOS-REDHAT-COM/errors: >>>>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - >>>>>>>>>>>>>>>>> cleanAllRUV_task: failed >>>>>>>>>>>>>>>>> to connect to repl agreement connection >>>>>>>>>>>>>>>>> (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> tree,cn=config), error 105 >>>>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - >>>>>>>>>>>>>>>>> cleanAllRUV_task: replica >>>>>>>>>>>>>>>>> (cn=meTovm-055.idm.lab. 
>>>>>>>>>>>>>>>>> bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> tree, cn=config) has not been cleaned. You will need to >>>>>>>>>>>>>>>>> rerun >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> CLEANALLRUV task on this replica. >>>>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - >>>>>>>>>>>>>>>>> cleanAllRUV_task: Task >>>>>>>>>>>>>>>>> failed (1) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> In this case I think we should inform user that the command >>>>>>>>>>>>>>>>> failed, >>>>>>>>>>>>>>>>> possibly >>>>>>>>>>>>>>>>> because of disconnected replicas and that they could enable >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> replicas and >>>>>>>>>>>>>>>>> try again. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 7) (minor) "pass" is now redundant in replication.py: >>>>>>>>>>>>>>>>> + except ldap.INSUFFICIENT_ACCESS: >>>>>>>>>>>>>>>>> + # We can't make the server we're removing >>>>>>>>>>>>>>>>> read-only >>>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>>> + # this isn't a show-stopper >>>>>>>>>>>>>>>>> + root_logger.debug("No permission to switch >>>>>>>>>>>>>>>>> replica to >>>>>>>>>>>>>>>>> read-only, >>>>>>>>>>>>>>>>> continuing anyway") >>>>>>>>>>>>>>>>> + pass >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think this addresses everything. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> rob >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, almost there! I just found one more issue which needs >>>>>>>>>>>>>>> to be >>>>>>>>>>>>>>> fixed >>>>>>>>>>>>>>> before we push: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force >>>>>>>>>>>>>>> Directory Manager password: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Unable to connect to replica vm-055.idm.lab.bos.redhat.com, >>>>>>>>>>>>>>> forcing >>>>>>>>>>>>>>> removal >>>>>>>>>>>>>>> Failed to get data from 'vm-055.idm.lab.bos.redhat.com': >>>>>>>>>>>>>>> {'desc': >>>>>>>>>>>>>>> "Can't >>>>>>>>>>>>>>> contact LDAP server"} >>>>>>>>>>>>>>> Forcing removal on 'vm-086.idm.lab.bos.redhat.com' >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> There were issues removing a connection: %d format: a number is >>>>>>>>>>>>>>> required, not str >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Failed to get data from 'vm-055.idm.lab.bos.redhat.com': >>>>>>>>>>>>>>> {'desc': >>>>>>>>>>>>>>> "Can't >>>>>>>>>>>>>>> contact LDAP server"} >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This is a traceback I retrieved: >>>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>>> File "/sbin/ipa-replica-manage", line 425, in del_master >>>>>>>>>>>>>>> del_link(realm, r, hostname, options.dirman_passwd, >>>>>>>>>>>>>>> force=True) >>>>>>>>>>>>>>> File "/sbin/ipa-replica-manage", line 271, in del_link >>>>>>>>>>>>>>> repl1.cleanallruv(replica_id) >>>>>>>>>>>>>>> File >>>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> line 1094, in cleanallruv >>>>>>>>>>>>>>> root_logger.debug("Creating CLEANALLRUV task for >>>>>>>>>>>>>>> replica id >>>>>>>>>>>>>>> %d" % >>>>>>>>>>>>>>> replicaId) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The problem here is that you don't convert replica_id to int in >>>>>>>>>>>>>>> this >>>>>>>>>>>>>>> part: >>>>>>>>>>>>>>> + replica_id = None >>>>>>>>>>>>>>> + if repl2: >>>>>>>>>>>>>>> + replica_id = repl2._get_replica_id(repl2.conn, None) >>>>>>>>>>>>>>> + else: 
>>>>>>>>>>>>>>> + servers = get_ruv(realm, replica1, dirman_passwd) >>>>>>>>>>>>>>> + for (netloc, rid) in servers: >>>>>>>>>>>>>>> + if netloc.startswith(replica2): >>>>>>>>>>>>>>> + replica_id = rid >>>>>>>>>>>>>>> + break >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Martin >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Updated patch using new mechanism in 389-ds-base. This should >>>>>>>>>>>>>> more >>>>>>>>>>>>>> thoroughly clean out RUV data when a replica is being deleted, >>>>>>>>>>>>>> and >>>>>>>>>>>>>> provide for a way to delete RUV data afterwards too if necessary. >>>>>>>>>>>>>> >>>>>>>>>>>>>> rob >>>>>>>>>>>>> >>>>>>>>>>>>> Rebased patch >>>>>>>>>>>>> >>>>>>>>>>>>> rob >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 0) As I wrote in a review for your patch 1041, changelog entry >>>>>>>>>>>> slipped >>>>>>>>>>>> elsewhere. >>>>>>>>>>>> >>>>>>>>>>>> 1) The following KeyboardInterrupt except class looks suspicious. I >>>>>>>>>>>> know why >>>>>>>>>>>> you have it there, but since it is generally a bad thing to do, >>>>>>>>>>>> some >>>>>>>>>>>> comment >>>>>>>>>>>> why it is needed would be useful. >>>>>>>>>>>> >>>>>>>>>>>> @@ -256,6 +263,17 @@ def del_link(realm, replica1, replica2, >>>>>>>>>>>> dirman_passwd, >>>>>>>>>>>> force=False): >>>>>>>>>>>> repl1.delete_agreement(replica2) >>>>>>>>>>>> repl1.delete_referral(replica2) >>>>>>>>>>>> >>>>>>>>>>>> + if type1 == replication.IPA_REPLICA: >>>>>>>>>>>> + if repl2: >>>>>>>>>>>> + ruv = repl2._get_replica_id(repl2.conn, None) >>>>>>>>>>>> + else: >>>>>>>>>>>> + ruv = get_ruv_by_host(realm, replica1, replica2, >>>>>>>>>>>> dirman_passwd) >>>>>>>>>>>> + >>>>>>>>>>>> + try: >>>>>>>>>>>> + repl1.cleanallruv(ruv) >>>>>>>>>>>> + except KeyboardInterrupt: >>>>>>>>>>>> + pass >>>>>>>>>>>> + >>>>>>>>>>>> >>>>>>>>>>>> Maybe you just wanted to do some cleanup and then "raise" again? >>>>>>>>>>> >>>>>>>>>>> No, it is there because it is safe to break out of it. The task will >>>>>>>>>>> continue to run. I added some verbiage. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 2) This is related to 1), but when some replica is down, >>>>>>>>>>>> "ipa-replica-manage >>>>>>>>>>>> del" may wait indefinitely when some remote replica is down, right? >>>>>>>>>>>> >>>>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com >>>>>>>>>>>> Deleting a master is irreversible. >>>>>>>>>>>> To reconnect to the remote master you will need to prepare a new >>>>>>>>>>>> replica file >>>>>>>>>>>> and re-install. >>>>>>>>>>>> Continue to delete? [no]: y >>>>>>>>>>>> ipa: INFO: Setting agreement >>>>>>>>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> tree,cn=config schedule to 2358-2359 0 to force synch >>>>>>>>>>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement >>>>>>>>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> tree,cn=config >>>>>>>>>>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica >>>>>>>>>>>> acquired >>>>>>>>>>>> successfully: Incremental update succeeded: start: 0: end: 0 >>>>>>>>>>>> Background task created to clean replication data >>>>>>>>>>>> >>>>>>>>>>>> ... 
after about a minute I hit CTRL+C >>>>>>>>>>>> >>>>>>>>>>>> ^CDeleted replication agreement from >>>>>>>>>>>> 'vm-086.idm.lab.bos.redhat.com' to >>>>>>>>>>>> 'vm-055.idm.lab.bos.redhat.com' >>>>>>>>>>>> Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: NS >>>>>>>>>>>> record >>>>>>>>>>>> does not >>>>>>>>>>>> contain 'vm-055.idm.lab.bos.redhat.com.' >>>>>>>>>>>> You may need to manually remove them from the tree >>>>>>>>>>>> >>>>>>>>>>>> I think it would be better to inform user that some remote replica >>>>>>>>>>>> is >>>>>>>>>>>> down or >>>>>>>>>>>> at least that we are waiting for the task to complete. Something >>>>>>>>>>>> like >>>>>>>>>>>> that: >>>>>>>>>>>> >>>>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com >>>>>>>>>>>> ... >>>>>>>>>>>> Background task created to clean replication data >>>>>>>>>>>> Replication data clean up may take very long time if some replica >>>>>>>>>>>> is >>>>>>>>>>>> unreachable >>>>>>>>>>>> Hit CTRL+C to interrupt the wait >>>>>>>>>>>> ^C Clean up wait interrupted >>>>>>>>>>>> .... >>>>>>>>>>>> [continue with del] >>>>>>>>>>> >>>>>>>>>>> Yup, did this in #1. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 3) (minor) When there is a cleanruv task running and you run >>>>>>>>>>>> "ipa-replica-manage del", there is a unexpected error message with >>>>>>>>>>>> duplicate >>>>>>>>>>>> task object in LDAP: >>>>>>>>>>>> >>>>>>>>>>>> # ipa-replica-manage del vm-072.idm.lab.bos.redhat.com --force >>>>>>>>>>>> Unable to connect to replica vm-072.idm.lab.bos.redhat.com, forcing >>>>>>>>>>>> removal >>>>>>>>>>>> FAIL >>>>>>>>>>>> Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc': >>>>>>>>>>>> "Can't >>>>>>>>>>>> contact LDAP server"} >>>>>>>>>>>> Forcing removal on 'vm-086.idm.lab.bos.redhat.com' >>>>>>>>>>>> >>>>>>>>>>>> There were issues removing a connection: This entry already exists >>>>>>>>>>>> <<<<<<<<< >>>>>>>>>>>> >>>>>>>>>>>> Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc': >>>>>>>>>>>> "Can't >>>>>>>>>>>> contact LDAP server"} >>>>>>>>>>>> Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS >>>>>>>>>>>> record >>>>>>>>>>>> does not >>>>>>>>>>>> contain 'vm-072.idm.lab.bos.redhat.com.' >>>>>>>>>>>> You may need to manually remove them from the tree >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I think it should be enough to just catch for "entry already >>>>>>>>>>>> exists" in >>>>>>>>>>>> cleanallruv function, and in such case print a relevant error >>>>>>>>>>>> message >>>>>>>>>>>> bail out. >>>>>>>>>>>> Thus, self.conn.checkTask(dn, dowait=True) would not be called too. >>>>>>>>>>> >>>>>>>>>>> Good catch, fixed. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 4) (minor): In make_readonly function, there is a redundant "pass" >>>>>>>>>>>> statement: >>>>>>>>>>>> >>>>>>>>>>>> + def make_readonly(self): >>>>>>>>>>>> + """ >>>>>>>>>>>> + Make the current replication agreement read-only. 
>>>>>>>>>>>> + """ >>>>>>>>>>>> + dn = DN(('cn', 'userRoot'), ('cn', 'ldbm database'), >>>>>>>>>>>> + ('cn', 'plugins'), ('cn', 'config')) >>>>>>>>>>>> + >>>>>>>>>>>> + mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'on')] >>>>>>>>>>>> + try: >>>>>>>>>>>> + self.conn.modify_s(dn, mod) >>>>>>>>>>>> + except ldap.INSUFFICIENT_ACCESS: >>>>>>>>>>>> + # We can't make the server we're removing read-only >>>>>>>>>>>> but >>>>>>>>>>>> + # this isn't a show-stopper >>>>>>>>>>>> + root_logger.debug("No permission to switch replica to >>>>>>>>>>>> read-only, >>>>>>>>>>>> continuing anyway") >>>>>>>>>>>> + pass <<<<<<<<<<<<<<< >>>>>>>>>>> >>>>>>>>>>> Yeah, this is one of my common mistakes. I put in a pass initially, >>>>>>>>>>> then >>>>>>>>>>> add logging in front of it and forget to delete the pass. Its gone >>>>>>>>>>> now. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 5) In clean_ruv, I think allowing a --force option to bypass the >>>>>>>>>>>> user_input >>>>>>>>>>>> would be helpful (at least for test automation): >>>>>>>>>>>> >>>>>>>>>>>> + if not ipautil.user_input("Continue to clean?", False): >>>>>>>>>>>> + sys.exit("Aborted") >>>>>>>>>>> >>>>>>>>>>> Yup, added. >>>>>>>>>>> >>>>>>>>>>> rob >>>>>>>>>> >>>>>>>>>> Slightly revised patch. I still had a window open with one unsaved >>>>>>>>>> change. >>>>>>>>>> >>>>>>>>>> rob >>>>>>>>>> >>>>>>>>> >>>>>>>>> Apparently there were two unsaved changes, one of which was lost. This >>>>>>>>> adds in >>>>>>>>> the 'entry already exists' fix. >>>>>>>>> >>>>>>>>> rob >>>>>>>>> >>>>>>>> >>>>>>>> Just one last thing (otherwise the patch is OK) - I don't think this is >>>>>>>> what we >>>>>>>> want :-) >>>>>>>> >>>>>>>> # ipa-replica-manage clean-ruv 8 >>>>>>>> Clean the Replication Update Vector for >>>>>>>> vm-055.idm.lab.bos.redhat.com:389 >>>>>>>> >>>>>>>> Cleaning the wrong replica ID will cause that server to no >>>>>>>> longer replicate so it may miss updates while the process >>>>>>>> is running. It would need to be re-initialized to maintain >>>>>>>> consistency. Be very careful. >>>>>>>> Continue to clean? [no]: y <<<<<< >>>>>>>> Aborted >>>>>>>> >>>>>>>> >>>>>>>> Nor this exception, (your are checking for wrong exception): >>>>>>>> >>>>>>>> # ipa-replica-manage clean-ruv 8 >>>>>>>> Clean the Replication Update Vector for >>>>>>>> vm-055.idm.lab.bos.redhat.com:389 >>>>>>>> >>>>>>>> Cleaning the wrong replica ID will cause that server to no >>>>>>>> longer replicate so it may miss updates while the process >>>>>>>> is running. It would need to be re-initialized to maintain >>>>>>>> consistency. Be very careful. >>>>>>>> Continue to clean? 
[no]: >>>>>>>> unexpected error: This entry already exists >>>>>>>> >>>>>>>> This is the exception: >>>>>>>> >>>>>>>> Traceback (most recent call last): >>>>>>>> File "/sbin/ipa-replica-manage", line 651, in <module> >>>>>>>> main() >>>>>>>> File "/sbin/ipa-replica-manage", line 648, in main >>>>>>>> clean_ruv(realm, args[1], options) >>>>>>>> File "/sbin/ipa-replica-manage", line 373, in clean_ruv >>>>>>>> thisrepl.cleanallruv(ruv) >>>>>>>> File >>>>>>>> "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", >>>>>>>> line 1136, in cleanallruv >>>>>>>> self.conn.addEntry(e) >>>>>>>> File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", >>>>>>>> line >>>>>>>> 503, in >>>>>>>> addEntry >>>>>>>> self.__handle_errors(e, arg_desc=arg_desc) >>>>>>>> File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", >>>>>>>> line >>>>>>>> 321, in >>>>>>>> __handle_errors >>>>>>>> raise errors.DuplicateEntry() >>>>>>>> ipalib.errors.DuplicateEntry: This entry already exists >>>>>>>> >>>>>>>> Martin >>>>>>>> >>>>>>> >>>>>>> Fixed that and a couple of other problems. When doing a disconnect we >>>>>>> should >>>>>>> not also call clean-ruv. >>>>>> >>>>>> Ah, good self-catch. >>>>>> >>>>>>> >>>>>>> I also got tired of seeing crappy error messages so I added a little >>>>>>> convert >>>>>>> utility. >>>>>>> >>>>>>> rob >>>>>> >>>>>> 1) There is CLEANALLRUV stuff included in 1050-3 and not here. There are >>>>>> also >>>>>> some finding for this new code. >>>>>> >>>>>> >>>>>> 2) We may want to bump Requires to higher version of 389-ds-base >>>>>> (389-ds-base-1.2.11.14-1) - it contains a fix for CLEANALLRUV+winsync >>>>>> bug I >>>>>> found earlier. >>>>>> >>>>>> >>>>>> 3) I just discovered another suspicious behavior. When we are deleting a >>>>>> master >>>>>> that has links also to other master(s) we delete those too. But we also >>>>>> automatically run CLEANALLRUV in these cases, so we may end up in >>>>>> multiple >>>>>> tasks being started on different masters - this does not look right. >>>>>> >>>>>> I think we may rather want to at first delete all links and then run >>>>>> CLEANALLRUV task, just for one time. This is what I get with current >>>>>> code: >>>>>> >>>>>> # ipa-replica-manage del vm-072.idm.lab.bos.redhat.com >>>>>> Directory Manager password: >>>>>> >>>>>> Deleting a master is irreversible. >>>>>> To reconnect to the remote master you will need to prepare a new replica >>>>>> file >>>>>> and re-install. >>>>>> Continue to delete? [no]: yes >>>>>> ipa: INFO: Setting agreement >>>>>> cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping >>>>>> >>>>>> >>>>>> >>>>>> tree,cn=config schedule to 2358-2359 0 to force synch >>>>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement >>>>>> cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping >>>>>> >>>>>> >>>>>> >>>>>> tree,cn=config >>>>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica >>>>>> acquired >>>>>> successfully: Incremental update succeeded: start: 0: end: 0 >>>>>> Background task created to clean replication data. This may take a while. >>>>>> This may be safely interrupted with Ctrl+C >>>>>> >>>>>> ^CWait for task interrupted. 
It will continue to run in the background >>>>>> >>>>>> Deleted replication agreement from 'vm-055.idm.lab.bos.redhat.com' to >>>>>> 'vm-072.idm.lab.bos.redhat.com' >>>>>> ipa: INFO: Setting agreement >>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping >>>>>> >>>>>> >>>>>> >>>>>> tree,cn=config schedule to 2358-2359 0 to force synch >>>>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement >>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping >>>>>> >>>>>> >>>>>> >>>>>> tree,cn=config >>>>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica >>>>>> acquired >>>>>> successfully: Incremental update succeeded: start: 0: end: 0 >>>>>> Background task created to clean replication data. This may take a while. >>>>>> This may be safely interrupted with Ctrl+C >>>>>> >>>>>> ^CWait for task interrupted. It will continue to run in the background >>>>>> >>>>>> Deleted replication agreement from 'vm-086.idm.lab.bos.redhat.com' to >>>>>> 'vm-072.idm.lab.bos.redhat.com' >>>>>> Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS record >>>>>> does >>>>>> not >>>>>> contain 'vm-072.idm.lab.bos.redhat.com.' >>>>>> You may need to manually remove them from the tree >>>>>> >>>>>> Martin >>>>>> >>>>> >>>>> All issues addressed and I pulled in abort-clean-ruv from 1050. I added a >>>>> list-clean-ruv command as well. >>>>> >>>>> rob >>>> >>>> 1) Patch 1031-9 needs to get squashed with 1031-8 >>>> >>>> >>>> 2) Patch needs a rebase (conflict in freeipa.spec.in) >>>> >>>> >>>> 3) New list-clean-ruv man entry is not right: >>>> >>>> list-clean-ruv [REPLICATION_ID] >>>> - List all running CLEANALLRUV and abort CLEANALLRUV tasks. >>>> >>>> REPLICATION_ID is not its argument. >>> >>> Fixed 1-3. >>> >>>> Btw. new list-clean-ruv command proved very useful for me. >>>> >>>> 4) I just found out we need to do a better job with make_readonly() >>>> command. I >>>> get into trouble when disconnecting one link to a remote replica as it was >>>> marked readonly and then I was then unable to manage the disconnected >>>> replica >>>> properly (vm-072 is the replica made readonly): >>> >>> Ok, I reset read-only after we delete the agreements. That fixed things up >>> for >>> me. I disconnected a replica and was able to modify entries on that replica >>> afterwards. >>> >>> This affected the --cleanup command too, it would otherwise have succeeded I >>> think. >>> >>> I tested with an A - B - C - A agreement loop. I disconnected A and C and >>> confirmed I could still update entries on C. Then I deleted C, then B, and >>> made >>> sure output looked right, I could still manage entries, etc. >>> >>> rob >>> >>>> >>>> [root@vm-055 ~]# ipa-replica-manage disconnect >>>> vm-072.idm.lab.bos.redhat.com >>>> >>>> [root@vm-072 ~]# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com >>>> Deleting a master is irreversible. >>>> To reconnect to the remote master you will need to prepare a new replica >>>> file >>>> and re-install. >>>> Continue to delete? 
[no]: yes >>>> Deleting replication agreements between vm-055.idm.lab.bos.redhat.com and >>>> vm-072.idm.lab.bos.redhat.com >>>> ipa: INFO: Setting agreement >>>> cn=meTovm-072.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping >>>> >>>> >>>> tree,cn=config schedule to 2358-2359 0 to force synch >>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement >>>> cn=meTovm-072.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping >>>> >>>> >>>> tree,cn=config >>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica >>>> acquired >>>> successfully: Incremental update succeeded: start: 0: end: 0 >>>> Deleted replication agreement from 'vm-072.idm.lab.bos.redhat.com' to >>>> 'vm-055.idm.lab.bos.redhat.com' >>>> Unable to remove replication agreement for vm-055.idm.lab.bos.redhat.com >>>> from >>>> vm-072.idm.lab.bos.redhat.com. >>>> Background task created to clean replication data. This may take a while. >>>> This may be safely interrupted with Ctrl+C >>>> ^CWait for task interrupted. It will continue to run in the background >>>> >>>> Failed to cleanup vm-055.idm.lab.bos.redhat.com entries: Server is >>>> unwilling to >>>> perform: database is read-only arguments: >>>> dn=krbprincipalname=ldap/vm-055.idm.lab.bos.redhat....@idm.lab.bos.redhat.com,cn=services,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com >>>> >>>> >>>> >>>> You may need to manually remove them from the tree >>>> ipa: INFO: Unhandled LDAPError: {'info': 'database is read-only', 'desc': >>>> 'Server is unwilling to perform'} >>>> >>>> Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: Server is >>>> unwilling to perform: database is read-only >>>> >>>> You may need to manually remove them from the tree >>>> >>>> >>>> --cleanup did not work for me as well: >>>> [root@vm-072 ~]# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com >>>> --force >>>> --cleanup >>>> Cleaning a master is irreversible. >>>> This should not normally be require, so use cautiously. >>>> Continue to clean master? [no]: yes >>>> unexpected error: Server is unwilling to perform: database is read-only >>>> arguments: >>>> dn=krbprincipalname=ldap/vm-055.idm.lab.bos.redhat....@idm.lab.bos.redhat.com,cn=services,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com >>>> >>>> >>>> >>>> Martin >>>> >>> >> >> I think you sent a wrong patch... >> >> Martin >> > > I hate Mondays. > > rob
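(The fix Rob describes for finding 4, resetting the read-only flag once the agreements to the deleted master are gone, might look roughly like the sketch below. The DN and attribute are taken from the make_readonly() snippet quoted earlier in the thread; the helper name and the use of the standard logging module are illustrative, not the committed patch.)

    import ldap
    import logging

    def reset_readonly(conn):
        """Hypothetical counterpart to make_readonly(): switch the backend
        back to read-write after the replication agreements are removed, so
        the now-disconnected replica can still be managed."""
        dn = 'cn=userRoot,cn=ldbm database,cn=plugins,cn=config'
        mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'off')]
        try:
            conn.modify_s(dn, mod)
        except ldap.INSUFFICIENT_ACCESS:
            # As in make_readonly(): lacking permission is not a show-stopper.
            logging.debug("No permission to reset read-only flag, continuing anyway")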
Maybe you will like this one a little bit more :-)

ACK. Pushed to master, ipa-3-0.

Martin

_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel
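(Putting the clean-ruv findings from the thread together: the confirmation prompt should be skipped when --force is given, and an already-running task should surface as a clear message instead of a DuplicateEntry traceback. A rough sketch of that control flow follows; function and option names mirror the snippets quoted above, the replication connection is passed in for brevity, and the committed patch may differ.)

    import sys

    from ipalib import errors
    from ipapython import ipautil

    def clean_ruv(thisrepl, ruv, options):
        """Hypothetical outline of the clean-ruv confirmation and error handling."""
        print "Clean the Replication Update Vector for replica id %s" % ruv
        # Honour --force so the command can run unattended (finding 3 / 5).
        if not options.force and not ipautil.user_input("Continue to clean?", False):
            sys.exit("Aborted")
        try:
            thisrepl.cleanallruv(ruv)
        except errors.DuplicateEntry:
            # A task entry for this replica id already exists; report it
            # instead of letting the exception escape as "unexpected error".
            sys.exit("A CLEANALLRUV task for this replica id is already running")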