On 07/05/2012 08:39 PM, Rob Crittenden wrote:
> Martin Kosek wrote:
>> On 07/03/2012 04:41 PM, Rob Crittenden wrote:
>>> Deleting a replica can leave a replication vector (RUV) on the other 
>>> servers.
>>> This can confuse things if the replica is re-added, and it also causes the
>>> server to calculate changes against a server that may no longer exist.
>>>
>>> 389-ds-base provides a new task that self-propogates itself to all available
>>> replicas to clean this RUV data.
>>>
>>> This patch will create this task at deletion time to hopefully clean things 
>>> up.
>>>
>>> It isn't perfect. If any replica is down or unavailable at the time the
>>> cleanruv task fires, and then comes back up, the old RUV data may be
>>> re-propogated around.
>>>
>>> To make things easier in this case I've added two new commands to
>>> ipa-replica-manage. The first lists the replication ids of all the servers 
>>> we
>>> have a RUV for. Using this you can call clean_ruv with the replication id 
>>> of a
>>> server that no longer exists to try the cleanallruv step again.
>>>
>>> This is quite dangerous though. If you run cleanruv against a replica id 
>>> that
>>> does exist it can cause a loss of data. I believe I've put in enough scary
>>> warnings about this.
>>>
>>> rob
>>>
>>
>> Good work there, this should make cleaning RUVs much easier than with the
>> previous version.
>>
>> This is what I found during review:
>>
>> 1) list_ruv and clean_ruv command help in man is quite lost. I think it would
>> help if we for example have all info for commands indented. This way user 
>> could
>> simply over-look the new commands in the man page.
>>
>>
>> 2) I would rename new commands to clean-ruv and list-ruv to make them
>> consistent with the rest of the commands (re-initialize, force-sync).
>>
>>
>> 3) It would be nice to be able to run clean_ruv command in an unattended way
>> (for better testing), i.e. respect --force option as we already do for
>> ipa-replica-manage del. This fix would aid test automation in the future.
>>
>>
>> 4) (minor) The new question (and the del too) does not react too well for
>> CTRL+D:
>>
>> # ipa-replica-manage clean_ruv 3 --force
>> Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
>>
>> Cleaning the wrong replica ID will cause that server to no
>> longer replicate so it may miss updates while the process
>> is running. It would need to be re-initialized to maintain
>> consistency. Be very careful.
>> Continue to clean? [no]: unexpected error:
>>
>>
>> 5) Help for clean_ruv command without a required parameter is quite confusing
>> as it reports that command is wrong and not the parameter:
>>
>> # ipa-replica-manage clean_ruv
>> Usage: ipa-replica-manage [options]
>>
>> ipa-replica-manage: error: must provide a command [clean_ruv | force-sync |
>> disconnect | connect | del | re-initialize | list | list_ruv]
>>
>> It seems you just forgot to specify the error message in the command 
>> definition
>>
>>
>> 6) When the remote replica is down, the clean_ruv command fails with an
>> unexpected error:
>>
>> [root@vm-086 ~]# ipa-replica-manage clean_ruv 5
>> Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
>>
>> Cleaning the wrong replica ID will cause that server to no
>> longer replicate so it may miss updates while the process
>> is running. It would need to be re-initialized to maintain
>> consistency. Be very careful.
>> Continue to clean? [no]: y
>> unexpected error: {'desc': 'Operations error'}
>>
>>
>> /var/log/dirsrv/slapd-IDM-LAB-BOS-REDHAT-COM/errors:
>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: failed
>> to connect to repl        agreement connection
>> (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,
>>      cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping
>> tree,cn=config), error 105
>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: 
>> replica
>> (cn=meTovm-055.idm.lab.
>> bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping
>>
>> tree,   cn=config) has not been cleaned.  You will need to rerun the
>> CLEANALLRUV task on this replica.
>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: Task
>> failed (1)
>>
>> In this case I think we should inform user that the command failed, possibly
>> because of disconnected replicas and that they could enable the replicas and
>> try again.
>>
>>
>> 7) (minor) "pass" is now redundant in replication.py:
>> +        except ldap.INSUFFICIENT_ACCESS:
>> +            # We can't make the server we're removing read-only but
>> +            # this isn't a show-stopper
>> +            root_logger.debug("No permission to switch replica to read-only,
>> continuing anyway")
>> +            pass
>>
> 
> I think this addresses everything.
> 
> rob

Thanks, almost there! I just found one more issue which needs to be fixed
before we push:

# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
Directory Manager password:

Unable to connect to replica vm-055.idm.lab.bos.redhat.com, forcing removal
Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc': "Can't
contact LDAP server"}
Forcing removal on 'vm-086.idm.lab.bos.redhat.com'

There were issues removing a connection: %d format: a number is required, not 
str

Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc': "Can't
contact LDAP server"}

This is a traceback I retrieved:
Traceback (most recent call last):
  File "/sbin/ipa-replica-manage", line 425, in del_master
    del_link(realm, r, hostname, options.dirman_passwd, force=True)
  File "/sbin/ipa-replica-manage", line 271, in del_link
    repl1.cleanallruv(replica_id)
  File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py",
line 1094, in cleanallruv
    root_logger.debug("Creating CLEANALLRUV task for replica id %d" % replicaId)


The problem here is that you don't convert replica_id to int in this part:
+    replica_id = None
+    if repl2:
+        replica_id = repl2._get_replica_id(repl2.conn, None)
+    else:
+        servers = get_ruv(realm, replica1, dirman_passwd)
+        for (netloc, rid) in servers:
+            if netloc.startswith(replica2):
+                replica_id = rid
+                break

Martin

_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel

Reply via email to