[ https://issues.apache.org/jira/browse/CLOUDSTACK-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xin He updated CLOUDSTACK-8839:
-------------------------------
Description:
*[appearance]*
When many concurrent disable static NAT commands happen in one network, some public IPs may remain in the VR, and this causes a big problem for the whole cloud network.
*[reason]*
When executing the disable static NAT command, CS also executes a disassociate IP command, and this command sends all the public IPs, including both the associated and the disassociated ones, to the VR. However, consider concurrent disable static NAT commands that disable public IPs A and B: first IP A is disabled, then IP B. The two commands look like this: the first is A- B+, the second is A- B- (here "-" means disassociate IP and "+" means associate IP). This works normally, but if the two commands are very close in time, the VR's answer for disassociating IP A has not yet been returned to CS, so public IP A still remains in associated status in the CS database. The second command is then not A- B- but A+ B-, so IP A is unexpectedly reassociated by the second command and public IP A remains in the VR. For this reason, some IPs that should be disassociated may be reassociated by other concurrent commands, and the issue happens easily when disable static NAT commands are issued close together.
*[bug fix suggestion]*
Use a lock mechanism such as an "optimistic lock". This "optimistic lock" gives each network and VPC a version ID in the CS database. Any time the network or VPC performs an operation on public IPs or network rules (network rules also have this problem), the version ID is incremented. When the resource part (such as VmwareResource) finds that a command it received has a version less than or equal to the last version it received, the command is discarded. This method guarantees that every command sent by the resource part and received by the VR reflects the latest version of the network or VPC at that time, so the example above will not happen again.
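A minimal Java sketch of the resource-side check described in the bug fix suggestion, assuming each IP command carries the network's (or VPC's) version ID as read from the CS database when the command was built. The class, field and method names (VersionedResource, VersionedIpCommand, apply) are hypothetical and are not CloudStack APIs; the usage shows commands for the same network arriving out of order.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the proposed "optimistic lock" check; not CloudStack code. */
public class VersionedResource {

    /** Hypothetical command: the full desired public IP state of one network plus its version ID. */
    public static final class VersionedIpCommand {
        final long networkId;
        final long version;             // version ID read from the CS database when the command was built
        final Map<String, Boolean> ips; // public IP -> true ("+", associate) or false ("-", disassociate)

        public VersionedIpCommand(long networkId, long version, Map<String, Boolean> ips) {
            this.networkId = networkId;
            this.version = version;
            this.ips = ips;
        }
    }

    // Last version applied, per network (or VPC) ID.
    private final Map<Long, Long> lastVersion = new HashMap<>();

    /** Returns true if the command is applied, false if it is discarded as stale. */
    public synchronized boolean apply(VersionedIpCommand cmd) {
        Long last = lastVersion.get(cmd.networkId);
        if (last != null && cmd.version <= last) {
            return false; // less than or equal to the last version seen: discard
        }
        lastVersion.put(cmd.networkId, cmd.version);
        // A real resource would push cmd.ips to the virtual router here.
        return true;
    }

    public static void main(String[] args) {
        VersionedResource resource = new VersionedResource();
        VersionedIpCommand first = new VersionedIpCommand(1L, 1L, Map.of("A", false, "B", true));   // A- B+
        VersionedIpCommand second = new VersionedIpCommand(1L, 2L, Map.of("A", false, "B", false)); // A- B-
        // The newer command reaches the resource first.
        System.out.println(resource.apply(second)); // true
        // The older command would re-associate B, so it is discarded.
        System.out.println(resource.apply(first));  // false
    }
}
```

The version map is keyed by network (or VPC) ID, so commands for different networks never discard each other, and `synchronized` keeps the compare-and-update atomic when several commands reach the same resource concurrently.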
was: *[appearance]*
When many concurrent disable static NAT commands happen in one network, some public IPs may remain in the VR, and this causes a big problem for the whole cloud network.
*[reason]*
When executing the disable static NAT command, CS also executes a disassociate IP command, and this command sends all the public IPs, including both the associated and the disassociated ones, to the VR. However, consider concurrent disable static NAT commands for IPs A and B: first IP A is disabled, then IP B. The two commands look like this: the first is A- B+, the second is A- B- (here "-" means disassociate IP and "+" means associate IP). This works normally, but if the two commands are very close in time, the VR's answer for disassociating IP A has not been received by CS, so public IP A remains in associated status. The second command is then not A- B- but A+ B-, so IP A is unexpectedly reassociated by the second command and public IP A remains in the VR. For this reason, some IPs that should be disassociated may be associated again by the concurrent commands.
*[bug fix suggestion]*
Use a lock mechanism such as an "optimistic lock". This "optimistic lock" gives each network and VPC a version ID. Any time the network or VPC performs an operation on public IPs or network rules (network rules also have this problem), the version ID is incremented. When the resource part (such as VmwareResource) finds that a command it received has a version earlier than the last version it received, the command is discarded. This method guarantees that every command sent by the resource part and received by the VR reflects the latest version of the network or VPC at that time.
> Concurrent IP disassociate commands for virtual router may be out of order,
> and this causes some public IPs to remain in VR
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-8839
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8839
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.)
>          Components: Management Server, Virtual Router
>    Affects Versions: 4.5.2
>         Environment: three management servers and hundreds of VMware compute nodes
>            Reporter: Xin He
>            Priority: Critical
>             Fix For: 4.5.2

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)