Wolfram Schlich created CLOUDSTACK-585: ------------------------------------------
           Summary: DHCP entry provisioning is broken in the KVM agent
               Key: CLOUDSTACK-585
               URL: https://issues.apache.org/jira/browse/CLOUDSTACK-585
           Project: CloudStack
        Issue Type: Bug
    Security Level: Public (Anyone can view this level - this is the default.)
        Components: Hypervisor Controller, KVM
       Environment: CloudStack 3.0.2 KVM agent running on Fedora 14
          Reporter: Wolfram Schlich

When adding an instance to a routerVM's DHCP configuration, the KVM agent appears to call /usr/lib64/cloud/agent/scripts/network/domr/dhcp_entry.sh with wrongly constructed command line arguments, so the script fails to add the correct entries (specifically default router, DNS servers and static routes) for that instance to the routerVM's /etc/dhcphosts.txt and /etc/dhcpopts.txt.

In particular, adding a specific default gateway fails, so the routerVM always announces itself as the default router: the entry in /etc/dhcpopts.txt that should specify the gateway of the instance's default network is missing. This is especially nasty for non-default/additional networks of an instance, because it messes up the default routing.

Examples:

Management server log entry:

2012-11-29 14:36:37,764 DEBUG [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-3:null) Executing: /usr/lib64/cloud/agent/./scripts/network/domr/dhcp_entry.sh -r 169.254.0.122 -v 172.31.2.233 -m 06:b5:88:00:02:30 -n vmname  -d 172.31.2.1  -N 172.31.2.201 <END-OF-LINE>

Notice the double spaces before -d and -N (and the extra space at the EOL).

After patching /usr/lib64/cloud/agent/scripts/network/domr/dhcp_entry.sh to do meaningful logging, it is clear that the script is not called with "-d" but with " -d" (same for -N), i.e. with an extra space before the dash. Thus, getopts fails to recognize these two arguments, and the script passes empty values for $dfltrt and $dns to /root/edithosts.sh, which is called on the routerVM.
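
To illustrate the getopts behaviour described above, here is a minimal sketch that can be run outside CloudStack. The function parse_args and its option string 'r:v:m:n:d:N:' are assumptions made for this example, inferred from the flags in the log line; they are not copied from the real dhcp_entry.sh. Only the variable names $dfltrt and $dns come from the report.

```shell
#!/bin/sh
# Hypothetical stand-in for the option parsing in dhcp_entry.sh.
# The option string is an assumption based on the flags seen in the log.
parse_args() {
    OPTIND=1                      # reset so the function can be called repeatedly
    vmip="" dfltrt="" dns=""
    while getopts 'r:v:m:n:d:N:' opt; do
        case "$opt" in
            v) vmip="$OPTARG" ;;
            d) dfltrt="$OPTARG" ;;
            N) dns="$OPTARG" ;;
            *) : ;;               # other options are ignored in this sketch
        esac
    done
    echo "vmip=[$vmip] dfltrt=[$dfltrt] dns=[$dns]"
}

# Arguments as the agent seems to pass them: " -d" and " -N" carry a
# leading space, so getopts treats " -d" as the first non-option
# argument and stops parsing there.
parse_args -r 169.254.0.122 -v 172.31.2.233 -m 06:b5:88:00:02:30 \
    -n vmname ' -d' 172.31.2.1 ' -N' 172.31.2.201 ' '

# The same call with clean arguments, as a shell would have split them.
parse_args -r 169.254.0.122 -v 172.31.2.233 -m 06:b5:88:00:02:30 \
    -n vmname -d 172.31.2.1 -N 172.31.2.201
```

The first call leaves dfltrt and dns empty, while the second fills both, which matches the empty values observed for $dfltrt and $dns.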

It's also clear that the CloudStack KVM Java agent itself invokes the script with the wrongly constructed argument list: if a shell were interpreting this command line, it would simply discard the extra spaces while splitting it into words.

I haven't been able to track it down, but I suspect that one of

./utils/src/com/cloud/utils/script/Script.java:protected String buildCommandLine(String[] command) { }
./utils/src/com/cloud/utils/script/Script.java:public String execute(OutputInterpreter interpreter) { }

mangles the command line that was previously built by

./core/src/com/cloud/agent/resource/virtualnetwork/VirtualRoutingResource.java:protected synchronized Answer execute (final DhcpEntryCommand cmd) { }

I have not tried 4.0.0 so far, so I cannot say whether it is affected.

As a workaround, I've patched dhcp_entry.sh to re-evaluate the positional parameters using 'set -- ${@}' (I will attach a patch, plus one for the logging).
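
For reference, the effect of the 'set -- ${@}' workaround can be sketched as follows. This is not the attached patch; the function parse_resplit and the option string 'r:v:m:n:d:N:' are hypothetical and only serve to show why re-splitting the positional parameters makes getopts see -d and -N again.

```shell
#!/bin/sh
# Hypothetical sketch of the workaround, not the actual patch.
parse_resplit() {
    # Unquoted on purpose: field splitting on $IFS strips the stray
    # leading spaces and drops arguments that consist only of whitespace.
    set -- ${@}
    OPTIND=1
    dfltrt="" dns=""
    while getopts 'r:v:m:n:d:N:' opt; do
        case "$opt" in
            d) dfltrt="$OPTARG" ;;
            N) dns="$OPTARG" ;;
            *) : ;;               # other options are ignored in this sketch
        esac
    done
    echo "argc=$# dfltrt=[$dfltrt] dns=[$dns]"
}

# Same malformed arguments as in the log: " -d", " -N" and a trailing " ".
parse_resplit -r 169.254.0.122 -v 172.31.2.233 -m 06:b5:88:00:02:30 \
    -n vmname ' -d' 172.31.2.1 ' -N' 172.31.2.201 ' '
# prints: argc=12 dfltrt=[172.31.2.1] dns=[172.31.2.201]
```

Note that the unquoted expansion would also split any argument that legitimately contains whitespace and is subject to pathname expansion, so this is a stopgap in the script rather than a fix for the agent.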