Hi guys,

I am planning to upgrade my 4.1.1 infrastructure to 4.2 over the weekend.

While testing my 4.1.1 setup I ran into a problem where a TOR switch failure would cause an outage of the management server. The agents use 2 NICs for all management traffic via bonds. When I tried to configure the management server to use bond0 in simple active-passive mode (as I do for my agent management network), cloudstack-management would not start due to 'Integrity Issues'. At the time I traced this back to an IntegrityChecker which verifies that an interface named eth*, em*, or one of a few other patterns holds the management server's IP.

My question: does this limitation still exist in 4.2, and if so, can it be overcome by adding bond* to the list of allowed interface names and compiling the management server from source? I would love to hear input on this; it seems bizarre to me that it is so difficult to add simple but effective network redundancy to the management server.

For context, this is the basic redundant network setup I have for my agents: 4x KVM hosts, each with 4 NICs in 2 bonds (private/public traffic).

Example host:

    ------------------ Interconnect ------------------
      TOR 1 ------------------------------------ TOR 2
    ---------------------------------------------------
    |  Management                                     |
    |  Tagged VLANs                                   |
    ---------------------------------------------------
                KVM CloudStack Hypervisor
    ---------------------------------------------------
    |  Public Traffic                                 |
    |  Tagged VLANs                                   |
    |  LACP Aggregation                               |
    ---------------------------------------------------
                      Core Router
    ---------------------------------------------------

There are also LACP links with STP rules between the TOR switches and the core device so that the TORs do not become isolated on an interconnect failure, but I have excluded those for simplicity.

I would have thought it would be easy to create a bond on my management node and connect its two NICs to both TOR switches, but that didn't work in 4.1.1 for the reasons above.

Thanks!
Marty
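
P.S. For reference, this is roughly the bonding configuration I use on the agents and tried to replicate on the management node. It's a minimal sketch for RHEL/CentOS; the interface names (eth0/eth1) and the addressing are placeholders for my environment:

/etc/sysconfig/network-scripts/ifcfg-bond0:

    DEVICE=bond0
    TYPE=Bond
    BONDING_MASTER=yes
    ONBOOT=yes
    BOOTPROTO=none
    IPADDR=192.168.0.10        # placeholder management server IP
    NETMASK=255.255.255.0
    # mode=active-backup is bonding mode 1 (simple active-passive);
    # miimon=100 polls link state every 100 ms to trigger failover
    BONDING_OPTS="mode=active-backup miimon=100"

/etc/sysconfig/network-scripts/ifcfg-eth0 (and the same for eth1):

    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

And to frame the source-patching part of my question: I haven't checked the 4.2 source yet, but if the check still amounts to a prefix whitelist like the sketch below (the class and method names here are illustrative, not the actual CloudStack code), then extending the list with "bond" and rebuilding is the kind of workaround I'm considering:

    // Illustrative sketch only -- not the actual CloudStack classes or methods.
    // Models the interface-name whitelist I believe the IntegrityChecker applies.
    public class InterfaceNameCheck {
        // Allowed interface name prefixes; "bond" is the addition I have in mind.
        private static final String[] ALLOWED_PREFIXES = { "eth", "em", "bond" };

        public static boolean isAllowedInterface(String name) {
            for (String prefix : ALLOWED_PREFIXES) {
                if (name.startsWith(prefix)) {
                    return true;
                }
            }
            return false;
        }
    }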
When testing my 4.1.1 setup I ran across a problem where a TOR switch failure would cause an outage to the management server. The agents use 2 NICs for all management traffic using bonds. When I tried to configure the management server to use a bond0 in simple active-passive mode (like I use for my agent management network), cloudstack-management would not start due to 'Integrity Issues', which at the time I located back to a IntegitryChecker which ensures the interfaces of eth* em* or some others were taking the IP of management server. My question is does this limitation still exist and if so, can it be overcome by adding bond* to the list of allowed interface names and compiling the management server from source? I would love to hear input to this, it seems bizarre to me that it is difficult to add simple but effective network redundancy to the management server. For scenario basis, this is the basic redundant network setup I have for my Agents: 4x KVM Hosts all with 4 NICs - 2 bonds (Private/Public Traffic) Example Host: ------------------Interconnect--------------- TOR 1 --------- TOR 2 --------------------- --------------------- | Management | | Tagged VLANs | ---------------------------------------------------- KVM Cloudstack Hypervisor ---------------------------------------------------- | Public Traffic | | Tagged VLANS | | LACP Aggregation | ---------------------------------------------------- Core Router ---------------------------------------------------- There are also LACP links with STP rules between the TOR switches are the core device to allow for interconnect failure so the TORs do not become isolated, but I have excluded that for simplicity. I would have thought it would be easy to create a bond for my management node and connect the two NICs to both the TOR switches, but that didn't work in 4.1.1 due to my reasons above. Thanks! Marty