+1
This is indicative of why CloudStack lacks a market following.
It does not look like a product that you can bet your career on.
This was discussed at the meeting in Montreal last year, but the effort died
shortly thereafter.
It is probably time to freeze development until the docs and wiki are
cleaned up.
That won't happen, but it should.
Unfortunately, it takes a team effort to fix this.
There is no doubt that the team has the talent.
Ron
On 02/08/2017 4:12 AM, Eric Green wrote:
First, about me -- I've been administering Linux systems since 1995. No, that's
not a typo -- that's 22 years. I've also worked for a firewall manufacturer in
the past, where I designed its layer 2 VLAN support, so I know VLANs and such.
I already run a fairly complex production network with multiple VLANs, multiple
networks, etc., and I speak fluent Cisco CLI. In short, I'm not an amateur at
this networking stuff, but figuring out how CloudStack wanted my CentOS 7
networking to be configured, and doing all the gymnastics to make it happen,
consumed nearly a week because the documentation simply isn't up to date,
thorough, or accurate, at least for CentOS 7.
So anyhow, my configuration:
CloudStack 4.9.2.0 from the RPM repository at cloudstack.apt-get.eu
CentOS 7 servers with:
Two 10Gbit Ethernet ports -> bond0
A handful of VLANS:
100 -- management. Trunked from my top-of-rack switch to my core backbone
switch, layer 3 routed to my local network as 10.100.x.x, and through the NAT
border firewall and router out to the Internet.
101 -- same, but for 10.101.x.x -- public.
102 -- same, but for 10.102.x.x -- guest public (see below).
192 -- a video surveillance camera network, reached via a drop from the core
video surveillance POE switch to an access-mode port on my top-of-rack switch.
Not routed anywhere.
200 -- a 10 gig drop over to my production racks, to my storage network there,
for accessing legacy storage. Not routed. (Legacy storage is not used for
CloudStack instance or secondary storage, but can be accessed by virtual
machines being migrated to this rack.)
1000-2000 -- VLANs that exist in my top-of-rack switch on the CloudStack rack
and are assigned to my trunk ports to the cloud servers, but routed nowhere
else. For VPCs and such.
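For reference, the trunk ports on the top-of-rack switch just carry all of
those VLANs tagged. A minimal sketch in Cisco IOS syntax -- the interface name
is illustrative, and your platform may also want the trunk encapsulation set
explicitly:

  interface TenGigabitEthernet1/0/1
   description trunk to cloud server bond member enp4s0
   switchport mode trunk
   switchport trunk allowed vlan 100-102,192,200,1000-2000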
I stuck with VLANs rather than one of the SDN modules like VXLAN because a)
it's the oldest mechanism and the most likely to be stable, b) it's compatible
with my already-existing network hardware and networks (I wouldn't have to
somehow map a VLAN to an SDN virtual network to reach 192 or 200, or to create
a public 102), and c) it's the least complex to set up and configure given my
existing top-of-rack switch, which does VLANs just fine.
Okay, here's how I had to configure CentOS 7 to make it work:
enp4s[01] -> bond0 -> bond0.100 -> br100 -- I had to create two interface
files, enslave them to the bond0 bond, then create a bond0.100 VLAN interface
on top of the bond, then a br100 bridge on top of that, for my management
network. In /etc/sysconfig/network-scripts:
# ls ifcfg-*
ifcfg-bond0  ifcfg-bond0.100  ifcfg-br100  ifcfg-enp4s0  ifcfg-enp4s1
(where 4s0 and 4s1 are my 10 gigabit Ethernets).
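For completeness, here's roughly what goes in each of those files -- trimmed
to the essentials, with the addresses and bonding options as illustrative
examples only, so adjust for your own site:

  # ifcfg-enp4s0 (ifcfg-enp4s1 is identical except for DEVICE)
  DEVICE=enp4s0
  ONBOOT=yes
  MASTER=bond0
  SLAVE=yes

  # ifcfg-bond0 -- the bond itself carries no IP address
  DEVICE=bond0
  TYPE=Bond
  BONDING_MASTER=yes
  BONDING_OPTS="mode=active-backup miimon=100"
  ONBOOT=yes

  # ifcfg-bond0.100 -- tagged VLAN interface, enslaved to the bridge
  DEVICE=bond0.100
  VLAN=yes
  BRIDGE=br100
  ONBOOT=yes

  # ifcfg-br100 -- the management bridge carries the host's IP
  DEVICE=br100
  TYPE=Bridge
  IPADDR=10.100.1.10
  NETMASK=255.255.0.0
  GATEWAY=10.100.0.1
  ONBOOT=yes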
Don't create anything else. You'll just confuse CloudStack. Any other
configuration of the network simply fails to work. In particular, creating
br101 etc. fails because CloudStack wants to create its own VLANs and bridges,
and if you use br101 as a traffic label it'll try making a VLAN interface
br101.101 (doesn't work, duh). Yes, I know this contradicts every single piece
of advice I've seen on this list. All I know is that this is what works, while
every other piece of advice I've seen for labeling the public and private
guest networking fails.
When creating the networks in the GUI under Advanced networking, set bond0 as
your physical network and br100 as the KVM traffic label for the Management
and Storage networks, and give them addresses on VLAN 100 (assuming you're
using the same network for both management and storage, which is what makes
sense with my single 10Gbit pipe). But do *not* set up anything as a traffic
label for the Guest or Public networks -- you will confuse the agent greatly.
Let it use the default labels. It'll work. It'll set up its own bond0.<tag>
VLAN interface and brbond0-<tag> bridge as needed. This violates every other
piece of advice I've seen for labeling, but this is what actually works with
this version of CloudStack and this version of CentOS when you're sending
everything through a VLAN-tagged bond0.
A very important global configuration setting *not* documented in the
installation documents:
secstorage.allowed.internal.sites=10.100.0.0/16
(for my particular network).
Otherwise I couldn't upload ISO files to the server from my nginx server,
which points at the NFS directory full of ISO files.
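You can change it in the GUI under Global Settings, or -- assuming you have
CloudMonkey pointed at your management server -- something along these lines
should do it (the CIDR is obviously site-specific):

  $ cloudmonkey
  > update configuration name=secstorage.allowed.internal.sites value=10.100.0.0/16

You may need to restart the management server for the change to take effect.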
---
Very important guest VM image prep *NOT* in the docs:
Be sure to install / enable / run acpid on Linux guests, otherwise "clean"
shutdowns can't happen. It turns out CloudStack on KVM uses the ACPI shutdown
functionality of qemu-kvm; it probably does the same on other hypervisors too.
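On a CentOS 7 guest that's just:

  yum install -y acpid
  systemctl enable acpid
  systemctl start acpid

(Debian-family guests want the equivalent apt-get install acpid.)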
---
Now on to that mysterious VLAN 102:
I created a "public" shared network on the 102 VLAN for stuff I don't mind
being out in the open. This is a QA lab environment, not a public cloud. So I
assigned a subnet and a VLAN, ran a VLAN drop over to my main backbone layer 3
switch (and bopped up to my border firewall and told it about the new subnet
too, so that we could get out to the Internet as needed), and let it go
public. Gotta be a reason why we paid Cisco big bucks for all that hardware,
right?
Plus, it's very convenient to delegate a subdomain to the virtual router for
that subnet, so people can access their instances as
"my-instance.cloud.mycompany.com", where "my-instance" is the name of their
instance in the GUI. It's not documented anywhere I can find that you can do
this (delegate a subdomain to the virtual router for a guest subnet), but it
works, and it's very convenient for my QA people.
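The delegation itself is ordinary DNS. A minimal sketch in BIND zone-file
syntax, with the host name and the virtual router's address purely
illustrative (check the router's actual IP on the shared network in the GUI):

  ; in the parent mycompany.com zone
  cloud      IN NS  cloud-gw.mycompany.com.
  cloud-gw   IN A   10.102.0.1

Lookups for my-instance.cloud.mycompany.com then follow the delegation to the
virtual router, which knows the instance names on its guest network.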
I've played with the VPC stuff. It looks quite powerful. If I were doing a
customer-facing cloud, that's how I'd do it. It's just not what our engineers
need for testing our software.
---
Final thoughts:
1) The GUI is definitely in need of help. Maybe I'm just too accustomed to
modern responsive RESTful UIs, but this GUI is the opposite of responsive in
most places. You do something, and the display never updates with the changes.
And because it's not RESTful, you can't just hit the refresh button either --
that'll take you all the way back to the login screen.
2) The documentation is clearly in need of help. If it takes me -- someone
with 22 years of Linux and advanced networking experience, an already-existing
complex network of multiple VLANs with multiple virtualization offerings, a
top-of-rack switch with VLANs and subnets already configured through to the
core backbone switch and Internet boundary router, and working networking with
NFS etc. already configured on the CentOS 7 servers -- a week of
trial-and-error to actually get a working installation, when it turns out to
be ridiculously simple once you know the tricks, then clearly the tricks need
to be documented. It appears that most of the documentation is oriented around
XenServer, and there's nothing specific to CentOS 7 either, though the CentOS
6 documents are *almost* correct for CentOS 7.
3) Failures were mysterious. Error messages said '[Null] failed' way too
often. '[Null]' what?! So then I had to examine the system itself via
journalctl, ip addr, etc. to see what clues it had left behind (such as
attempts to configure network ports), and comb through agent logs to make
guesses as to what may have gone wrong. A simple "Could not create network
bridge for public network because the NIC is in use by another bridge" would
have saved hours of time all by itself.
That said, I looked at OpenStack -- a mess of incompatible technologies
stitched together with hacks -- and waved it off as overkill for anything
smaller than a Fortune 500 company or Rackspace.com, with the budget to have a
team of consultants come in and hack it to their needs. Eucalyptus isn't
flexible enough to do what I need with networks: we have a surveillance
network with around 100 cameras that feeds data to the QA / R&D
infrastructure, and I could find no way in Eucalyptus to give that network to
the virtual machines that needed it. OpenNebula ate a friend's cloud multiple
times. Not going to talk about oVirt. Nope, won't. And CloudStack does
everything I need it to do.
To be fair, my needs are almost fulfilled by vSphere / vCenter. It's quite
clear why VMware continues to exist despite the limitations of their solution.
There is something to be said for bullet-proof and easy to install and manage.
It's clunky and limited, but bullet-proof. As in, the only time my ESXi
servers have ever gone down, *ever*, is for power failures. As in, they run
for years at a time without any attention at all. And it didn't take much time
to install and configure either, certainly none of the trial and error
involved with CloudStack. That's hard to beat... but the hardware requirements
are exacting and would have required me to invest more in hardware than I did
here, the software licenses are expensive too, and I just couldn't justify
that for a QA playground.
So consider me slightly annoyed but appreciative. It appears CloudStack is
going to solve my needs here. We'll see.
--
Ron Wheeler
President
Artifact Software Inc
email: [email protected]
skype: ronaldmwheeler
phone: 866-970-2435, ext 102