Don't know whether this helps or not but I logged into the SSVM and ran an
ifconfig -
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 169.254.3.35 netmask 255.255.0.0 broadcast 169.254.255.255
        ether 0e:00:a9:fe:03:23 txqueuelen 1000 (Ethernet)
        RX packets 141 bytes 20249 (19.7 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 108 bytes 16287 (15.9 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 172.30.3.34 netmask 255.255.255.192 broadcast 172.30.3.63
        ether 1e:00:3b:00:00:05 txqueuelen 1000 (Ethernet)
        RX packets 56722 bytes 4953133 (4.7 MiB)
        RX errors 0 dropped 44573 overruns 0 frame 0
        TX packets 11224 bytes 1234932 (1.1 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 172.30.4.86 netmask 255.255.255.128 broadcast 172.30.4.127
        ether 1e:00:d9:00:00:53 txqueuelen 1000 (Ethernet)
        RX packets 366191 bytes 435300557 (415.1 MiB)
        RX errors 0 dropped 39456 overruns 0 frame 0
        TX packets 145065 bytes 7978602 (7.6 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 172.30.5.14 netmask 255.255.255.240 broadcast 172.30.5.15
        ether 1e:00:cb:00:00:1a txqueuelen 1000 (Ethernet)
        RX packets 132440 bytes 426362982 (406.6 MiB)
        RX errors 0 dropped 39446 overruns 0 frame 0
        TX packets 67443 bytes 423670834 (404.0 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        loop txqueuelen 1 (Local Loopback)
        RX packets 18 bytes 1440 (1.4 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 18 bytes 1440 (1.4 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

so it has interfaces in both the management and the storage subnets (as well as
guest).
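
In case it is useful, here is a small Python sketch that matches each SSVM address from the ifconfig output against the three subnets planned for this zone (172.30.3.0/26 management, 172.30.4.0/25 guest, 172.30.5.0/28 storage). The role names and the helper functions are labels made up for this example, not anything CloudStack defines, and it only uses the standard ipaddress module.

```python
import ipaddress

# Subnet plan as described in this thread; the role names are labels for this sketch only.
SUBNETS = {
    "management": ipaddress.ip_network("172.30.3.0/26"),
    "guest": ipaddress.ip_network("172.30.4.0/25"),
    "storage": ipaddress.ip_network("172.30.5.0/28"),
}

# Address/netmask pairs copied from the SSVM ifconfig output.
SSVM_INTERFACES = {
    "eth0": ("169.254.3.35", "255.255.0.0"),
    "eth1": ("172.30.3.34", "255.255.255.192"),
    "eth2": ("172.30.4.86", "255.255.255.128"),
    "eth3": ("172.30.5.14", "255.255.255.240"),
}


def classify(address, netmask):
    """Return the planned subnet name this interface belongs to, or None."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    for name, network in SUBNETS.items():
        # Compare the network derived from address + netmask with the plan.
        if iface.network == network:
            return name
    return None


def classify_all(interfaces):
    """Map every interface name to its planned subnet name (or None)."""
    return {nic: classify(addr, mask) for nic, (addr, mask) in interfaces.items()}


if __name__ == "__main__":
    for nic, role in classify_all(SSVM_INTERFACES).items():
        print(f"{nic}: {role or 'not in the subnet plan'}")
```

eth0 falls outside the plan because 169.254.0.0/16 is a link-local range; the other three interfaces each land in one of the planned subnets.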
________________________________
From: Jon Marshall <[email protected]>
Sent: 06 June 2018 11:08
To: [email protected]
Subject: Re: advanced networking with public IPs direct to VMs
Hi Rafael
Thanks for the help, really appreciate it.
So rerunning that command with all servers up -
mysql> select * from cloud.storage_pool where cluster_id = 1 and removed is null;
Empty set (0.00 sec)
mysql>
As for the storage IP: no, I'm not setting it to be the management IP when I
set up the zone, but the output of the SQL command suggests that is what has
happened.
As I said to Dag, I am using a different subnet for storage, i.e.
172.30.3.0/26 - management subnet
172.30.4.0/25 - guest VM subnet
172.30.5.0/28 - storage
the NFS server IP is 172.30.5.2
Each compute node has 3 NICs with an IP from each subnet (I am assuming the
management node only needs an IP in the management network?)
When I add the zone in the UI I have one physical network with management
(cloudbr0), guest (cloudbr1) and storage (cloudbr2).
When I fill in the storage traffic page I use the range 172.30.5.10 -
172.30.5.14 as free IPs, as I exclude the ones already allocated to the
compute nodes and the NFS server.
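
For what it is worth, a reserved range like this can be sanity-checked with a few lines of Python before typing it into the UI. This is only a sketch under stated assumptions: the subnet (172.30.5.0/28) and the NFS server address (172.30.5.2) come from this thread, the range 172.30.5.10 to 172.30.5.14 is the one quoted elsewhere in the thread, and the function names are invented for the example. The compute nodes' storage IPs are not listed in the thread, so they are left out; they could be added to the in_use tuple.

```python
import ipaddress

STORAGE_SUBNET = ipaddress.ip_network("172.30.5.0/28")
NFS_SERVER = ipaddress.ip_address("172.30.5.2")


def expand_range(first, last):
    """Return every address from first to last, inclusive."""
    start = int(ipaddress.ip_address(first))
    end = int(ipaddress.ip_address(last))
    if end < start:
        raise ValueError("range end is before range start")
    return [ipaddress.ip_address(n) for n in range(start, end + 1)]


def check_range(first, last, subnet=STORAGE_SUBNET, in_use=(NFS_SERVER,)):
    """Return a list of problems with a reserved range; an empty list means OK."""
    problems = []
    usable = set(subnet.hosts())  # excludes the network and broadcast addresses
    for addr in expand_range(first, last):
        if addr not in usable:
            problems.append(f"{addr} is not a usable host address in {subnet}")
        elif addr in in_use:
            problems.append(f"{addr} is already in use")
    return problems


if __name__ == "__main__":
    issues = check_range("172.30.5.10", "172.30.5.14")
    print("range looks fine" if not issues else "\n".join(issues))
```

A range that strays outside the /28, or that includes the NFS server, comes back with one line per offending address.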
I think maybe I am doing something wrong in the UI setup but it is not obvious
to me what it is.
What I might try today unless you want me to keep the setup I have for more
outputs is to go back to 2 NICs, one for storage/management and one for guest
VMs.
I think the mistake I made last time with the 2 NIC setup, when adding the
zone, was to assume storage would just run over management, so I did not drag
and drop the storage icon and assign it to cloudbr0 as I did with management.
I think that is what I should do?
________________________________
From: Rafael Weingärtner <[email protected]>
Sent: 06 June 2018 10:54
To: users
Subject: Re: advanced networking with public IPs direct to VMs
Jon, do not panic, we are here to help you :)
So, I might have mistyped the SQL query. If you use "select * from
cloud.storage_pool where cluster_id = 1 and removed is not null", you are
listing the removed storage pools. Therefore, the right query would be
"select * from cloud.storage_pool where cluster_id = 1 and removed is null".
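
To make the difference between the two queries concrete, here is a tiny self-contained sketch. Note the assumptions: it uses Python's built-in sqlite3 with an in-memory database rather than the real MySQL "cloud" database, the table has only a handful of invented columns and rows (the real cloud.storage_pool table has many more), and the helper names are made up. Only the "is null" versus "is not null" behaviour is the point.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory toy storage_pool table with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE storage_pool ("
        "id INTEGER PRIMARY KEY, name TEXT, cluster_id INTEGER, removed TEXT)"
    )
    conn.executemany(
        "INSERT INTO storage_pool (name, cluster_id, removed) VALUES (?, ?, ?)",
        [
            ("active-pool", 1, None),                    # still in use
            ("deleted-pool", 1, "2018-06-01 09:00:00"),  # removed earlier
            ("other-cluster-pool", 2, None),             # different cluster
        ],
    )
    return conn


def pool_names(conn, removed_clause):
    """List pool names in cluster 1; removed_clause is 'IS NULL' or 'IS NOT NULL'."""
    if removed_clause not in ("IS NULL", "IS NOT NULL"):
        raise ValueError("unexpected clause")
    sql = (
        "SELECT name FROM storage_pool "
        f"WHERE cluster_id = ? AND removed {removed_clause} ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, (1,))]


if __name__ == "__main__":
    db = build_demo_db()
    print("removed is null     ->", pool_names(db, "IS NULL"))
    print("removed is not null ->", pool_names(db, "IS NOT NULL"))
```

With "is not null" only pools that have already been removed are listed, so an empty result from that query says nothing about whether active pools exist.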
There is also something else I do not understand. You are setting the
storage IP in the management subnet? I am not sure you should be doing it
like this. Normally, I set all my storage (primary [when working with NFS]
and secondary) to IPs in the storage subnet.
On Wed, Jun 6, 2018 at 6:49 AM, Dag Sonstebo <[email protected]> wrote:
> Hi Jon,
>
> I’m late to this thread and have possibly missed some things – but a
> couple of observations:
>
> “When I add the zone and get to the storage web page I exclude the IPs
> already used for the compute node NICs and the NFS server itself. …..”
> “So the range is 172.30.5.1 -> 15 and the range I fill in is 172.30.5.10
> -> 172.30.5.14.”
>
> I think you may have some confusion around the use of the storage network.
> The important part here is to understand this is for *secondary storage*
> use only – it has nothing to do with primary storage. This means this
> storage network needs to be accessible to the SSVM and to the hypervisors,
> and the secondary storage NFS pools need to be accessible on this network.
>
> The important part – this also means you *can not use the same IP ranges
> for management and storage networks* - doing so means you will have issues
> where effectively both hypervisors and SSVM can see the same subnet on two
> NICs – and you end up in a routing black hole.
>
> So – you need to either:
>
> 1) Use different IP subnets on management and storage, or
> 2) preferably just simplify your setup – stop using a secondary storage
> network altogether and just allow secondary storage to use the management
> network (which is default). Unless you have a very high I/O environment in
> production you are just adding complexity by running separate management
> and storage.
>
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
>
> On 06/06/2018, 10:18, "Jon Marshall" <[email protected]> wrote:
>
> I will disconnect the host this morning and test but before I do that
> I ran this command when all hosts are up -
>
>
>
>
>
> select * from cloud.host;
> id | 1 | 2 | 3 | 4 | 5
> name | dcp-cscn1.local | v-2-VM | s-1-VM | dcp-cscn2.local | dcp-cscn3.local
> uuid | d97b930c-ab5f-4b7d-9243-eabd60012284 | ce1f4594-2b4f-4b2b-a239-3f5e2c2215b0 | 107d0a8e-e2d1-42b5-8b9d-ff3845bb556c | f0c076cb-112f-4f4b-a5a4-1a96ffac9794 | 0368ae16-550f-43a9-bb40-ee29d2b5c274
> status | Up | Up | Up | Up | Up
> type | Routing | ConsoleProxy | SecondaryStorageVM | Routing | Routing
> private_ip_address | 172.30.3.3 | 172.30.3.49 | 172.30.3.34 | 172.30.3.4 | 172.30.3.5
> private_netmask | 255.255.255.192 | 255.255.255.192 | 255.255.255.192 | 255.255.255.192 | 255.255.255.192
> private_mac_address | 00:22:19:92:4e:34 | 1e:00:80:00:00:14 | 1e:00:3b:00:00:05 | 00:26:b9:4a:97:7d | 00:24:e8:73:6a:b2
> storage_ip_address | 172.30.3.3 | 172.30.3.49 | 172.30.3.34 | 172.30.3.4 | 172.30.3.5
> storage_netmask | 255.255.255.192 | 255.255.255.192 | 255.255.255.192 | 255.255.255.192 | 255.255.255.192
> storage_mac_address | 00:22:19:92:4e:34 | 1e:00:80:00:00:14 | 1e:00:3b:00:00:05 | 00:26:b9:4a:97:7d | 00:24:e8:73:6a:b2
> storage_ip_address_2 | NULL | NULL | NULL | NULL | NULL
> storage_mac_address_2 | NULL | NULL | NULL | NULL | NULL
> storage_netmask_2 | NULL | NULL | NULL | NULL | NULL
> cluster_id | 1 | NULL | NULL | 1 | 1
> public_ip_address | 172.30.4.3 | 172.30.4.98 | 172.30.4.86 | 172.30.4.4 | 172.30.4.5
> public_netmask | 255.255.255.128 | 255.255.255.128 | 255.255.255.128 | 255.255.255.128 | 255.255.255.128
> public_mac_address | 00:22:19:92:4e:35 | 1e:00:c9:00:00:5f | 1e:00:d9:00:00:53 | 00:26:b9:4a:97:7e | 00:24:e8:73:6a:b3
> proxy_port | NULL | NULL | NULL | NULL | NULL
> data_center_id | 1 | 1 | 1 | 1 | 1
> pod_id | 1 | 1 | 1 | 1 | 1
> cpu_sockets | 1 | NULL | NULL | 1 | 1
> cpus | 2 | NULL | NULL | 2 | 2
> speed | 2999 | NULL | NULL | 2999 | 3000
> url | iqn.1994-05.com.redhat:fa437fb0c023 | NoIqn | NoIqn | iqn.1994-05.com.redhat:e9b4aa7e7881 | iqn.1994-05.com.redhat:ccdce43aff1c
> fs_type | NULL | NULL | NULL | NULL | NULL
> hypervisor_type | KVM | NULL | NULL | KVM | KVM
> hypervisor_version | NULL | NULL | NULL | NULL | NULL
> ram | 7510159360 | 0 | 0 | 7510159360 | 7510159360
> resource | NULL | NULL | NULL | NULL | NULL
> version | 4.11.0.0 | 4.11.0.0 | 4.11.0.0 | 4.11.0.0 | 4.11.0.0
> parent | NULL | NULL | NULL | NULL | NULL
> total_size | NULL | NULL | NULL | NULL | NULL
> capabilities | hvm,snapshot | NULL | NULL | hvm,snapshot | hvm,snapshot
> guid | 9f2b15cb-1b75-321b-bf59-f83e7a5e8efb-LibvirtComputingResource | Proxy.2-ConsoleProxyResource | s-1-VM-NfsSecondaryStorageResource | 40e58399-fc7a-3a59-8f48-16d0f99b11c9-LibvirtComputingResource | 10bb1c01-0e92-3108-8209-37f3eebad8fb-LibvirtComputingResource
> available | 1 | 1 | 1 | 1 | 1
> setup | 0 | 0 | 0 | 0 | 0
> dom0_memory | 0 | 0 | 0 | 0 | 0
> last_ping | 1492390408 | 1492390409 | 1492390407 | 1492450882 | 1492390408
> mgmt_server_id | 146457912294 | 146457912294 | 146457912294 | 146457912294 | 146457912294
> disconnected | 2018-06-05 14:09:22 | 2018-06-05 14:09:22 | 2018-06-05 14:09:22 | 2018-06-05 14:09:22 | 2018-06-05 14:09:22
> created | 2018-06-05 13:44:33 | 2018-06-05 13:46:22 | 2018-06-05 13:46:27 | 2018-06-05 13:46:33 | 2018-06-05 13:47:04
> removed | NULL | NULL | NULL | NULL | NULL
> update_count | 4 | 7 | 7 | 8 | 6
> resource_state | Enabled | Enabled | Enabled | Enabled | Enabled
> owner | NULL | NULL | NULL | NULL | NULL
> lastUpdated | NULL | NULL | NULL | NULL | NULL
> engine_state | Disabled | Disabled | Disabled | Disabled | Disabled
> 5 rows in set (0.00 sec)
>
>
>
> and you can see that it says the storage IP address is the same as the
> private IP address (the management network).
>
>
> I also ran the command you provided using the Cluster ID number from
> the table above -
>
>
>
> mysql> select * from cloud.storage_pool where cluster_id = 1 and removed is not null;
> Empty set (0.00 sec)
>
> mysql>
>
> So assuming I am reading this correctly that seems to be the issue.
>
>
> I am at a loss as to why though.
>
>
> I have a separate NIC for storage as described. When I add the zone
> and get to the storage web page I exclude the IPs already used for the
> compute node NICs and the NFS server itself. I do this because initially I
> didn't and the SSVM started using the IP address of the NFS server.
>
>
> So the range is 172.30.5.1 -> 15 and the range I fill in is
> 172.30.5.10 -> 172.30.5.14.
>
>
> And I used the label "cloudbr2" for storage.
>
>
> I must be doing this wrong somehow.
>
>
> Any pointers would be much appreciated.
>
>
>
>
> ________________________________
> From: Rafael Weingärtner <[email protected]>
> Sent: 05 June 2018 16:13
> To: users
> Subject: Re: advanced networking with public IPs direct to VMs
>
> That is interesting. Let's see the source of all truth...
> This is the code that is generating that odd message.
>
> > List<StoragePoolVO> clusterPools =
> >     _storagePoolDao.listPoolsByCluster(agent.getClusterId());
> > boolean hasNfs = false;
> > for (StoragePoolVO pool : clusterPools) {
> >     if (pool.getPoolType() == StoragePoolType.NetworkFilesystem) {
> >         hasNfs = true;
> >         break;
> >     }
> > }
> > if (!hasNfs) {
> >     s_logger.warn(
> >         "Agent investigation was requested on host " + agent +
> >         ", but host does not support investigation because it has no NFS storage. Skipping investigation.");
> >     return Status.Disconnected;
> > }
> >
>
> There are two possibilities here. You do not have any NFS storage? Is that
> the case? Or maybe, for some reason, the call
> "_storagePoolDao.listPoolsByCluster(agent.getClusterId())" is not returning
> any NFS storage pools. Looking at "listPoolsByCluster", we will see
> that the following SQL is used:
>
> > Select * from storage_pool where cluster_id = <host'sClusterId> and removed
> > is not null
> >
>
> Can you run that SQL to see what it returns when your hosts are marked as
> disconnected?
>
>
>
>
>
> On Tue, Jun 5, 2018 at 11:32 AM, Jon Marshall <[email protected]> wrote:
>
> > I reran the tests with the 3 NIC setup. When I configured the zone through
> > the UI I used the labels cloudbr0 for management, cloudbr1 for guest
> > traffic and cloudbr2 for NFS as per my original response to you.
> >
> >
> > When I pull the power to the node (dcp-cscn2.local) after about 5 mins
> > the host status goes to "Alert" but never to "Down"
> >
> >
> > I get this in the logs -
> >
> >
> > 2018-06-05 15:17:14,382 WARN [c.c.h.KVMInvestigator]
> > (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) Agent investigation was
> > requested on host Host[-4-Routing], but host does not support investigation
> > because it has no NFS storage. Skipping investigation.
> > 2018-06-05 15:17:14,382 DEBUG [c.c.h.HighAvailabilityManagerImpl]
> > (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) KVMInvestigator was able to
> > determine host 4 is in Disconnected
> > 2018-06-05 15:17:14,382 INFO [c.c.a.m.AgentManagerImpl]
> > (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) The agent from host 4 state
> > determined is Disconnected
> > 2018-06-05 15:17:14,382 WARN [c.c.a.m.AgentManagerImpl]
> > (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) Agent is disconnected but
> > the host is still up: 4-dcp-cscn2.local
> >
> > I don't understand why it thinks there is no NFS storage as each compute
> > node has a dedicated storage NIC.
> >
> >
> > I also don't understand why it thinks the host is still up, i.e. what test
> > is it doing to determine that ?
> >
> >
> > Am I just trying to get something working that is not supported ?
> >
> >
> > ________________________________
> > From: Rafael Weingärtner <[email protected]>
> > Sent: 04 June 2018 15:31
> > To: users
> > Subject: Re: advanced networking with public IPs direct to VMs
> >
> > What type of failover are you talking about?
> > What ACS version are you using?
> > What hypervisor are you using?
> > How are you configuring your NICs in the hypervisor?
> > How are you configuring the traffic labels in ACS?
> >
> > On Mon, Jun 4, 2018 at 11:29 AM, Jon Marshall <[email protected]> wrote:
> >
> > > Hi all
> > >
> > >
> > > I am close to giving up on basic networking as I just cannot get failover
> > > working with multiple NICs (I am not even sure it is supported).
> > >
> > >
> > > What I would like is to use 3 NICs for management, storage and guest
> > > traffic. I would like to assign public IPs direct to the VMs which is why I
> > > originally chose basic.
> > >
> > >
> > > If I switch to advanced networking do I just configure a guest VM with
> > > public IPs on one NIC and not both with the public traffic -
> > >
> > >
> > > would this work ?
> > >
> >
> >
> >
> > --
> > Rafael Weingärtner
> >
>
>
>
> --
> Rafael Weingärtner
>
>
>
--
Rafael Weingärtner