Hi Carlos,
we encountered the same issue after introducing NIC bonding on the
management net, which changed the MAC.
The msid includes the MAC of the interface that the service_ip is
bound to. If that MAC changes, your management node encounters an
identity crisis ;)
We solved that by updating the msid field in the cloud.mshost table.
The msid is the decimal representation of the MAC. So, in our case
57177340185274 (dec) = 3400a30d0aba (hex) (34:00:a3:0d:0a:ba)
Our mshost entry and the bond0 interface currently look like this:
mysql> select * from mshost where id=1 \G
*************************** 1. row ***************************
id: 1
msid: 57177340185274 <-------------------
runid: 1474366224657
name: acs-management-1
state: Up
version: 4.9.0
service_ip: 10.97.13.1
service_port: 9090
last_update: 2016-10-07 07:43:59
removed: NULL
alert_count: 0
1 row in set (0.00 sec)
# ip addr show dev bond0
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 34:00:a3:0d:0a:ba brd ff:ff:ff:ff:ff:ff
inet 10.97.13.1/22 brd 10.97.15.255 scope global bond0
valid_lft forever preferred_lft forever
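The dec/hex relation mentioned above can be double-checked with a few lines of Python. This is only a minimal sketch based on what is described in this mail (msid = MAC read as one hexadecimal number); the function names mac_to_msid and msid_to_mac are made up for illustration and are not part of CloudStack.

```python
def mac_to_msid(mac: str) -> int:
    """Read a colon-separated MAC address as one hexadecimal number.

    Assumption: the msid is simply the MAC interpreted as an integer,
    as described in this mail; not checked against the CloudStack source.
    """
    return int(mac.replace(":", ""), 16)


def msid_to_mac(msid: int) -> str:
    """Format a decimal msid back into colon-separated MAC notation."""
    hex_digits = f"{msid:012x}"  # 48-bit MAC = 12 hex digits, zero-padded
    return ":".join(hex_digits[i:i + 2] for i in range(0, 12, 2))


if __name__ == "__main__":
    print(mac_to_msid("34:00:a3:0d:0a:ba"))  # 57177340185274
    print(msid_to_mac(57177340185274))       # 34:00:a3:0d:0a:ba
```

Running it against the values from our setup reproduces the msid shown in the mshost row, so the same conversion should tell you which msid to expect for a new MAC.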
Another way to solve this issue could be changing the MAC of the
management interface to a known (and fixed) value, e.g. by using
ethtool.
If you know the removed hardware is no longer used in the same network
segment, and you know the previous MAC, you could change your network
configuration to set the MAC on the new hardware to the previous
one.
To be safe from further changes, we additionally pinned the MAC by using
the hwaddress keyword in /etc/network/interfaces. This is the
Debian/Ubuntu way.
auto bond0
iface bond0 inet static
    address 10.97.13.1
    netmask 255.255.252.0
    gateway 10.97.12.1
    dns-nameservers XXXXXXXXXXXXXXXX
    dns-search XXXXXXXXXXXXXXXXXXXXXXX
    hwaddress 34:00:a3:0d:0a:ba
    bond-slaves eth0 eth1
    bond-mode 4
    bond-lacp-rate 1
    bond-miimon 100
    bond-updelay 200
    bond-downdelay 200
Hope this helps!
cheers,
- Stephan
On Thursday, 06.10.2016, at 23:41 -0700, Carlos Reátegui wrote:
> Hi,
> I have had this issue a few times now with my management server. I
> have had a couple unfortunate hw issues that have caused the machine
> to crash (have replaced motherboard, backplane and network cards….
> don’t ask). Upon restoring the machine there is a new entry in the
> mshost table. However the host table mgmt_server_id still references
> the old mshost.msid and therefore my management server is unable to
> “manage” the hosts.
>
> The only way to fix this is to :
> update host set mgmt_server_id=<new msid> where mgmt_server_id=<old
> msid>;
>
> Anyone else run across this? Is there a proper way to restore a
> management server? How does the management server decide it is a new
> mshost and create a new entry in the mshost table?
>
> Currently my mshost table has 5 entries and they all claim to be in
> the “Up” state even though there is only 1 management server. One of
> the entries is for the original 4.5.1 install. The other 4 are for
> the current 4.5.2.1 upgrade. They all have the same IP address.
>
> thanks,
> Carlos