I do use the automatic migration policy.

The main points you have to clarify are:
1. Why did the nodes become 'Non-Operational'? Usually this happens when the 
management interface (in your case the HostedEngine VM) cannot reach the nodes 
over the management network.

By default, management traffic goes over the ovirtmgmt network. I guess you 
created the new network, marked it as the management network, and then the 
switch was powered off, causing the 'Non-Operational' state.
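
To confirm that guess on a host, you can look at the link state the kernel 
reports for the interfaces involved. A minimal sketch, assuming a standard 
Linux sysfs layout; the function name link_state and its optional second 
argument are my own and not part of oVirt:

```shell
# Sketch: print the kernel-reported link state of an interface.
# Usage: link_state IFACE [SYSFS_NET_DIR]
# SYSFS_NET_DIR defaults to /sys/class/net; the override exists only so the
# function can be tried against a fake directory tree.
link_state() {
    if [ -r "${2:-/sys/class/net}/$1/operstate" ]; then
        cat "${2:-/sys/class/net}/$1/operstate"
    else
        # The interface does not exist (or cannot be read) on this host.
        echo "missing"
    fi
}
```

Running 'link_state ovirtmgmt' and the same for the NIC attached to the new 
network should show whether the link was reported as down on each node.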

2. Migrating VMs is usually a safe approach, but this behavior is quite 
strange. If a node is Non-Operational, there can be no successful 
migration.

3. Some of the VMs got paused due to a storage issue. Are you using GlusterFS, 
NFS or iSCSI? Whichever it is, you need to find out why you lost your storage.
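
A quick way to see what file-based storage a host currently has mounted is to 
filter the mount table. This is only a sketch; the function name 
storage_mounts is my own, and it covers NFS and GlusterFS only (for iSCSI you 
would look at 'iscsiadm -m session' instead):

```shell
# Sketch: list NFS and GlusterFS mounts from /proc/mounts-style input.
# Reads stdin; prints "<fstype> <source> <mountpoint>" for each match.
# Fields in /proc/mounts are: source, mountpoint, fstype, options, dump, pass.
storage_mounts() {
    awk '$3 ~ /^nfs4?$/ || $3 == "fuse.glusterfs" { print $3, $1, $2 }'
}
```

Use it as 'storage_mounts < /proc/mounts'. On oVirt hosts the storage domains 
normally show up under /rhev/data-center/mnt, so a domain missing from that 
list is a hint about which storage was lost.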

I guess for now you can set each VM to be migrated only manually (VM -> Edit) 
and, if they are critical VMs, enable High Availability in each VM's Edit 
options.

In that case, if a node fails, the VMs will be restarted on another node.

4. Have you set up node fencing? For example, APC, iLO, iDRAC and other fencing 
mechanisms allow the HostedEngine to use another host as a fencing proxy and 
reset the problematic hypervisor.


P.S.: You can define the following alias in '~/.bashrc':
alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'
Then you can check your VMs even when the HostedEngine is down:
'virsh list --all'
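
If you have many VMs, you can filter that listing for the paused ones. A 
minimal sketch, assuming the usual three-column 'virsh list' output (Id, Name, 
State) and VM names without spaces; the function name paused_vms is my own:

```shell
# Sketch: print the names of paused VMs from 'virsh list --all' output.
# Reads stdin; skips the header line and the dashed separator line.
paused_vms() {
    awk 'NR > 2 && $NF == "paused" { print $2 }'
}
```

In an interactive shell with the alias above: 'virsh list --all | paused_vms'. 
Keep in mind that bash does not expand aliases in scripts by default, so in a 
script you would pass the '-c' URI to virsh explicitly.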

Best Regards,
Strahil Nikolov

On Dec 27, 2019 08:40, [email protected] wrote:
>
> I had a crash yesterday in my ovirt cluster, which is made up of 3 nodes.
>
> I just tried to add a new network, but the whole cluster crashed
>
> I added a new network to my cluster, but while I was debugging the new 
> switch, the switch was powered off; the node detected the network card status 
> as down and then moved to the Non-Operational state.
>
> At this point all 3 nodes had moved to the Non-Operational state.
>
> All virtual machines started automatic migration. When I received the 
> alert email, all virtual machines were suspended.
>
> After 15 minutes my new switch was powered up again. The 3 oVirt nodes became 
> active again, but many virtual machines became unresponsive or suspended due 
> to the forced migration, and only a few virtual machines came up again due 
> to cancelled migration.
>
> After I tried to terminate the migration tasks and restart the ovirt-engine 
> service, I was still unable to restore most of the virtual machines, so I had 
> to restart the 3 oVirt nodes to restore my virtual machines.
>
> I didn't recover all the virtual machines until an hour later.
>
> Then I changed my migration policy to "Do Not Migrate Virtual Machines".
>
> Which migration Policy do you recommend?
>
> I'm afraid to use cluster...
>
> ________________________________
> [email protected]
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/LEWYTSUQJMCDPAXD6CU37TBDI5A7QRD3/
