On Wed, Jan 11, 2017 at 9:23 PM, Nir Soffer <[email protected]> wrote:
> On Wed, Jan 11, 2017 at 7:35 PM, Mark Greenall
> <[email protected]> wrote:
>> Hi Ovirt Champions,
>>
>> I am pulling my hair out and in need of advice / help.
>>
>> Host server: Dell PowerEdge R815 (40 cores and 768GB memory)
>> Storage: Dell Equallogic (Firmware V8.1.4)
>> OS: Centos 7.3 (although the same thing happens on 7.2)
>> Ovirt: 4.0.6.3-1 (although also happens on 4.0.5)
>>
>> I can’t exactly pinpoint when this started happening but it’s certainly been
>> happening with Ovirt 4.0.5 and CentOS 7.2. Today I updated Hosted Engine and
>> one host to 4.0.6 and CentOS 7.3 but we still see the same problem. Our
>> hosts are connected to Dell iSCSI Equallogic storage. We have one storage
>> domain defined per VM guest, so do have quite a few LUNs presented to the
>> cluster (around 45 in total).
>>
>> Problem Description:
>>
>> 1) Reboot a host.
>> 2) Activate a host in Ovirt Admin Gui.
>> 3) A few minutes later host is shown as activated.
>> 4) Approx 10-15 mins later host goes offline complaining that it can’t
>> connect to storage.
>> 5) Constantly then loops around (activating, non operational,
>> connecting, initialising) and the host ends up with a high CPU load and
>> large number of lvm commands in the process tree.
>> 6) Multipath and iscsi show all storage is available and logged in.
>> 7) Equallogic shows host connected and no errors.
>> 8) Admin GUI ends up saying the host can’t connect to storage
>> ‘UNKNOWN’.
>>
>> The strange thing is that every now and again step 5 doesn’t happen and the
>> host will actually activate again and then stays up. However, it still
>> takes step 4 to take the host offline first.
>>
>> Expected Behaviour:
>>
>> 1) Reboot a host.
>> 2) Activate a host in Ovirt Admin Gui.
>> 3) A few minutes later host is shown as activated.
>> 4) Begin using host with confidence.
>>
>> I’ve attached the engine.log from Hosted Engine and vdsm.log from the host.
>> The following is a timeline of the latest event.
>>
>> Host Activation : 15:07
>> Host Up: 15:10
>> Non-Operational: 15:17
>>
>> Seriously hoping someone can spot something obvious as this is making the
>> clusters somewhat unstable and unreliable.
>
> Can you share /var/log/messages and /var/log/sanlock.log?

And /etc/multipath.conf

>
> Nir
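
If it helps, here is a minimal sketch for bundling the three files requested in this thread into one archive to attach. The function name bundle_files and the archive name host-logs.tar.gz are only illustrative choices, not part of oVirt or vdsm, and the script assumes it is run on the affected host with enough privileges to read the logs. Files that are missing or unreadable are skipped and reported rather than treated as errors.

```python
import os
import tarfile

# Files requested in this thread.
REQUESTED = [
    "/var/log/messages",
    "/var/log/sanlock.log",
    "/etc/multipath.conf",
]


def bundle_files(paths, archive_path):
    """Add each readable regular file in paths to a gzip-compressed tarball.

    Returns a tuple (added, skipped) of path lists. bundle_files is a
    hypothetical helper written for this example.
    """
    added, skipped = [], []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            if os.path.isfile(path) and os.access(path, os.R_OK):
                tar.add(path)
                added.append(path)
            else:
                skipped.append(path)
    return added, skipped
```

Calling bundle_files(REQUESTED, "host-logs.tar.gz") as root on the host should produce the archive in the current directory. Check the returned skipped list before sending, since an unreadable log would otherwise be silently absent from the bundle.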

