Adam,

You know, Sanlock has a recovery mechanism that kills VDSM, or even triggers the 
watchdog to reboot the SPM host if it has lost the SPM lock.
I’m asking because I had issues with my master storage that caused the SPM host to 
be rebooted by the watchdog, and I was sure that was intended behaviour. Isn’t it?


From: Adam Litke <[email protected]>
Date: Monday, 17 April 2017 at 17:32
To: Pavel Gashev <[email protected]>
Cc: Nir Soffer <[email protected]>, users <[email protected]>
Subject: Re: [ovirt-users] storage redundancy in Ovirt



On Mon, Apr 17, 2017 at 9:26 AM, Pavel Gashev <[email protected]> wrote:
Nir,

Isn’t SPM managed via Sanlock? I believe there is no need to fence the SPM host, 
especially if there are no SPM tasks running.

It's true that the exclusivity of the SPM role is enforced by Sanlock, but you 
always need to fence a non-responsive SPM because there is no way to guarantee 
that the host is not still manipulating storage (e.g. LV extensions), and we must 
ensure that only one host has the masterfs on the master storage domain mounted.



From: <[email protected]<mailto:[email protected]>> on behalf of 
Nir Soffer <[email protected]<mailto:[email protected]>>
Date: Monday, 17 April 2017 at 16:06
To: Konstantin Raskoshnyi <[email protected]>, Dan Yasny <[email protected]>
Cc: users <[email protected]>, FERNANDO FREDIANI <[email protected]>
Subject: Re: [ovirt-users] storage redundancy in Ovirt

On Mon, Apr 17, 2017 at 8:24 AM Konstantin Raskoshnyi <[email protected]> wrote:
But actually, it didn't work well. After the main SPM host went down I saw this:


2017-04-17 05:23:15,554Z ERROR 
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler5) 
[4dcc033d-26bf-49bb-bfaa-03a970dbbec1] SPM Init: could not find reported vds or 
not up - pool: 'STG' vds_spm_id: '1'
2017-04-17 05:23:15,567Z INFO  
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler5) 
[4dcc033d-26bf-49bb-bfaa-03a970dbbec1] SPM selection - vds seems as spm 'tank5'
2017-04-17 05:23:15,567Z WARN  
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler5) 
[4dcc033d-26bf-49bb-bfaa-03a970dbbec1] spm vds is non responsive, stopping spm 
selection.
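For anyone watching engine.log for this failure mode, here is a minimal sketch that flags the "spm vds is non responsive" condition. It greps for the message wording shown in the excerpt above, which is an assumption — the exact text may differ between engine versions:

```python
import re

# Pattern taken from the engine.log excerpt above; the wording is assumed
# to be stable, which may not hold across engine versions.
SPM_STUCK = re.compile(r"spm vds is non responsive, stopping spm\s*selection")

def spm_selection_stuck(log_text):
    """Return True if the log shows SPM selection aborting because the
    current SPM host is non-responsive (i.e. it needs fencing, or a
    manual 'confirm host has been rebooted')."""
    return bool(SPM_STUCK.search(log_text))

sample = ("2017-04-17 05:23:15,567Z WARN "
          "[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] "
          "(DefaultQuartzScheduler5) spm vds is non responsive, "
          "stopping spm selection.")
print(spm_selection_stuck(sample))  # True
```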

So does that mean the SPM host can only be switched automatically if the BMC is up?

BMC?

If your SPM host is not responsive, the system will try to fence it. Did you
configure power management for all hosts? Did you check that it
works? How did you simulate the non-responsive host?

If power management is not configured or fails, the system cannot
move the SPM role to another host unless you manually confirm that the
SPM host was rebooted.
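If you end up in that state and the host really was power-cycled, the manual confirmation can also be done through the REST API. A minimal sketch that only builds the request without sending it — the /fence action with fence_type 'manual' mirrors the admin portal's manual confirmation, but verify the endpoint against your engine's API documentation; the engine URL, host id and credentials below are hypothetical:

```python
import urllib.request

def build_confirm_reboot_request(engine_url, host_id, auth_header):
    """Build (but do not send) the REST request for the manual
    'host has been rebooted' confirmation. Endpoint and fence_type
    value should be double-checked against your engine's API docs."""
    body = b"<action><fence_type>manual</fence_type></action>"
    return urllib.request.Request(
        url=f"{engine_url}/api/hosts/{host_id}/fence",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/xml",
            "Authorization": auth_header,  # e.g. a Basic auth header
        },
    )

# Hypothetical engine URL, host id and credentials:
req = build_confirm_reboot_request(
    "https://engine.example.com/ovirt-engine", "1234", "Basic dXNlcjpwYXNz")
print(req.get_full_url())
```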

Nir


Thanks

On Sun, Apr 16, 2017 at 8:29 PM, Konstantin Raskoshnyi <[email protected]> wrote:
Oh, fence agent works fine if I select ilo4,
Thank you for your help!

On Sun, Apr 16, 2017 at 8:22 PM Dan Yasny <[email protected]> wrote:
On Sun, Apr 16, 2017 at 11:19 PM, Konstantin Raskoshnyi <[email protected]> wrote:
Makes sense.
I was trying to set it up, but it doesn't work with our staging hardware.
We have an old iLO 100; I'll try again.
Thanks!


It is absolutely necessary for any HA to work properly. There's of course the 
"confirm host has been shutdown" option, which serves as an override for the 
fence command, but it's manual.

On Sun, Apr 16, 2017 at 8:18 PM Dan Yasny <[email protected]> wrote:
On Sun, Apr 16, 2017 at 11:15 PM, Konstantin Raskoshnyi <[email protected]> wrote:
Fence agent under each node?

When you configure a host, there's the power management tab, where you need to 
enter the BMC details for the host. If you don't have fencing enabled, how do 
you expect the system to make sure a host running a service is actually down 
(and that it is safe to start HA services elsewhere), and not, for example, just 
unreachable by the engine? How do you avoid a split-brain -> SBA ?
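A quick way to sanity-check the BMC details before relying on them is to run the matching fence agent by hand with the "status" action. A sketch that just assembles that command line — fence_ilo4 and the -a/-l/-p/-o flags come from the fence-agents package (other agents such as fence_ipmilan may need extra options), and the address and credentials below are placeholders:

```python
import shlex

def fence_status_cmd(agent, bmc_addr, user, password):
    """Command line asking a BMC for power status via a fence agent
    from the fence-agents package. -a is the BMC address, -l the
    login, -p the password, -o the action to perform."""
    return [agent, "-a", bmc_addr, "-l", user, "-p", password, "-o", "status"]

# Placeholder BMC address and credentials:
cmd = fence_status_cmd("fence_ilo4", "10.0.0.5", "admin", "secret")
print(shlex.join(cmd))  # fence_ilo4 -a 10.0.0.5 -l admin -p secret -o status
```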


On Sun, Apr 16, 2017 at 8:14 PM Dan Yasny <[email protected]> wrote:
On Sun, Apr 16, 2017 at 11:13 PM, Konstantin Raskoshnyi <[email protected]> wrote:
"Corner cases"?
I tried to simulate a crash of the SPM server and oVirt kept trying to re-establish 
the connection to the failed node.

Did you configure fencing?



On Sun, Apr 16, 2017 at 8:10 PM Dan Yasny <[email protected]> wrote:
On Sun, Apr 16, 2017 at 7:29 AM, Nir Soffer <[email protected]> wrote:
On Sun, Apr 16, 2017 at 2:05 PM Dan Yasny <[email protected]> wrote:


On Apr 16, 2017 7:01 AM, "Nir Soffer" <[email protected]> wrote:
On Sun, Apr 16, 2017 at 4:17 AM Dan Yasny <[email protected]> wrote:
When you set up a storage domain, you need to specify a host to perform the 
initial storage operations, but once the SD is defined, its details are in the 
engine database, and all the hosts connect to it directly. If the first 
host you used to define the SD goes down, all other hosts will still remain 
connected and work. SPM is an HA service, and if the current SPM host goes 
down, SPM gets started on another host in the DC. In short, unless your actual 
NFS exporting host goes down, there is no outage.

There is no storage outage, but if you shut down the SPM host, the SPM role
will not move to a new host until the SPM host is online again, or you manually
confirm that the SPM host was rebooted.

In a properly configured setup the SBA should take care of that. That's the 
whole point of HA services.

In some cases, like power loss or hardware failure, there is no way to start
the SPM host, and the system cannot recover automatically.

There are always corner cases, no doubt. But in a normal situation, where an 
SPM host goes down because of a hardware failure, it gets fenced, other hosts 
contend for SPM and start it. No surprises there.


Nir



Nir


On Sat, Apr 15, 2017 at 1:53 PM, Konstantin Raskoshnyi <[email protected]> wrote:
Hi Fernando,
I see each host has a direct NFS mount, but yes, if the main host through which 
I connected the NFS storage goes down, the storage becomes unavailable and all 
VMs are down.


On Sat, Apr 15, 2017 at 10:37 AM FERNANDO FREDIANI <[email protected]> wrote:
Hello Konstantin.
It doesn't make much sense to make a whole cluster depend on a single host. From 
what I know, every host talks directly to the NFS storage array or whatever other 
shared storage you have.
Have you tested whether that host going down affects the others when the NFS is 
mounted directly from an NFS storage array?
Fernando

2017-04-15 12:42 GMT-03:00 Konstantin Raskoshnyi <[email protected]>:
In oVirt you have to attach storage through a specific host.
If that host goes down, the storage is not available.

On Sat, Apr 15, 2017 at 7:31 AM FERNANDO FREDIANI <[email protected]> wrote:
Well, make it not go through host1: dedicate a storage server to running NFS 
and make both hosts connect to it.
In my view NFS is much easier to manage than any other type of storage, 
especially FC and iSCSI, and performance is pretty much the same, so you won't 
gain much by moving to another type, other than different management.
Fernando

2017-04-15 5:25 GMT-03:00 Konstantin Raskoshnyi <[email protected]>:
Hi guys,
I have one nfs storage,
it's connected through host1.
host2 also has access to it, I can easily migrate vms between them.

The question is: if host1 is down, the whole infrastructure is down, since all 
traffic goes through host1.
Is there any way in oVirt to use redundant storage?

Only glusterfs?

Thanks


_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users










--
Adam Litke
