On Tue, Apr 18, 2017 at 12:23 AM Pavel Gashev <[email protected]> wrote:
> Nir,
>
> A process can chdir into mount point and then lazy umount it. Filesystem
> remains mounted while the process exists and current directory is on
> mounted filesystem.
>
> # truncate -s 1g masterfs.img
> # mkfs.ext4 masterfs.img
> # mkdir masterfs
> # mount -o loop masterfs.img masterfs
> # cd masterfs
> # umount -l .
> # touch file
> # ls
> # cd ..
> # ls masterfs

Interesting idea! The only issue I see is not having a way to tell whether
the file system was actually unmounted. Does process termination guarantee
that the file system was unmounted? Do you know if the behaviour is
documented somewhere?

Nir

> ------------------------------
> *From:* Nir Soffer <[email protected]>
> *Sent:* Apr 17, 2017 8:40 PM
> *To:* Adam Litke; Pavel Gashev
> *Cc:* users
> *Subject:* Re: [ovirt-users] storage redundancy in Ovirt
>
> On Mon, Apr 17, 2017 at 6:54 PM Adam Litke <[email protected]> wrote:
>
>> On Mon, Apr 17, 2017 at 11:04 AM, Pavel Gashev <[email protected]> wrote:
>>
>>> Adam,
>>>
>>> You know, Sanlock has recovery mechanism that kills VDSM, or even
>>> triggers Watchdog to reboot SPM host in case it has lost the SPM lock.
>>>
>>> I’m asking because I had issues with my master storage that caused SPM
>>> host to reboot by Watchdog. And I was sure that it’s an intended
>>> behaviour. Isn’t it?
>>
>> Yes of course. But an SPM host can fail but still maintain its
>> connection to the storage lease. In this case you still need classic
>> fencing.
>>
>> Something new we are investigating is the use of sanlock's request
>> feature, which allows a new host to take the lease away from the current
>> holder. The current holder would be fenced by sanlock (watchdog if
>> necessary) and only once the lease is free would it be granted to the
>> new requester.
>
> We can use the SPM lease to kill vdsm on the non-responsive SPM host,
> and start the SPM on another host, similar to the way we handle vms with
> a lease.
>
> But this does not help with the masterfs mounted on the SPM host. If vdsm
> is killed before it unmounts it, starting the SPM on another host (and
> mounting the masterfs on the new host) will corrupt the masterfs.
>
> When using file based storage (nfs, glusterfs) we don't have a masterfs,
> so killing vdsm on the SPM should be good enough to start the SPM on
> another host, even if fencing is not possible.
>
> We can start with enabling sanlock based SPM fencing on file based
> storage.
>
> Nir
>
>>> *From: *Adam Litke <[email protected]>
>>> *Date: *Monday, 17 April 2017 at 17:32
>>> *To: *Pavel Gashev <[email protected]>
>>> *Cc: *Nir Soffer <[email protected]>, users <[email protected]>
>>> *Subject: *Re: [ovirt-users] storage redundancy in Ovirt
>>>
>>> On Mon, Apr 17, 2017 at 9:26 AM, Pavel Gashev <[email protected]> wrote:
>>>
>>> Nir,
>>>
>>> Isn’t SPM managed via Sanlock? I believe there is no need to fence SPM
>>> host. Especially if there are no SPM tasks running.
>>>
>>> It's true that the exclusivity of the SPM role is enforced by Sanlock,
>>> but you always need to fence a non-responsive SPM because there is no
>>> way to guarantee that the host is not still manipulating storage (e.g.
>>> LV extensions) and we must ensure that only one host has the masterfs
>>> on the master storage domain mounted.
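For reference, lazy unmount is documented in umount(8) (-l/--lazy) and in
the umount2(2) man page under MNT_DETACH: the mount is detached from the
mount tree immediately, and the real unmount happens only when the mount
point stops being busy. That also means /proc/mounts stops listing it right
away, so it cannot tell you when the unmount actually completed. A rough
probe, assuming Pavel's loop-mounted masterfs.img example above and a
util-linux recent enough that mount -o loop sets up the loop device with
autoclear, is to watch the backing loop device instead:

# losetup -j masterfs.img
# cd ..
# losetup -j masterfs.img

The first call should still list a loop device, because the detached
filesystem is kept alive by the shell's working directory; after cd ..
drops that last reference, the second call should come back empty once the
deferred unmount has finished and autoclear released the device. Whether
that happens synchronously when the holding process exits is exactly the
open question above, so treat this as a sketch for poking at the behaviour,
not as a guarantee.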
>>> *From: *<[email protected]> on behalf of Nir Soffer <[email protected]>
>>> *Date: *Monday, 17 April 2017 at 16:06
>>> *To: *Konstantin Raskoshnyi <[email protected]>, Dan Yasny <[email protected]>
>>> *Cc: *users <[email protected]>, FERNANDO FREDIANI <[email protected]>
>>> *Subject: *Re: [ovirt-users] storage redundancy in Ovirt
>>>
>>> On Mon, Apr 17, 2017 at 8:24 AM Konstantin Raskoshnyi <[email protected]> wrote:
>>>
>>> But actually, it didn't work well. After the main SPM host went down I
>>> see this:
>>>
>>> 2017-04-17 05:23:15,554Z ERROR
>>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy]
>>> (DefaultQuartzScheduler5) [4dcc033d-26bf-49bb-bfaa-03a970dbbec1] SPM Init:
>>> could not find reported vds or not up - pool: 'STG' vds_spm_id: '1'
>>>
>>> 2017-04-17 05:23:15,567Z INFO
>>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy]
>>> (DefaultQuartzScheduler5) [4dcc033d-26bf-49bb-bfaa-03a970dbbec1] SPM
>>> selection - vds seems as spm 'tank5'
>>>
>>> 2017-04-17 05:23:15,567Z WARN
>>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy]
>>> (DefaultQuartzScheduler5) [4dcc033d-26bf-49bb-bfaa-03a970dbbec1] spm vds is
>>> non responsive, stopping spm selection.
>>>
>>> So does that mean it is only possible to automatically switch the SPM
>>> host if the BMC is up?
>>>
>>> BMC?
>>>
>>> If your SPM is not responsive, the system will try to fence it. Did you
>>> configure power management for all hosts? Did you check that it works?
>>> How did you simulate the non-responsive host?
>>>
>>> If power management is not configured or fails, the system cannot
>>> move the SPM to another host, unless you manually confirm that the
>>> SPM host was rebooted.
>>>
>>> Nir
>>>
>>> Thanks
>>>
>>> On Sun, Apr 16, 2017 at 8:29 PM, Konstantin Raskoshnyi <[email protected]> wrote:
>>>
>>> Oh, the fence agent works fine if I select ilo4.
>>> Thank you for your help!
>>>
>>> On Sun, Apr 16, 2017 at 8:22 PM Dan Yasny <[email protected]> wrote:
>>>
>>> On Sun, Apr 16, 2017 at 11:19 PM, Konstantin Raskoshnyi <[email protected]> wrote:
>>>
>>> Makes sense.
>>> I was trying to set it up, but it doesn't work with our staging hardware.
>>> We have old ilo100, I'll try again.
>>> Thanks!
>>>
>>> It is absolutely necessary for any HA to work properly. There's of
>>> course the "confirm host has been shutdown" option, which serves as an
>>> override for the fence command, but it's manual.
>>>
>>> On Sun, Apr 16, 2017 at 8:18 PM Dan Yasny <[email protected]> wrote:
>>>
>>> On Sun, Apr 16, 2017 at 11:15 PM, Konstantin Raskoshnyi <[email protected]> wrote:
>>>
>>> Fence agent under each node?
>>>
>>> When you configure a host, there's the power management tab, where you
>>> need to enter the BMC details for the host. If you don't have fencing
>>> enabled, how do you expect the system to make sure a host running a
>>> service is actually down (and it is safe to start HA services elsewhere),
>>> and not, for example, just unreachable by the engine? How do you avoid a
>>> split brain -> SBA?
>>>
>>> On Sun, Apr 16, 2017 at 8:14 PM Dan Yasny <[email protected]> wrote:
>>>
>>> On Sun, Apr 16, 2017 at 11:13 PM, Konstantin Raskoshnyi <[email protected]> wrote:
>>>
>>> "Corner cases"?
>>> I tried to simulate a crash of the SPM server and ovirt kept trying to
>>> re-establish the connection to the failed node.
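Regarding the BMC / power management discussion above: it is worth smoke
testing the fence device from a host's shell before relying on it, using
the stock fence agents. A minimal check with fence_ilo4 (the address and
credentials below are placeholders, adjust them for your BMC):

# fence_ilo4 -a <bmc-address> -l <user> -p <password> -o status

If this reports the power status correctly, the Test button in the host's
Power Management tab in the engine should pass as well, and automatic SPM
failover has a fencing path to use. If it fails, fencing will not work for
that host, and neither will automatic recovery of the SPM.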
>>> Did you configure fencing?
>>>
>>> On Sun, Apr 16, 2017 at 8:10 PM Dan Yasny <[email protected]> wrote:
>>>
>>> On Sun, Apr 16, 2017 at 7:29 AM, Nir Soffer <[email protected]> wrote:
>>>
>>> On Sun, Apr 16, 2017 at 2:05 PM Dan Yasny <[email protected]> wrote:
>>>
>>> On Apr 16, 2017 7:01 AM, "Nir Soffer" <[email protected]> wrote:
>>>
>>> On Sun, Apr 16, 2017 at 4:17 AM Dan Yasny <[email protected]> wrote:
>>>
>>> When you set up a storage domain, you need to specify a host to perform
>>> the initial storage operations, but once the SD is defined, its details
>>> are in the engine database, and all the hosts get connected to it
>>> directly. If the first host you used to define the SD goes down, all
>>> other hosts will still remain connected and work. SPM is an HA service,
>>> and if the current SPM host goes down, SPM gets started on another host
>>> in the DC. In short, unless your actual NFS exporting host goes down,
>>> there is no outage.
>>>
>>> There is no storage outage, but if you shut down the SPM host, the SPM
>>> will not move to a new host until the SPM host is online again, or you
>>> manually confirm that the SPM host was rebooted.
>>>
>>> In a properly configured setup the SBA should take care of that. That's
>>> the whole point of HA services.
>>>
>>> In some cases, like power loss or hardware failure, there is no way to
>>> start the SPM host, and the system cannot recover automatically.
>>>
>>> There are always corner cases, no doubt. But in a normal situation,
>>> where an SPM host goes down because of a hardware failure, it gets
>>> fenced, other hosts contend for SPM and start it. No surprises there.
>>>
>>> Nir
>>>
>>> Nir
>>>
>>> On Sat, Apr 15, 2017 at 1:53 PM, Konstantin Raskoshnyi <[email protected]> wrote:
>>>
>>> Hi Fernando,
>>> I see each host has a direct nfs mount, but yes, if the main host
>>> through which I connected the nfs storage goes down, the storage becomes
>>> unavailable and all vms are down.
>>>
>>> On Sat, Apr 15, 2017 at 10:37 AM FERNANDO FREDIANI <[email protected]> wrote:
>>>
>>> Hello Konstantin.
>>> It doesn't make much sense to make a whole cluster depend on a single
>>> host. From what I know, any host talks directly to the NFS Storage Array
>>> or whatever other Shared Storage you have.
>>> Have you tested whether that host going down affects the others when the
>>> NFS is mounted directly from an NFS Storage array?
>>> Fernando
>>>
>>> 2017-04-15 12:42 GMT-03:00 Konstantin Raskoshnyi <[email protected]>:
>>>
>>> In ovirt you have to attach storage through a specific host.
>>> If the host goes down the storage is not available.
>>>
>>> On Sat, Apr 15, 2017 at 7:31 AM FERNANDO FREDIANI <[email protected]> wrote:
>>>
>>> Well, make it not go through host1 and dedicate a storage server for
>>> running NFS and make both hosts connect to it.
>>> In my view NFS is much easier to manage than any other type of storage,
>>> especially FC and iSCSI, and performance is pretty much the same, so you
>>> won't get better results by switching to another type.
>>> Fernando
>>>
>>> 2017-04-15 5:25 GMT-03:00 Konstantin Raskoshnyi <[email protected]>:
>>>
>>> Hi guys,
>>> I have one nfs storage,
>>> it's connected through host1.
>>> host2 also has access to it, I can easily migrate vms between them.
>>> The question is: if host1 is down, all infrastructure is down, since
>>> all traffic goes through host1.
>>> Is there any way in oVirt to use redundant storage?
>>>
>>> Only glusterfs?
>>>
>>> Thanks
>>>
>>> --
>>> Adam Litke
>>
>> --
>> Adam Litke
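Since much of this thread revolves around who holds the SPM lease, note
that sanlock can be queried directly on a host to see which lockspaces and
leases it currently holds. A minimal check (whether the SPM lock itself
shows up depends on the storage domain format, since older setups take the
SPM lock with the legacy safelease rather than sanlock):

# sanlock client status

The output lists the lockspaces the host has joined and the resources
(leases) it currently holds, which is a quick way to check whether a given
host still holds the SPM lease before deciding whether fencing or a manual
"confirm host has been rebooted" is needed.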
_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users

