Hi Stefan - we simply disabled exclusive-lock on all older (pre-Jewel)
images. We still allow the default Jewel feature sets for newly created
images because, as you mention, the issue does not seem to affect them.
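
For anyone following along, here is a minimal sketch of how the disable
commands could be generated for one image. It assumes the usual feature
dependency chain (fast-diff requires object-map, which requires
exclusive-lock), so the dependent features are disabled first. The image
spec `rbd/vm-disk-1` and the helper `disable_commands` are hypothetical,
and the script only prints `rbd feature disable` commands rather than
running them.

```python
from typing import List

# Assumed dependency chain: fast-diff -> object-map -> exclusive-lock.
# Dependent features have to be disabled before the one they rely on.
DISABLE_ORDER = ["fast-diff", "object-map", "exclusive-lock"]


def disable_commands(image_spec: str, enabled: List[str]) -> List[List[str]]:
    """Build `rbd feature disable` argv lists, in dependency-safe order,
    for those features in DISABLE_ORDER that are currently enabled."""
    return [
        ["rbd", "feature", "disable", image_spec, feature]
        for feature in DISABLE_ORDER
        if feature in enabled
    ]


if __name__ == "__main__":
    # Hypothetical image and feature list, for illustration only.
    features = ["layering", "exclusive-lock", "object-map", "fast-diff"]
    for cmd in disable_commands("rbd/vm-disk-1", features):
        print(" ".join(cmd))
```

The enabled features of a real image can be read from `rbd info` before
deciding what to disable; whether the VMs also need a restart is a
separate question.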

On Thu, May 4, 2017 at 10:19 AM, Stefan Priebe - Profihost AG <
s.pri...@profihost.ag> wrote:

> Hello Brian,
>
> this really sounds the same. I don't see this on a cluster with only
> images created AFTER Jewel. And it seems to have started happening after I
> enabled exclusive-lock on all images.
>
> Did you just use feature disable exclusive-lock,fast-diff,object-map, or
> did you also restart all those VMs?
>
> Greets,
> Stefan
>
> On 04.05.2017 at 19:11, Brian Andrus wrote:
> > Sounds familiar... and discussed in "disk timeouts in libvirt/qemu VMs..."
> >
> > We have not had this issue since reverting exclusive-lock, but it was
> > suggested this was not the issue. So far it's held up for us with not a
> > single corrupt filesystem since then.
> >
> > On some images (ones created post-Jewel upgrade) the feature could not
> > be disabled, but these don't seem to be affected. Of course, we never
> > did pinpoint the cause of timeouts, so it's entirely possible something
> > else was causing it but no other major changes went into effect.
> >
> > One thing to look for that might confirm the same issue is timeouts in
> > the guest VM. Most OS kernels will report a hung task in conjunction with
> > the hang up/lock/corruption. I'm wondering if you're seeing that too.
> >
> > On Wed, May 3, 2017 at 10:49 PM, Stefan Priebe - Profihost AG
> > <s.pri...@profihost.ag> wrote:
> >
> >     Hello,
> >
> >     Since we upgraded from Hammer to Jewel 10.2.7 and enabled
> >     exclusive-lock,object-map,fast-diff, we have had problems with
> >     corrupted VM filesystems.
> >
> >     Sometimes the VMs are just crashing with FS errors and a restart can
> >     solve the problem. Sometimes the whole VM is not even bootable and we
> >     need to import a backup.
> >
> >     All of them have the same problem that you can't revert to an older
> >     snapshot. The rbd command just hangs at 99% forever.
> >
> >     Is this a known issue - anything we can check?
> >
> >     Greets,
> >     Stefan
> >     _______________________________________________
> >     ceph-users mailing list
> >     ceph-users@lists.ceph.com
> >     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> >
> > --
> > Brian Andrus | Cloud Systems Engineer | DreamHost
> > brian.and...@dreamhost.com | www.dreamhost.com
> >
>



-- 
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.and...@dreamhost.com | www.dreamhost.com
