Thanks for the info. I'll try that for now and see if it helps.

On Fri Feb 13 2015 at 09:00:59, Alexandre DERUMIER <aderum...@odiso.com> wrote:
> > Can this timeout be increased in some way? I've searched around and
> > found the /sys/block/sdx/device/timeout knob, which in my case is set
> > to 30s.
>
> yes, sure
>
> echo 60 > /sys/block/sdx/device/timeout
>
> for 60s, for example
>
> ----- Original Mail -----
> From: "Krzysztof Nowicki" <krzysztof.a.nowi...@gmail.com>
> To: "Andrey Korolyov" <and...@xdel.ru>, "aderumier" <aderum...@odiso.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Friday, 13 February 2015 08:18:26
> Subject: Re: [ceph-users] OSD slow requests causing disk aborts in KVM
>
> On Thu Feb 12 2015 at 16:23:38, Andrey Korolyov <and...@xdel.ru> wrote:
>
> > On Fri, Feb 6, 2015 at 12:16 PM, Krzysztof Nowicki
> > <krzysztof.a.nowi...@gmail.com> wrote:
> > > Hi all,
> > >
> > > I'm running a small Ceph cluster with 4 OSD nodes, which serves as a
> > > storage backend for a set of KVM virtual machines. The VMs use RBD
> > > for disk storage. On the VM side I'm using virtio-scsi instead of
> > > virtio-blk in order to gain DISCARD support.
> > >
> > > Each OSD node runs on a separate machine, using a 3TB WD Black drive
> > > plus a Samsung SSD for the journal. The machines used for the OSD
> > > nodes are not equal in spec: three of them are small servers, while
> > > one is a desktop PC. The last node is the one causing trouble.
> > > During high load caused by remapping due to one of the other nodes
> > > going down, I experienced some slow requests. To my surprise,
> > > however, these slow requests caused aborts from the block device on
> > > the VM side, which ended up corrupting files.
> > >
> > > What I wonder is whether such behaviour (aborts) is normal when slow
> > > requests pile up. I always thought that these requests would be
> > > delayed but eventually they'd be handled. Are there any tunables
> > > that would help me avoid such situations? I would really like to
> > > avoid VM outages caused by such corruption issues.
> > >
> > > I can attach some logs if needed.
> > > Best regards
> > > Chris
> >
> > Hi, this is an inevitable payoff for using the SCSI backend on storage
> > that is capable of operations slow enough. There were some
> > argonaut/bobtail-era discussions on the Ceph ML; those readings may be
> > interesting for you. AFAIR the SCSI disk would abort after 70s of not
> > receiving an ack for a pending operation.
>
> Can this timeout be increased in some way? I've searched around and
> found the /sys/block/sdx/device/timeout knob, which in my case is set
> to 30s.
>
> As for the versions: I'm running all Ceph nodes on Gentoo with Ceph
> version 0.80.5. The VM guest in question is running Ubuntu 12.04 LTS
> with kernel 3.13. The guest filesystem is BTRFS.
>
> I'm thinking that the corruption may be a BTRFS bug.
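For reference, the one-off `echo 60 > /sys/block/sdx/device/timeout` above can be sketched as a small script that applies a longer timeout to every SCSI disk in the guest and also prints a udev rule so the setting survives reboots. This is only a sketch: the 120s value, the `sd[a-z]` match, and the suggested rules-file name are my assumptions, not anything from the thread; adjust them for your environment.

```shell
#!/bin/sh
# Sketch: raise the kernel SCSI command timeout (default 30s) so transient
# Ceph slow requests don't escalate into aborts in the guest.
TIMEOUT_SECS=120   # assumed value; pick something above your worst-case stall

# Apply immediately to every current sd* device (requires root; devices
# where /sys is not writable are skipped silently).
for f in /sys/block/sd*/device/timeout; do
    [ -w "$f" ] && echo "$TIMEOUT_SECS" > "$f"
done

# Emit a udev rule so the timeout is reapplied on boot/hotplug.
# Save the output as e.g. /etc/udev/rules.d/60-scsi-timeout.rules
# (file name is an assumption).
RULE="ACTION==\"add\", SUBSYSTEM==\"block\", KERNEL==\"sd[a-z]\", ATTR{device/timeout}=\"$TIMEOUT_SECS\""
echo "$RULE"
```

Note that raising the timeout only papers over the stall; the slow requests on the weak OSD node still need to be addressed for the VMs to behave well during recovery.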
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com