>> The problem is caused by the RBD device not handling device aborts
>> properly, causing LIO and ESXi to enter a death spiral together.
>>
>> If something in the Ceph cluster causes an IO to take longer than 10
>> seconds (I think!!!), ESXi submits an iSCSI abort message. Once this
>> happens, as you have seen, it never recovers.
>>
>> Mike Christie from Red Hat is doing a lot of work on this currently, so
>> hopefully in the future there will be a direct RBD interface into LIO
>> and it will all work much better.
>>
>> Either tgt or SCST seems to be pretty stable in testing.
>>
>> Nick
>>
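The abort interaction Nick describes can be sketched as a toy model: an initiator that aborts any command outstanding longer than 10 seconds, against a target that either does or does not honour aborts. Everything here (the 10 s constant, the backlog-halving recovery, the function names) is illustrative, not ESXi's or LIO's actual implementation:

```python
ABORT_TIMEOUT = 10  # seconds; roughly the behaviour attributed to ESXi


def run_initiator(command_latency, target_handles_aborts, max_rounds=5):
    """Return the attempt on which the IO completed, or None if it wedged."""
    for attempt in range(1, max_rounds + 1):
        if command_latency <= ABORT_TIMEOUT:
            return attempt  # IO finished before the abort timer fired
        # Timeout hit: the initiator issues an iSCSI ABORT TASK.
        if not target_handles_aborts:
            # The abort is never acknowledged; the command stays
            # outstanding, so the initiator times out and aborts again.
            continue
        # Target cleans up; assume the backlog drains before the retry.
        command_latency /= 2
    return None  # never completed: the "death spiral"


# A target that honours aborts eventually completes the IO;
# one that ignores them never makes progress.
print(run_initiator(25, target_handles_aborts=True))   # → 3
print(run_initiator(25, target_handles_aborts=False))  # → None
```

The point of the sketch is that the failure is not the slow IO itself but the unacknowledged abort: once the target stops responding to aborts, every retry re-enters the same timeout path.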
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Alex Gorbachev
> Sent: 23 August 2015 02:17
> To: ceph-users
> Subject: [ceph-users] Slow responding OSDs are not OUTed and cause RBD
> client IO hangs
>
> Hello, this is an issue we have been suffering from and researching,
> along with a good number of other Ceph users, as evidenced by the
> recent posts. In our specific case, these issues manifest themselves
> in an RBD -> iSCSI LIO -> ESXi configuration, but the problem is more
> general.
>
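Alex's subject line points at the root of it: the monitor only marks an OSD out after it has been reported down, and an OSD that is slow but still answering heartbeats never trips that machinery. A sketch of the ceph.conf thresholds involved — the option names are real Ceph options, but the values shown are the defaults as I recall them for releases of that era, so verify them against your own release:

```ini
[global]
; An OSD must miss peer heartbeats for this long before it is
; reported down to the monitors.
osd heartbeat grace = 20

; Ops slower than this are merely logged as "slow requests";
; they trigger no down/out action by themselves.
osd op complaint time = 30

[mon]
; Once an OSD is marked down, the monitor waits this long before
; marking it out and starting recovery.
mon osd down out interval = 300
```

Nothing in that chain reacts to an OSD that stays up but serves IO slowly, which is exactly the case that hangs RBD clients.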
> When there is an