Hi Igor,

Thanks for your reply.
I can confirm that discard is disabled in our cluster:

10:03 root@node106b [fra]:~# ceph daemon osd.417 config show | grep discard
    "bdev_async_discard": "false",
    "bdev_enable_discard": "false",
[...]

So there must be something else causing the problems.
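
For anyone who wants to repeat this check on several OSDs, here is a minimal Python sketch. It assumes the full output of `ceph daemon osd.N config show` is a flat JSON object of option names and values, as in the excerpt above. The helper name `discard_options` and the sample input are only illustrative. On a real node the text would come from the admin socket command.

```python
import json

def discard_options(config_json: str) -> dict:
    """Return every config option whose name contains 'discard'."""
    config = json.loads(config_json)
    return {name: value for name, value in config.items() if "discard" in name}

# Illustrative sample shaped like the output quoted above.
# "some_other_option" is a made-up key that only shows the filtering.
sample = """
{
    "bdev_async_discard": "false",
    "bdev_enable_discard": "false",
    "some_other_option": "1"
}
"""

for name, value in sorted(discard_options(sample).items()):
    print(f"{name} = {value}")
```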

Thanks,
Denny


> On 15.02.2019 at 12:41, Igor Fedotov <ifedo...@suse.de> wrote:
> 
> Hi Denny,
> 
> I don't remember exactly when discards appeared in BlueStore, but they are 
> disabled by default.
> 
> See the bdev_enable_discard option.
> 
> 
> Thanks,
> 
> Igor
> 
> On 2/15/2019 2:12 PM, Denny Kreische wrote:
>> Hi,
>> 
>> Two weeks ago we upgraded one of our Ceph clusters from Luminous 12.2.8 to 
>> Mimic 13.2.4. The cluster is SSD-only and BlueStore-only, with 68 nodes and 
>> 408 OSDs. Since then we have seen strange behaviour: single OSDs seem to 
>> block for around 5 minutes, which causes the whole cluster and connected 
>> applications to hang. This has happened 5 times during the last 10 days, at 
>> irregular times. It did not happen before the upgrade.
>> 
>> The OSD log shows something like this (more log here: 
>> https://pastebin.com/6BYam5r4):
>> 
>> [...]
>> 2019-02-14 23:53:39.754 7f379a368700 -1 osd.417 340516 get_health_metrics 
>> reporting 3 slow ops, oldest is osd_op(client.84226977.0:5112539976 0.dff 
>> 0.1d783dff (undecoded) ondisk+read+known_if_redirected e340516)
>> 2019-02-14 23:53:40.706 7f379a368700 -1 osd.417 340516 get_health_metrics 
>> reporting 7 slow ops, oldest is osd_op(client.84226977.0:5112539976 0.dff 
>> 0.1d783dff (undecoded) ondisk+read+known_if_redirected e340516)
>> [...]
>> 
>> In this example osd.417 seems to have a problem. I can see the same log 
>> line in the logs of other OSDs that share placement groups with osd.417.
>> I assume that all placement groups related to osd.417 hang or are blocked 
>> while osd.417 is blocked.
>> 
>> How can I see in detail what might cause a certain OSD to stop working?
>> 
>> The cluster uses SSDs from 3 different vendors (Micron, Samsung, Intel), 
>> but so far only the Micron disks have been affected. We previously had 
>> problems with Micron SSDs under FileStore (XFS), where fstrim caused single 
>> OSDs to block for several minutes. We migrated to BlueStore about a year 
>> ago. Just in case: has any kind of SSD trim/discard been happening in 
>> BlueStore since Mimic?
>> 
>> Thanks,
>> Denny
>> 
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Denny Kreische
IT Systems Engineer and Consultant

Am Teichdamm 20
04680 Colditz

Phone: 034381 55125
Mobile: 0176 2115 1457
