Hi Igor,

Thanks for your reply. I can verify that discard is disabled in our cluster:

10:03 root@node106b [fra]:~# ceph daemon osd.417 config show | grep discard
    "bdev_async_discard": "false",
    "bdev_enable_discard": "false",
[...]

So there must be something else causing the problems.

Thanks,
Denny

> On 15.02.2019 at 12:41, Igor Fedotov <ifedo...@suse.de> wrote:
>
> Hi Denny,
>
> I do not remember exactly when discards appeared in BlueStore, but they are
> disabled by default.
>
> See the bdev_enable_discard option.
>
> Thanks,
> Igor
>
> On 2/15/2019 2:12 PM, Denny Kreische wrote:
>> Hi,
>>
>> Two weeks ago we upgraded one of our Ceph clusters from Luminous 12.2.8 to
>> Mimic 13.2.4. The cluster is SSD-only and BlueStore-only, with 68 nodes and
>> 408 OSDs. Since then we have seen strange behaviour: single OSDs seem to
>> block for around 5 minutes, and this causes the whole cluster and connected
>> applications to hang. This happened 5 times during the last 10 days at
>> irregular times. It did not happen before the upgrade.
>>
>> The OSD log shows something like this (more log here:
>> https://pastebin.com/6BYam5r4):
>>
>> [...]
>> 2019-02-14 23:53:39.754 7f379a368700 -1 osd.417 340516 get_health_metrics
>> reporting 3 slow ops, oldest is osd_op(client.84226977.0:5112539976 0.dff
>> 0.1d783dff (undecoded) ondisk+read+known_if_redirected e340516)
>> 2019-02-14 23:53:40.706 7f379a368700 -1 osd.417 340516 get_health_metrics
>> reporting 7 slow ops, oldest is osd_op(client.84226977.0:5112539976 0.dff
>> 0.1d783dff (undecoded) ondisk+read+known_if_redirected e340516)
>> [...]
>>
>> In this example osd.417 seems to have a problem. I can see the same log
>> line in the logs of other OSDs that share placement groups with osd.417.
>> I assume that all placement groups related to osd.417 hang or are blocked
>> while osd.417 is blocked.
>>
>> How can I see in detail what might cause a certain OSD to stop working?
>>
>> The cluster uses SSDs from 3 different vendors (Micron, Samsung, Intel),
>> but only Micron disks have been affected so far. We had problems earlier
>> with Micron SSDs on FileStore (XFS), where fstrim caused single OSDs to
>> block for several minutes. We migrated to BlueStore about a year ago. Just
>> in case: is there any kind of SSD trim/discard happening in BlueStore since
>> Mimic?
>>
>> Thanks,
>> Denny
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Denny Kreische
IT Systems Engineer and Consultant
Am Teichdamm 20
04680 Colditz
Phone: 034381 55125
Mobile: 0176 2115 1457
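
For anyone trying to line up the blocking episodes with other events, here is a minimal sketch that summarizes the get_health_metrics "slow ops" lines per OSD (first and last time seen, peak count). The function name summarize_slow_ops and the regular expression are my own assumptions, derived only from the two log lines quoted in this thread; other Ceph releases may format these lines differently, so treat it as a starting point rather than a finished tool.

```python
import re
from datetime import datetime

# Pattern inferred from the two log lines quoted in this thread (assumption:
# other Ceph versions may log these lines in a different layout).
SLOW_OPS_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+\S+\s+-?\d+\s+"
    r"(?P<osd>osd\.\d+)\s+(?P<epoch>\d+)\s+get_health_metrics\s+"
    r"reporting\s+(?P<count>\d+)\s+slow ops"
)


def summarize_slow_ops(lines):
    """Return {osd: {"first": datetime, "last": datetime, "peak": int}}.

    Hypothetical helper: lines that do not match the pattern are skipped.
    """
    summary = {}
    for line in lines:
        match = SLOW_OPS_RE.match(line.strip())
        if not match:
            continue
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        count = int(match.group("count"))
        entry = summary.setdefault(
            match.group("osd"), {"first": ts, "last": ts, "peak": count}
        )
        entry["first"] = min(entry["first"], ts)
        entry["last"] = max(entry["last"], ts)
        entry["peak"] = max(entry["peak"], count)
    return summary


# The two log lines from the original report, unwrapped to one line each.
SAMPLE = [
    "2019-02-14 23:53:39.754 7f379a368700 -1 osd.417 340516 get_health_metrics "
    "reporting 3 slow ops, oldest is osd_op(client.84226977.0:5112539976 0.dff "
    "0.1d783dff (undecoded) ondisk+read+known_if_redirected e340516)",
    "2019-02-14 23:53:40.706 7f379a368700 -1 osd.417 340516 get_health_metrics "
    "reporting 7 slow ops, oldest is osd_op(client.84226977.0:5112539976 0.dff "
    "0.1d783dff (undecoded) ondisk+read+known_if_redirected e340516)",
]

if __name__ == "__main__":
    for osd, info in sorted(summarize_slow_ops(SAMPLE).items()):
        print(osd, info["first"], info["last"], "peak:", info["peak"])
```

Feeding it the concatenated logs of all OSDs on a node gives a rough picture of which OSD started reporting slow ops first and for how long, which may help tell the blocked OSD apart from peers that are only waiting on it.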