Sorry, a typo in my previous message: it is mClock, not "Malcom". The exact parameter is osd_recovery_max_active_ssd/hdd and its default is 10; to reduce it you first have to set the mClock recovery-settings override to true.
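For concreteness, a rough sketch of what that looks like on the CLI (this assumes the default mClock scheduler is active; the option names below come from recent releases, so please double-check them on your version before applying):

# allow recovery-related options to be changed while mClock is the scheduler
ceph config set osd osd_mclock_override_recovery_settings true
# then lower the per-OSD recovery limits (the values here are only examples)
ceph config set osd osd_recovery_max_active_hdd 1
ceph config set osd osd_recovery_max_active_ssd 3
# check what a given OSD actually ends up using, e.g. the one from this thread
ceph config show osd.247 osd_recovery_max_active_hdd

As far as I know, without the override flag mClock silently ignores changes to these recovery options, which is why just setting them may appear to have no effect.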
Restarting the OSD daemon alone will solve your issue.

Regards
Dev

On Fri, 2 May 2025 at 9:07 AM, Devender Singh <deven...@netskrt.io> wrote:

> Hello
>
> Try restarting the OSDs showing slow ops.
> Also, if any recovery is going on, the max recovery drives for Malcom is 10;
> try reducing it. That will resolve this issue.
> If it persists for a drive, then check SMART for errors and replace that drive.
>
> Regards
> Dev
>
> On Fri, 2 May 2025 at 8:45 AM, Maged Mokhtar <mmokh...@petasan.org> wrote:
>
>> On 02/05/2025 13:57, Frédéric Nass wrote:
>> > To clarify, there's no "issue" with the code itself. It's just that the
>> > code now reveals a potential "issue" with the OSD's underlying device,
>> > as Igor explained.
>> >
>> > This warning can pop up starting from Quincy v17.2.8 (PR 59468), Reef
>> > v18.2.5 (PR #59466) and Squid v19.2.1 (PR #59464).
>> >
>> > Regards,
>> > Frédéric.
>>
>> Thanks Igor and Frédéric for the clarifications.
>>
>> However, this begs the question: what should users do when seeing such
>> slow ops? The quoted link:
>> https://docs.ceph.com/en/latest/rados/operations/health-checks/#bluestore-slow-op-alert
>> states it could be a drive issue, but not always...
>>
>> So I think it could be helpful to share information/experiences of what
>> users find to be the root cause of such issues. From our side:
>>
>> 1) With Octopus and earlier, we rarely saw such logs, and when they
>> happened, it was mainly bad drives.
>>
>> 2) When we upgraded from Octopus to Quincy, we started to see more users
>> complain. The complaint was not always due to a warning, but generally
>> slower performance plus higher latencies seen on charts, and we can see
>> it in the logs for a time period with something like:
>> grep -r "slow operation observed for" /var/log/ceph | grep "2024-11"
>>
>> 3) Many users with the issue reported improvement when they stopped or
>> reduced bulk deletions such as heavy patterns of RBD trim/discard/reclaim.
>> This recommendation was influenced by messages from Igor and Mark Nelson
>> on slow bulk deletions. It was also noticeable that after stopping trim,
>> the cluster would not report issues even at significantly higher client
>> load. This constituted the larger portion of issues we saw.
>>
>> 4) Generally, performing an offline DB compaction also helped:
>> ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-XX compact
>>
>> 5) For non-DB related warnings, some older OSDs had high fragmentation:
>> ceph daemon osd.XX bluestore allocator score block
>> Deleting and re-adding the same drive helped with slow ops.
>>
>> 6) To a lesser extent, the logs do indicate a defective drive, or a drive
>> of a different model/type that has much lower performance than the other
>> models in the cluster/pool.
>>
>> /Maged
>>
>> > ----- On 2 May 25, at 12:36, Eugen Block ebl...@nde.ag wrote:
>> >
>> >> The link Frederic shared is for 19.2.1, so yes, the new warning
>> >> appeared in 19.2.1 as well.
>> >>
>> >> Quoting Laimis Juzeliūnas <laimis.juzeliu...@oxylabs.io>:
>> >>
>> >>> Hi all,
>> >>>
>> >>> Could this also be an issue with 19.2.2?
>> >>> We have seen a few of these warnings right after upgrading from
>> >>> 19.2.0. A simple OSD restart removed them, but we haven't seen them
>> >>> before.
>> >>> There are some users on the Ceph Slack channels discussing this
>> >>> observation in 19.2.2 as well.
>> >>>
>> >>> Best,
>> >>> Laimis J.
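Regarding the offline compaction in Maged's point 4 above: ceph-kvstore-tool needs the OSD to be down while it runs. A rough outline of one way to do it (unit names and paths assume a traditional package-based deployment; cephadm-managed OSDs keep their data under /var/lib/ceph/<fsid>/, so adjust accordingly):

ceph osd set noout                      # avoid data migration while the OSD is down
systemctl stop ceph-osd@XX              # the OSD must be offline during the compaction
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-XX compact
systemctl start ceph-osd@XX
ceph osd unset noout                    # once the OSD is back up and in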
>> >>>
>> >>>> On 2 May 2025, at 13:11, Igor Fedotov <igor.fedo...@croit.io> wrote:
>> >>>>
>> >>>> Hi Everyone,
>> >>>>
>> >>>> well, indeed this warning has been introduced in 18.2.6.
>> >>>>
>> >>>> But I wouldn't say that it's not an issue. Having it permanently
>> >>>> visible (particularly for a specific OSD only) might indicate some
>> >>>> issues with this OSD which could negatively impact overall cluster
>> >>>> performance.
>> >>>>
>> >>>> The OSD log should be checked for potential clues, and more research
>> >>>> into the root cause is recommended.
>> >>>>
>> >>>> And once again - likely that's not a regression in 18.2.6 but rather
>> >>>> some additional diagnostics brought by the release which reveals a
>> >>>> potential issue.
>> >>>>
>> >>>> Thanks,
>> >>>>
>> >>>> Igor
>> >>>>
>> >>>> On 02.05.2025 11:19, Frédéric Nass wrote:
>> >>>>> Hi Michel,
>> >>>>>
>> >>>>> This is not an issue. It's a new warning that can be adjusted or
>> >>>>> muted. Check this thread [1] and this part [2] of the Reef
>> >>>>> documentation about this new alert.
>> >>>>> It came to Reef with PR #59466 [3].
>> >>>>>
>> >>>>> Cheers,
>> >>>>> Frédéric.
>> >>>>>
>> >>>>> [1] https://www.spinics.net/lists/ceph-users/msg86131.html
>> >>>>> [2] https://docs.ceph.com/en/latest/rados/operations/health-checks/#bluestore-slow-op-alert
>> >>>>> [3] https://github.com/ceph/ceph/pull/59466
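As a side note for anyone who wants to quiet this alert while the root cause is being investigated: it can be muted temporarily, and the health-checks page linked above describes the thresholds behind it. A rough sketch (the two bluestore_* option names are taken from that page and the values are only examples, so please verify them against your release):

ceph health mute BLUESTORE_SLOW_OP_ALERT 1w    # silence the warning for a week

# or raise the thresholds that trigger it (per the linked documentation)
ceph config set osd bluestore_slow_ops_warn_lifetime 86400    # look-back window in seconds
ceph config set osd bluestore_slow_ops_warn_threshold 5       # slow-op indications needed to warn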
>> >>>>>
>> >>>>> ----- On 2 May 25, at 9:44, Michel Jouvin
>> >>>>> michel.jou...@ijclab.in2p3.fr wrote:
>> >>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> Since our upgrade to 18.2.6 two days ago, our cluster has been
>> >>>>>> reporting the warning "1 OSD(s) experiencing slow operations in
>> >>>>>> BlueStore":
>> >>>>>>
>> >>>>>> [root@dig-osd4 bluestore-slow-ops]# ceph health detail
>> >>>>>> HEALTH_WARN 1 OSD(s) experiencing slow operations in BlueStore
>> >>>>>> [WRN] BLUESTORE_SLOW_OP_ALERT: 1 OSD(s) experiencing slow operations in BlueStore
>> >>>>>>     osd.247 observed slow operation indications in BlueStore
>> >>>>>>
>> >>>>>> I have never seen this warning before, so I have the feeling it is
>> >>>>>> somehow related to the upgrade, and it doesn't seem related to the
>> >>>>>> regression mentioned in another thread (that should result in an
>> >>>>>> OSD crash).
>> >>>>>> Googling quickly, I found this reported on 19.2.1 with SSDs,
>> >>>>>> whereas in my case it is an HDD. I don't know if the workaround
>> >>>>>> mentioned in the issue (bdev_xxx_discard=true) also applies to
>> >>>>>> 18.2.6...
>> >>>>>>
>> >>>>>> Did somebody see this in 18.2.x? Any recommendation? Our plan was,
>> >>>>>> according to the best practices described recently in another
>> >>>>>> thread, to move from 18.2.2 to 18.2.6 and then from 18.2.6 to
>> >>>>>> 19.2.2... Will 19.2.2 clear this issue (at the risk of others, as
>> >>>>>> it is probably not widely used yet)?
>> >>>>>>
>> >>>>>> Best regards,
>> >>>>>>
>> >>>>>> Michel
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io