[ceph-users] Re: SLOW_OPS problems

2024-10-15 Thread Tim Sauerbein
> On 15 Oct 2024, at 18:57, Kai Stian Olstad wrote:
>
> On Tue, Oct 15, 2024 at 05:36:15PM +, Mat Young wrote:
>> Looking at the smartlog seems to show 63C current temp with 53C as worst
>> case which doesn’t make a lot of sense. Could the drive be thermally
>> throttling?
>
> That is th

[ceph-users] Re: SLOW_OPS problems

2024-10-15 Thread Tim Sauerbein
Sorry, forgot to mention: I did a secure erase on the drive yesterday and re-added it as an OSD, with the same result: slow ops a few hours later.

> On 15 Oct 2024, at 16:07, Tim Sauerbein wrote:
>
>> On 14 Oct 2024, at 16:01, Anthony D'Atri wrote:
>>
>> Remin

[ceph-users] Re: SLOW_OPS problems

2024-10-15 Thread Tim Sauerbein
> On 14 Oct 2024, at 16:01, Anthony D'Atri wrote:
>
> Remind me, have you sent me a full `smartctl -a` output for this drive?

See here, looks good though: https://gist.github.com/sauerbein/6423231adb954d28c8c82a8422256355

> If there’s a firmware update available, updating it with a subsequent

[ceph-users] Re: SLOW_OPS problems

2024-10-14 Thread Tim Sauerbein
> On 14 Oct 2024, at 10:12, Igor Fedotov wrote:
>
> Out of curiosity - have you found out what was the problem with that OSD?
> Some hardware issues?

I guess the SSD is faulty, even though it doesn't show any issues in SMART. I will replace it next week to bring the OSD back online and will

[ceph-users] Re: SLOW_OPS problems

2024-10-14 Thread Tim Sauerbein
Hi Igor,

Thanks for the valuable advice! I just wanted to provide feedback that it was indeed one single OSD causing the issues, which I could triangulate as you said. After removing this OSD, the slow ops haven't occurred anymore.

Best regards,
Tim

> On 1 Oct 2024, at 12:42, Igor Fedotov wrot
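
The excerpt above does not show the exact steps Igor suggested, so the following is only a minimal sketch of one way such a triangulation could be done: tally which OSD ids take part in each slow-op event and see whether a single OSD is common to most of them. The function name, the input format (one set of OSD ids per event), and the sample data are all hypothetical and not taken from the thread.

```python
from collections import Counter


def rank_osds_by_slow_op_events(events):
    """Count how many slow-op events each OSD took part in.

    `events` is an iterable of collections of OSD ids, one per event.
    This input format is an assumption made for illustration; gather
    the ids from your own logs or health output however suits you.
    Returns (osd_id, count) pairs, most frequent first.
    """
    counts = Counter()
    for osds in events:
        # Use a set so an OSD is counted at most once per event.
        counts.update(set(osds))
    return counts.most_common()


# Made-up example data: OSD 11 is involved in every event.
events = [{3, 11}, {11, 17}, {5, 11}, {11}]
ranking = rank_osds_by_slow_op_events(events)
print(ranking[0])
```

An OSD that appears in nearly every event, while the others appear only once or twice, is a reasonable first suspect; it is a hint, not proof, and is worth confirming against the drive itself.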

[ceph-users] Re: SLOW_OPS problems

2024-09-30 Thread Tim Sauerbein
Thanks for the replies everyone!

> On 30 Sep 2024, at 13:10, Anthony D'Atri wrote:
>
> Remember that slow ops are a tip of the iceberg thing, you only see ones that
> crest above 30s

So far metrics of the hosted VMs show no other I/O slowdown except when these hiccups occur.

> On 30 Sep 2024

[ceph-users] Re: SLOW_OPS problems

2024-09-30 Thread Tim Sauerbein
> On 30 Sep 2024, at 06:23, Joachim Kraftmayer wrote:
>
> do you see the behaviour across all devices or does it only affect one
> type/manufacturer?

All devices are affected equally; each time, one or two random OSDs report slow ops. So I don't think the SSDs are to blame.

Thanks,
Tim

[ceph-users] SLOW_OPS problems

2024-09-29 Thread Tim Sauerbein
Dear list,

I have a small cluster (Reef 18.2.4) with 7 hosts and 3-4 OSDs each (960GB/1.92TB mixed Intel D3-S4610, Samsung SM883, PM897 SSDs):

  cluster:
    id:     ecff3ce8-539b-443e-a492-da428f4aa9e9
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum titan,mangan,kalium,argon,chrom