[ceph-users] Re: SLOW_OPS problems

2024-10-15 Thread Tim Sauerbein
> On 15 Oct 2024, at 18:57, Kai Stian Olstad wrote: > > On Tue, Oct 15, 2024 at 05:36:15PM +, Mat Young wrote: >> Looking at the smartlog seems to show 63C current temp with 53C as worst >> case which doesn’t make a lot of sense. Could the drive be thermally >> throttling? > > That is th

[ceph-users] Re: SLOW_OPS problems

2024-10-15 Thread Mark Nelson
15, 2024, at 1:36 PM, Mat Young wrote: Looking at the smartlog seems to show 63C current temp with 53C as worst case which doesn’t make a lot of sense. Could the drive be thermally throttling? Rgds mat From: Tim Sauerbein Sent: Tuesday, October 15, 2024 11:21 AM To: ceph-users Subject:

[ceph-users] Re: SLOW_OPS problems

2024-10-15 Thread Kai Stian Olstad
On Tue, Oct 15, 2024 at 05:36:15PM +, Mat Young wrote: Looking at the smartlog seems to show 63C current temp with 53C as worst case which doesn’t make a lot of sense. Could the drive be thermally throttling? That is the normalized value; shouldn't the value in RAW_VALUE be used instead?
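To illustrate the point about normalized versus raw values, here is a minimal Python sketch that splits one row of an ATA SMART attribute table (the `smartctl -A` layout with VALUE, WORST, THRESH and RAW_VALUE columns) and reads the temperature from RAW_VALUE. The sample row is made up for illustration and is not taken from the drive discussed in this thread; the helper names are hypothetical. Normalized values are vendor-scaled, and on many drives a lower normalized temperature value means a hotter drive, which would explain a WORST figure that is lower than the current VALUE.

```python
def parse_smart_attribute(line):
    """Split one row of an ATA SMART attribute table into named fields."""
    # RAW_VALUE is the 10th column and may contain spaces, so cap the split.
    parts = line.split(None, 9)
    if len(parts) < 10:
        raise ValueError("not a SMART attribute row: %r" % line)
    return {
        "id": int(parts[0]),
        "name": parts[1],
        "value": int(parts[3]),   # normalized, vendor-scaled (not degrees)
        "worst": int(parts[4]),   # lowest normalized value recorded so far
        "thresh": int(parts[5]),
        "raw": parts[9].strip(),  # vendor raw value, often the real reading
    }


def raw_temperature_celsius(attr):
    """Assume the first token of RAW_VALUE is the temperature in Celsius."""
    return int(attr["raw"].split()[0])


# Made-up sample row, NOT from the drive in this thread.
SAMPLE_ROW = (
    "194 Temperature_Celsius 0x0022 063 053 000 "
    "Old_age Always - 37 (Min/Max 20/47)"
)

attr = parse_smart_attribute(SAMPLE_ROW)
print(attr["value"], attr["worst"], raw_temperature_celsius(attr))
```

Whether the first token of RAW_VALUE really is degrees Celsius depends on the drive and attribute, so treat that as an assumption to check against the vendor's documentation.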

[ceph-users] Re: SLOW_OPS problems

2024-10-15 Thread Mat Young
Could the drive be thermally throttling? > > Rgds > > mat > > From: Tim Sauerbein > Sent: Tuesday, October 15, 2024 11:21 AM > To: ceph-users > Subject: [ceph-users] Re: SLOW_OPS problems

[ceph-users] Re: SLOW_OPS problems

2024-10-15 Thread Anthony D'Atri
ung wrote: > > Looking at the smartlog seems to show 63C current temp with 53C as worst case > which doesn’t make a lot of sense. Could the drive be thermally throttling? > > Rgds > > mat > > From: Tim Sauerbein > Sent: Tuesday, October 15, 2024 11:21 AM > To: ce

[ceph-users] Re: SLOW_OPS problems

2024-10-15 Thread Mat Young
Looking at the smartlog seems to show 63C current temp with 53C as worst case which doesn’t make a lot of sense. Could the drive be thermally throttling? Rgds mat From: Tim Sauerbein Sent: Tuesday, October 15, 2024 11:21 AM To: ceph-users Subject: [ceph-users] Re: SLOW_OPS problems

[ceph-users] Re: SLOW_OPS problems

2024-10-15 Thread Tim Sauerbein
Sorry, forgot to mention: I did a secure erase on the drive yesterday and added it to the OSD again, with the same result: slow ops a few hours later. > On 15 Oct 2024, at 16:07, Tim Sauerbein wrote: > >> On 14 Oct 2024, at 16:01, Anthony D'Atri wrote: >> >> Remind me, have you sent me a full

[ceph-users] Re: SLOW_OPS problems

2024-10-15 Thread Tim Sauerbein
> On 14 Oct 2024, at 16:01, Anthony D'Atri wrote: > > Remind me, have you sent me a full `smartctl -a` output for this drive? See here, looks good though: https://gist.github.com/sauerbein/6423231adb954d28c8c82a8422256355 > If there’s a firmware update available, updating it with a subsequent

[ceph-users] Re: SLOW_OPS problems

2024-10-14 Thread Anthony D'Atri
>>> Out of curiosity - have you found out what was the problem with that OSD? >>> Some hardware issues? >> I guess the SSD is faulty, even though it doesn't show any issues in SMART. >> I will replace it next week to bring the OSD back online and will report if >> the issue reappears, which wo

[ceph-users] Re: SLOW_OPS problems

2024-10-14 Thread Mark Nelson
On 10/14/24 05:05, Tim Sauerbein wrote: On 14 Oct 2024, at 10:12, Igor Fedotov wrote: Out of curiosity - have you found out what was the problem with that OSD? Some hardware issues? I guess the SSD is faulty, even though it doesn't show any issues in SMART. I will replace it next week to br

[ceph-users] Re: SLOW_OPS problems

2024-10-14 Thread Tim Sauerbein
> On 14 Oct 2024, at 10:12, Igor Fedotov wrote: > > Out of curiosity - have you found out what was the problem with that OSD? > Some hardware issues? I guess the SSD is faulty, even though it doesn't show any issues in SMART. I will replace it next week to bring the OSD back online and will

[ceph-users] Re: SLOW_OPS problems

2024-10-14 Thread Igor Fedotov
Hi Tim, thanks for the feedback, highly appreciated. Out of curiosity - have you found out what was the problem with that OSD? Some hardware issues? Regards, Igor On 10/14/2024 11:58 AM, Tim Sauerbein wrote: Hi Igor, Thanks for the valuable advice! I just wanted to provide feedback that

[ceph-users] Re: SLOW_OPS problems

2024-10-14 Thread Tim Sauerbein
Hi Igor, Thanks for the valuable advice! I just wanted to provide feedback that it was indeed one single OSD causing the issues which I could triangulate as you said. After removing this OSD, the slow ops haven't occurred anymore. Best regards, Tim > On 1 Oct 2024, at 12:42, Igor Fedotov wrot

[ceph-users] Re: SLOW_OPS problems

2024-10-01 Thread Igor Fedotov
Hi Tim, first of all - given the provided logs - all the slow operations are stuck in the 'waiting for sub ops' state, which apparently means that the reported OSDs aren't suffering from local issues but are stuck on replication operations to their peer OSDs. From my experience even a single "faulty" o
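The reasoning above (ops in 'waiting for sub ops' are blocked on a peer, so one bad OSD should keep turning up as the common peer) can be sketched as a simple tally in Python. This is only an illustration under stated assumptions: the input format, a list of (reporting OSD, peers it was waiting on) pairs, is hypothetical, the OSD ids are invented, and extracting those pairs from real OSD logs is not shown here.

```python
from collections import Counter


def rank_suspect_osds(slow_op_reports):
    """Count how often each OSD appears as a peer that a slow op waited on.

    slow_op_reports: iterable of (reporting_osd, [peer_osd, ...]) pairs.
    This is a hypothetical structure, not a Ceph log format.
    """
    counts = Counter()
    for reporting_osd, waiting_on in slow_op_reports:
        # The reporting OSD is the one waiting, so it is not counted itself.
        counts.update(set(waiting_on) - {reporting_osd})
    return counts.most_common()


# Invented example data: four slow-op reports from different primaries.
reports = [
    (3, [7, 12]),
    (5, [7, 1]),
    (9, [7, 4]),
    (1, [12, 7]),
]

ranking = rank_suspect_osds(reports)
print(ranking[0])
```

An OSD that leads this ranking across several independent incidents is a candidate for closer inspection; the tally alone does not prove that the device is faulty.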

[ceph-users] Re: SLOW_OPS problems

2024-09-30 Thread Anthony D'Atri
My point is that you may have more 10-30s delays that aren’t surfaced. > On Sep 30, 2024, at 10:17 AM, Tim Sauerbein wrote: > > Thanks for the replies everyone! > >> On 30 Sep 2024, at 13:10, Anthony D'Atri wrote: >> >> Remember that slow ops are a tip of the iceberg thing, you only see on
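As a small illustration of the iceberg point, the Python sketch below buckets a list of per-op latencies into those above the 30 s reporting threshold and those in the 10-30 s band that would never be flagged. The latency values are invented, and the function and parameter names are hypothetical. As far as I know, the threshold corresponds to Ceph's `osd_op_complaint_time` option (30 seconds by default), but treat that mapping as an assumption to verify for your release.

```python
COMPLAINT_TIME_S = 30.0  # reporting threshold mentioned in the thread


def bucket_op_latencies(latencies_s,
                        complaint_time_s=COMPLAINT_TIME_S,
                        hidden_floor_s=10.0):
    """Count ops that would be reported as slow versus slow-but-unreported."""
    reported = [t for t in latencies_s if t > complaint_time_s]
    hidden = [t for t in latencies_s
              if hidden_floor_s <= t <= complaint_time_s]
    return {
        "reported": len(reported),  # above the threshold, visible as slow ops
        "hidden": len(hidden),      # painful for clients, but never flagged
        "total": len(latencies_s),
    }


# Invented latencies in seconds, for illustration only.
sample = [0.004, 0.02, 12.5, 18.0, 29.9, 31.2, 0.01, 45.0]
result = bucket_op_latencies(sample)
print(result)
```

In this made-up sample more ops fall into the hidden band than above the threshold, which is the situation the warning is about: an absence of SLOW_OPS reports does not mean an absence of multi-second stalls.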

[ceph-users] Re: SLOW_OPS problems

2024-09-30 Thread Tim Sauerbein
Thanks for the replies everyone! > On 30 Sep 2024, at 13:10, Anthony D'Atri wrote: > > Remember that slow ops are a tip of the iceberg thing, you only see ones that > crest above 30s So far, metrics of the hosted VMs show no other I/O slowdown except when these hiccups occur. > On 30 Sep 2024

[ceph-users] Re: SLOW_OPS problems

2024-09-30 Thread Alexander Schreiber
On Mon, Sep 30, 2024 at 11:04:30AM +0100, Tim Sauerbein wrote: > > > On 30 Sep 2024, at 06:23, Joachim Kraftmayer > > wrote: > > > > do you see the behaviour across all devices or does it only affect one > > type/manufacturer? > > All devices are affected equally, every time one or two random

[ceph-users] Re: SLOW_OPS problems

2024-09-30 Thread Igor Fedotov
Hi Tim, there is no log attached to your post; you'd better share it via some other means. BTW - which log did you mean - the monitor or the OSD one? It would be nice to have logs for a couple of OSDs suffering from slow ops, preferably relevant to two different cases. Thanks, Igor On 9/29/2024 3

[ceph-users] Re: SLOW_OPS problems

2024-09-30 Thread Anthony D'Atri
Remember that slow ops are a tip of the iceberg thing, you only see ones that crest above 30s. > On Sep 30, 2024, at 6:06 AM, Tim Sauerbein wrote: > >> On 30 Sep 2024, at 06:23, Joachim Kraftmayer >> wrote: >> >> do you see the behaviour across all devices or does it only affect one >> t

[ceph-users] Re: SLOW_OPS problems

2024-09-30 Thread Tim Sauerbein
> On 30 Sep 2024, at 06:23, Joachim Kraftmayer > wrote: > > do you see the behaviour across all devices or does it only affect one > type/manufacturer? All devices are affected equally: every time, one or two random OSDs report slow ops. So I don't think the SSDs are to blame. Thanks, Tim

[ceph-users] Re: SLOW_OPS problems

2024-09-29 Thread Joachim Kraftmayer
Hi Tim, do you see the behaviour across all devices or does it only affect one type/manufacturer? Joachim www.clyso.com Hohenzollernstr. 27, 80801 Munich Utting a. A. | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE2754306 Tim Sauerbein wrote on Sun, 29 Sep 2024, 23:32: > Dear list, > >