[ceph-users] Re: Disk Failure Predication cloud module?

2022-01-25 Thread Marc
Is there also (going to be) something available that works 'offline'?
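
A possible offline route, sketched here on the assumption of a Nautilus-or-later cluster that still ships the local prediction module, is to enable diskprediction_local instead of the cloud module:

    # enable the built-in (offline) prediction module
    ceph mgr module enable diskprediction_local
    # tell Ceph to use the local predictor for device failure prediction
    ceph config set global device_failure_prediction_mode local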

[ceph-users] Re: Disk Failure Predication cloud module?

2022-01-25 Thread Yaarit Hatuka
Hi Jake, Many thanks for contributing the data. Indeed, our data scientists use the data from Backblaze too. Have you found strong correlations between device health metrics (such as reallocated sector count, or any combination of attributes) and read/write errors in /var/log/messages from what
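
One hedged way to eyeball such correlations on a running cluster (the device ID below is a placeholder) is to pull the SMART metrics Ceph already scrapes and grep the kernel log on the matching host:

    # list known devices and which daemons/hosts they back
    ceph device ls
    # dump the scraped SMART/health metrics for one device (placeholder ID)
    ceph device get-health-metrics SEAGATE_ST4000NM0023_XXXXXXXX
    # on that device's host, look for block-layer errors to correlate against
    grep -iE 'blk_update_request|I/O error' /var/log/messages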

[ceph-users] Re: Disk Failure Predication cloud module?

2022-01-21 Thread Jake Grimmett
Hi Yaarit, Thanks for confirming. Telemetry is enabled on our clusters, so we are contributing data on ~1270 disks. Are you able to use data from Backblaze? Deciding when an OSD is starting to fail is a dark art; we are still hoping that the Disk Failure Prediction module will take the guesswork out of it
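
For anyone else wanting to contribute, a minimal sketch (the license flag and device-channel option assume a recent release) looks like:

    # check whether telemetry is already on and what it would send
    ceph telemetry status
    ceph telemetry show
    # opt in; newer releases require accepting the data-sharing license
    ceph telemetry on --license sharing-1-0
    # make sure the device (SMART) channel is included
    ceph config set mgr mgr/telemetry/channel_device true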

[ceph-users] Re: Disk Failure Predication cloud module?

2022-01-20 Thread Yaarit Hatuka
Hi Jake, The diskprediction_cloud module is no longer available in Pacific. There are efforts to enhance the diskprediction module using our anonymized device telemetry data, which is aimed at building a dynamic, large, diverse, free and open data set to help data scientists create accurate failure prediction models
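
Once a prediction module is active, per-device results can be queried roughly like this (a sketch; the device ID is a placeholder and output fields vary by release):

    # force a fresh scrape of SMART data from all monitored devices
    ceph device check-health
    # show what Ceph knows about one device, including any life expectancy
    ceph device info SEAGATE_ST4000NM0023_XXXXXXXX
    ceph device predict-life-expectancy SEAGATE_ST4000NM0023_XXXXXXXX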

[ceph-users] Re: disk failure

2019-09-05 Thread Anthony D'Atri
Are you using Filestore? If so, directory splitting can manifest this way. Check your networking too: packet loss between OSD nodes, or between OSD nodes and the mons, can also manifest this way, say if bonding isn't working properly or you have a bad link. But as suggested below, check the OSD
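
Two quick checks along those lines (the OSD ID, interface and bond names are placeholders):

    # Filestore split/merge thresholds on a suspect OSD, via its admin socket
    ceph daemon osd.12 config get filestore_split_multiple
    ceph daemon osd.12 config get filestore_merge_threshold
    # packet drops / errors and bond health on the OSD host
    ip -s link show eth0
    cat /proc/net/bonding/bond0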

[ceph-users] Re: disk failure

2019-09-05 Thread Nathan Fish
Disks failing should cause the OSD to exit, be marked down, and after around 15 minutes marked out. That's routine. An OSD flapping is something you need to look into. It could be a flaky drive, or extreme load as was mentioned. On Thu, Sep 5, 2019 at 2:27 PM solarflow99 wrote: > > disks are exp
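
The down-to-out delay is governed by mon_osd_down_out_interval (600 seconds by default, so the exact wait depends on local config); a quick way to check it and to spot down or flapping OSDs:

    # how long a down OSD waits before being marked out (seconds)
    ceph config get mon mon_osd_down_out_interval
    # currently down OSDs, plus any health detail about recent flapping
    ceph osd tree down
    ceph health detail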

[ceph-users] Re: disk failure

2019-09-05 Thread solarflow99
Disks are expected to fail, and every once in a while I'll lose one, so that was expected and didn't come as any surprise to me. Are you suggesting failed drives almost always stay down and out? On Thu, Sep 5, 2019 at 11:13 AM Ashley Merrick wrote: > I would suggest checking the logs and seeing

[ceph-users] Re: disk failure

2019-09-05 Thread Ashley Merrick
I would suggest checking the logs and seeing the exact reason it's being marked out. If the disk is being hit hard and there are heavy I/O delays, then Ceph may see that as a delayed reply outside of the set window and mark it out. There are some variables that can be changed to give an OSD more time
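
The usual knobs for that (a sketch; raising them too far just hides real problems) are the heartbeat grace period and the reporter threshold:

    # how long peers wait for a heartbeat before reporting an OSD down (default 20s)
    ceph config set global osd_heartbeat_grace 30
    # how many distinct reporters the mon needs before marking an OSD down (default 2)
    ceph config set mon mon_osd_min_down_reporters 3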

[ceph-users] Re: disk failure

2019-09-05 Thread solarflow99
No, I mean Ceph sees it as a failure and marks it out for a while. On Thu, Sep 5, 2019 at 11:00 AM Ashley Merrick wrote: > Is your HD actually failing and vanishing from the OS and then coming back > shortly? > > Or do you just mean your OSD is crashing and then restarting itself > shortly later

[ceph-users] Re: disk failure

2019-09-05 Thread Ashley Merrick
Is your HD actually failing and vanishing from the OS and then coming back shortly? Or do you just mean your OSD is crashing and then restarting itself shortly later? On Fri, 06 Sep 2019 01:55:25 +0800 solarflo...@gmail.com wrote: One of the things I've come to notice is when HDD
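
One way to tell those two cases apart (the OSD ID and device name below are placeholders):

    # kernel log: did the drive itself reset, time out, or disappear?
    dmesg -T | grep -iE 'sdk|reset|offline|I/O error'
    # OSD daemon: did the process crash and get restarted by systemd?
    systemctl status ceph-osd@12
    journalctl -u ceph-osd@12 --since "2 hours ago"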