One of the things to watch out for in small clusters is that OSDs can get
full rather unexpectedly during recovery/backfill:
In your case you have 2 OSD nodes with 5 disks each. Since you have a
replica count of 2, each PG has one copy on each host, so if an OSD fails,
all of its PGs have to be re-created on the same host, meaning they will be
distributed only among the 4 remaining OSDs on that host, each of which
takes on roughly a quarter of the failed OSD's data and can quickly see its
usage jump by nearly 20 percentage points.
The default osd_backfill_full_ratio is 85%, so if any of those 4 OSDs was
near 70% utilization before the failure, it can easily reach 85% and cause
the backfill_toofull errors you are seeing. This is why I suggest you add an
extra disk, or try your luck raising osd_backfill_full_ratio to 92%; that
may fix things.
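For example, something along these lines (a sketch only; the exact option
name and the injectargs syntax can vary between Ceph releases, so check
yours first):

# see how full each OSD currently is
ceph osd df

# raise the backfill-full threshold at runtime on all OSDs
ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.92'
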
/Maged
On 2017-08-29 21:13, hjcho616 wrote:
> Nice! Thank you for the explanation! I feel like I can revive that OSD. =)
> That does sound great. I don't quite have another cluster, so I'm waiting
> for a drive to arrive! =)
>
> After setting size and min_size to 1, it looks like the toofull flag is
> gone... Maybe when I was making that video copy the OSDs were already
> down... and those two OSDs were not enough to take on that much extra...
> and on top of that, the last OSD alive was the smaller disk (320GB vs
> 2TB)... so it probably was filling up faster. I should have captured that
> message... but I turned the machine off and now I am at work. =P When I get
> back home, I'll try to grab that and share.
> Maybe I don't need to try to add another OSD to that cluster just yet! OSDs
> are about 50% full on OSD1.
>
> So next up, fixing osd0!
>
> Regards,
> Hong
>
> On Tuesday, August 29, 2017 1:05 PM, David Turner <drakonst...@gmail.com>
> wrote:
>
> But it was absolutely awesome to run an osd off of an rbd after the disk
> failed.
>
> On Tue, Aug 29, 2017, 1:42 PM David Turner <drakonst...@gmail.com> wrote:
> To add to Steve's success: the rbd was created in a second cluster in the
> same datacenter, so it didn't run the risk of the deadlock that mapping rbds
> on machines running osds can cause. It should theoretically still work on
> the same cluster, but it is inherently more dangerous for a few reasons.
>
> On Tue, Aug 29, 2017, 1:15 PM Steve Taylor <steve.tay...@storagecraft.com>
> wrote:
>
> Hong,
>
> Probably your best chance at recovering any data without special,
> expensive, forensic procedures is to perform a dd from /dev/sdb to
> somewhere else large enough to hold a full disk image and attempt to
> repair that. You'll want to use 'conv=noerror' with your dd command
> since your disk is failing. Then you could either re-attach the OSD
> from the new source or attempt to retrieve objects from the filestore
> on it.
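>
> Something along these lines should do it (the destination path is just an
> example; adding 'sync' alongside noerror keeps the image offsets aligned by
> zero-padding any blocks that fail to read):
>
> dd if=/dev/sdb of=/mnt/recovery/sdb.img conv=noerror,sync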
>
> I have actually done this before by creating an RBD that matches the
> disk size, performing the dd, running xfs_repair, and eventually
> adding it back to the cluster as an OSD. Using RBDs as OSDs is certainly a
> temporary arrangement for repair only, but I'm happy to report that it
> worked flawlessly in my case. I was able to weight the OSD to 0,
> offload all of its data, then remove it for a full recovery, at which
> point I just deleted the RBD.
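>
> Roughly, the flow was something like this (a sketch; the image name and
> size are made up here, and per David's note the RBD lived in a second
> cluster):
>
> rbd create rescue-sdb --size 2T
> rbd map rescue-sdb                      # returns a device, e.g. /dev/rbd0
> dd if=/dev/sdb of=/dev/rbd0 conv=noerror,sync
> xfs_repair /dev/rbd0
> # mount it where the old OSD's filestore lived, start the OSD, then drain it:
> ceph osd crush reweight osd.<id> 0
> # once backfill finishes, remove the OSD from the cluster and 'rbd rm' the image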
>
> The possibilities afforded by Ceph inception are endless. ☺
>
> Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799 |
>
> On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:
>> The rule of thumb with batteries is:
>> - the closer to their "proper temperature" you run them, the more life
>> you get out of them
>> - the more the battery is overpowered for your application, the longer
>> it will survive.
>>
>> Get yourself an LSI 94** controller and use it as an HBA and you will be
>> fine. But get MORE DRIVES !!!!! ...
>>> On 28 Aug 2017, at 23:10, hjcho616 <hjcho...@yahoo.com> wrote:
>>>
>>> Thank you Tomasz and Ronny. I'll have to order some HDDs soon and
>>> try these out. The car battery idea is nice! I may try that... =) Do
>>> they last longer? The ones that fit the UPS's original battery spec
>>> didn't last very long... part of the reason why I gave up on them...
>>> =P My wife probably won't like the idea of a car battery hanging
>>> around though, ha!
>>>
>>> The OSD1 motherboard (the host with mostly OK OSDs, except for that
>>> SMART failure) doesn't have any additional SATA connectors available.
>>> Would it be safe to add another OSD host?
>>>
>>> Regards,
>>> Hong
>>>
>>>
>>>
>>> On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz
>>> <tom.kusmierz@gmail.com> wrote:
>>>
>>>
>>> Sorry for being brutal ... anyway:
>>> 1. get a battery for the UPS (a car battery will do as well; I've
>>> modded a UPS in the past with a truck battery and it was working like
>>> a charm :D )
>>> 2. get spare drives and put those in, because your cluster CAN NOT
>>> get out of its error state due to lack of space
>>> 3. follow Ronny Aasen's advice on how to recover data from the hard
>>> drives
>>> 4. get cooling on those drives or you will lose more!
>>>
>>>
>>> > On 28 Aug 2017, at 22:39, hjcho616 <hjcho...@yahoo.com> wrote:
>>> >
>>> > Tomasz,
>>> >
>>> > Those machines are behind a surge protector. It doesn't appear to
>>> > be a good one! I do have a UPS... but it is my fault... no
>>> > battery. Power was pretty reliable for a while... and the UPS was
>>> > just beeping every chance it had, disrupting some sleep... =P So
>>> > I'm running on the surge protector only. I am running this in a
>>> > home environment. So far, HDD failures have been very rare in this
>>> > environment. =) It just doesn't get loaded as much! I am not
>>> > sure what to expect; seeing that "unfound" message, and the feeling
>>> > that I might be able to get the OSD back, made me excited. =)
>>> > Thanks for letting me know what the priority should be. I
>>> > just lack experience and knowledge in this. =) Please do continue
>>> > to guide me through this.
>>> >
>>> > Thank you for decoding those SMART messages! I do agree that it
>>> > looks like it is on its way out. I would like to know how to get a
>>> > good portion of it back if possible. =)
>>> >
>>> > I think I just set the size and min_size to 1.
>>> > # ceph osd lspools
>>> > 0 data,1 metadata,2 rbd,
>>> > # ceph osd pool set rbd size 1
>>> > set pool 2 size to 1
>>> > # ceph osd pool set rbd min_size 1
>>> > set pool 2 min_size to 1
>>> >
>>> > Seems to be doing some backfilling work.
>>> >
>>> > # ceph health
>>> > HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 2
>>> > pgs backfill_toofull; 74 pgs backfill_wait; 3 pgs backfilling;
>>> > 108 pgs degraded; 6 pgs down; 6 pgs inconsistent; 6 pgs peering;
>>> > 7 pgs recovery_wait; 16 pgs stale; 108 pgs stuck degraded; 6 pgs
>>> > stuck inactive; 16 pgs stuck stale; 130 pgs stuck unclean; 101
>>> > pgs stuck undersized; 101 pgs undersized; 1 requests are blocked
>>> > > 32 sec; recovery 1790657/4502340 objects degraded (39.772%);
>>> > recovery 641906/4502340 objects misplaced (14.257%); recovery
>>> > 147/2251990 unfound (0.007%); 50 scrub errors; mds cluster is
>>> > degraded; no legacy OSD present but 'sortbitwise' flag is not set
>>> >
>>> >
>>> >
>>> > Regards,
>>> > Hong
>>> >
>>> >
>>> > On Monday, August 28, 2017 4:18 PM, Tomasz Kusmierz
>>> > <tom.kusmierz@gmail.com> wrote:
>>> >
>>> >
>>> > So to decode few things about your disk:
>>> >
>>> > 1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 37
>>> > 37 read errors and only one sector marked as pending - fun disk :/
>>> >
>>> > 181 Program_Fail_Cnt_Total 0x0022 099 099 000 Old_age Always - 35325174
>>> > So the firmware has quite a few bugs, that's nice.
>>> >
>>> > 191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 2855
>>> > The disk was thrown around while operational, even nicer.
>>> >
>>> > 194 Temperature_Celsius 0x0002 047 041 000 Old_age Always - 53 (Min/Max 15/59)
>>> > If your disk goes past 50 you should not keep using it; high
>>> > temperatures demagnetise the platter layer and you will see more
>>> > errors in the very near future.
>>> >
>>> > 197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 1
>>> > As mentioned before :)
>>> >
>>> > 200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 4222
>>> > Your heads keep missing tracks ... bent? I don't even know how to
>>> > comment here.
>>> >
>>> >
>>> > generally fun drive you've got there ... rescue as much as you can
>>> > and throw it away !!!
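>>> >
>>> > (For what it's worth, I believe those attributes come from
>>> > smartmontools; if you want to keep an eye on the drive while rescuing
>>> > data, something like this should show the same counters, assuming the
>>> > failing disk is still /dev/sdb:
>>> >
>>> > smartctl -A /dev/sdb
>>> >
>>> > and watch Current_Pending_Sector and the temperature as you go.)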
>>> >
>>> >
>>>
>>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com