Re: [ceph-users] Power outages!!! help!

hjcho616 Fri, 01 Sep 2017 07:02:11 -0700

Looks like it has been rescued... Only 1 error as we saw before in the smart
log!

# ddrescue -f /dev/sda /dev/sdc ./rescue.log
GNU ddrescue 1.21
Press Ctrl-C to interrupt
     ipos:    1508 GB, non-trimmed:        0 B,  current rate:       0 B/s
     opos:    1508 GB, non-scraped:        0 B,  average rate:  88985 kB/s
non-tried:        0 B,     errsize:     4096 B,      run time:  6h 14m 40s
  rescued:    2000 GB,      errors:        1,  remaining time:         n/a
percent rescued:  99.99%      time since last successful read:         39s
Finished

Still missing partition in the new drive. =P  I found this util called testdisk
for broken partition tables.  Will try that tonight. =P

Regards,
Hong
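
The ddrescue run above reports a single error of 4096 B, and the mapfile (./rescue.log in that command) is where ddrescue records which region failed. Below is a minimal Python sketch for listing those regions. It assumes the mapfile layout described in the GNU ddrescue manual: comment lines starting with '#', one status line, then rows of "pos size status" in which '-' marks a bad-sector block. The sample contents are made up for illustration and are not taken from this thread.

```python
def bad_regions(mapfile_text):
    """Return (start, size) byte ranges that ddrescue marked as bad ('-')."""
    data_lines = [
        line.split()
        for line in mapfile_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    regions = []
    # The first data line is the run status (current_pos, current_status, ...);
    # every later line describes one block as: pos size status.
    for fields in data_lines[1:]:
        pos, size, status = fields[:3]
        if status == "-":
            # int(x, 0) accepts the 0x-prefixed hex that ddrescue writes.
            regions.append((int(pos, 0), int(size, 0)))
    return regions


# Hypothetical mapfile contents, NOT the real rescue.log from this thread.
SAMPLE = """\
# Mapfile. Created by GNU ddrescue
# current_pos  current_status
0x00200000     +
#      pos        size  status
0x00000000  0x00100000  +
0x00100000  0x00001000  -
0x00101000  0x000FF000  +
"""

for start, size in bad_regions(SAMPLE):
    # 512-byte sectors are an assumption; check the drive's logical sector size.
    print(f"bad region at byte {start} ({size} bytes), first sector {start // 512}")
```

Knowing the offset helps to judge whether the unreadable block falls inside the partition table or inside the filesystem. As far as I know, re-running ddrescue with the same mapfile and a retry option such as -r3 revisits only the areas that are not yet rescued.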

On Wednesday, August 30, 2017 9:18 AM, Ronny Aasen
<[email protected]> wrote:

On 30.08.2017 15:32, Steve Taylor wrote:

I'm not familiar with dd_rescue, but I've just been reading about it. I'm not
seeing any features that would be beneficial in this scenario that aren't also
available in dd. What specific features give it "really a far better chance of
restoring a copy of your disk" than dd? I'm always interested in learning about
new recovery tools.

I see I wrote dd_rescue out of old habit, but the package one should use on
Debian is gddrescue, also called GNU ddrescue.

This page has some details on the differences between dd and the ddrescue variants:
http://www.toad.com/gnu/sysadmin/index.html#ddrescue

kind regards
Ronny Aasen

| If you are not the intended recipient of this message or received it
erroneously, please notify the sender and delete it, together with any
attachments, and be advised that any dissemination or copying of this message
is prohibited. |

On Tue, 2017-08-29 at 21:49 +0200, Willem Jan Withagen wrote:
On 29-8-2017 19:12, Steve Taylor wrote:

Hong,

Probably your best chance at recovering any data without special,
expensive, forensic procedures is to perform a dd from /dev/sdb to
somewhere else large enough to hold a full disk image and attempt to repair
that. You'll want to use 'conv=noerror' with your dd command since your disk is
failing. Then you could either re-attach the OSD from the new source or attempt
to retrieve objects from the filestore on it.

Like somebody else already pointed out:
In problem cases like this disk, use dd_rescue. It has really a far better
chance of restoring a copy of your disk.

--WjW
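
A note on 'conv=noerror': with dd, noerror alone keeps the copy going after a read error, but it is usually combined with sync (conv=noerror,sync) so that an unreadable block is padded with zeros and everything after it stays at the same offset in the image. The Python sketch below only illustrates that idea. It is not how dd or ddrescue are implemented, and FlakySource is a hypothetical stand-in for a failing disk.

```python
import io


def copy_skipping_errors(src, dst, total_size, block_size=4096):
    """Copy total_size bytes from src to dst; unreadable blocks become zeros.

    Returns the list of byte offsets whose block could not be read.
    """
    failed = []
    for offset in range(0, total_size, block_size):
        want = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            chunk = src.read(want)
        except OSError:
            chunk = b""
            failed.append(offset)
        # Pad short or failed reads so every later byte keeps its offset.
        dst.write(chunk.ljust(want, b"\0"))
    return failed


class FlakySource(io.BytesIO):
    """Hypothetical failing disk: reading at one offset raises an I/O error."""

    def __init__(self, data, bad_offset):
        super().__init__(data)
        self.bad_offset = bad_offset

    def read(self, size=-1):
        if self.tell() == self.bad_offset:
            raise OSError("simulated read error")
        return super().read(size)


data = bytes(range(256)) * 64  # 16 KiB of made-up test data
src = FlakySource(data, bad_offset=4096)
dst = io.BytesIO()
failed = copy_skipping_errors(src, dst, len(data))
image = dst.getvalue()
print("unreadable blocks at offsets:", failed)
```

The image ends up the same size as the source, which is what a later xfs_repair or partition-table repair needs. ddrescue goes further by recording the failed areas in a mapfile and retrying them, which is the main practical difference from plain dd raised in this thread.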
I have actually done this before by creating an RBD that matches the disk size,
performing the dd, running xfs_repair, and eventually adding it back to the
cluster as an OSD. RBDs as OSDs is certainly a temporary arrangement for repair
only, but I'm happy to report that it worked flawlessly in my case. I was able
to weight the OSD to 0, offload all of its data, then remove it for a full
recovery, at which point I just deleted the RBD.

The possibilities afforded by Ceph inception are endless. ☺

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799

On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:
Rule of thumb with batteries is:
- the closer to the “proper temperature” you run them, the more life you get
  out of them
- the more the battery is overpowered for your application, the longer it will
  survive.

Get yourself an LSI 94** controller and use it as an HBA and you will be
fine. But get MORE DRIVES !!!!! …
On 28 Aug 2017, at 23:10, hjcho616 <[email protected]> wrote:

Thank you Tomasz and Ronny. I'll have to order some HDDs soon and
try these out. The car battery idea is nice! I may try that.. =) Do
they last longer? The ones that fit the UPS original battery spec
didn't last very long... part of the reason why I gave up on them..
=P My wife probably won't like the idea of a car battery hanging
out though, ha!

The OSD1 (the one with mostly OK OSDs, except that SMART failure)
motherboard doesn't have any additional SATA connectors available.
Would it be safe to add another OSD host?

Regards,
Hong

On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz <[email protected]> wrote:

Sorry for being brutal … anyway
1. Get the battery for the UPS (a car battery will do as well; I’ve modded
   a UPS in the past with a truck battery and it was working like
   a charm :D)
2. Get spare drives and put those in, because your cluster CAN NOT
   get out of error due to lack of space
3. Follow the advice of Ronny Aasen on how to recover data from hard
   drives
4. Get cooling to the drives or you will lose more!
On 28 Aug 2017, at 22:39, hjcho616 <[email protected]> wrote:

Tomasz,

Those machines are behind a surge protector. Doesn't appear to
be a good one! I do have a UPS... but it is my fault... no
battery. Power was pretty reliable for a while... and the UPS was
just beeping every chance it had, disrupting some sleep.. =P So
running on surge protector only. I am running this in a home environment.
So far, HDD failures have been very rare for this
environment. =) It just doesn't get loaded as much! I am not
sure what to expect; seeing that "unfound" and just a feeling of
possibility of maybe getting the OSD back made me excited about it.
=) Thanks for letting me know what should be the priority. I
just lack experience and knowledge in this. =) Please do continue
to guide me through this.

Thank you for the decode of those SMART messages! I do agree that it looks
like it is on its way out. I would like to know how to get a
good portion of it back if possible. =)

I think I just set the size and min_size to 1.

# ceph osd lspools
0 data,1 metadata,2 rbd,
# ceph osd pool set rbd size 1
set pool 2 size to 1
# ceph osd pool set rbd min_size 1
set pool 2 min_size to 1

Seems to be doing some backfilling work.

# ceph health
HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 2
pgs backfill_toofull; 74 pgs backfill_wait; 3 pgs backfilling;
108 pgs degraded; 6 pgs down; 6 pgs inconsistent; 6 pgs peering;
7 pgs recovery_wait; 16 pgs stale; 108 pgs stuck degraded; 6 pgs
stuck inactive; 16 pgs stuck stale; 130 pgs stuck unclean; 101
pgs stuck undersized; 101 pgs undersized; 1 requests are blocked
> 32 sec; recovery 1790657/4502340 objects degraded (39.772%);
recovery 641906/4502340 objects misplaced (14.257%); recovery
147/2251990 unfound (0.007%); 50 scrub errors; mds cluster is
degraded; no legacy OSD present but 'sortbitwise' flag is not set

Regards,
Hong

On Monday, August 28, 2017 4:18 PM, Tomasz Kusmierz <[email protected]> wrote:

So to decode a few things about your disk:

  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       3737
3737 read errors and only one sector marked as pending - fun disk :/

181 Program_Fail_Cnt_Total  0x0022   099   099   000    Old_age   Always       -       35325174
So the firmware has quite a few bugs, that’s nice

191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       2855
The disk was thrown around while operational, even more nice.

194 Temperature_Celsius     0x0002   047   041   000    Old_age   Always       -       53 (Min/Max 15/59)
If your disk passes 50 you should not consider using it; high
temperatures demagnetise the plate layer and you will see more errors in the
very near future.

197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1
As mentioned before :)

200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       4222
Your heads keep missing tracks … bent? I don’t even know how to
comment here.

Generally a fun drive you’ve got there … rescue as much as you can
and throw it away !!!
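
The attribute-by-attribute reading above can be roughly automated. This is a minimal Python sketch, assuming the usual `smartctl -A` column order (ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value). The rows are the ones quoted in this thread. The 50 C limit is the rule of thumb given above, not a vendor specification, and the function names are made up for illustration.

```python
def parse_smart_line(line):
    """Split one smartctl -A attribute row into named fields."""
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "type": fields[6],
        # RAW_VALUE may carry trailing notes such as "(Min/Max 15/59)";
        # only the leading number is kept here.
        "raw": int(fields[9]),
    }


def smart_warnings(attrs, max_temp_c=50):
    """Return human-readable notes for a few attributes worth watching."""
    notes = []
    for a in attrs:
        if a["name"] == "Current_Pending_Sector" and a["raw"] > 0:
            notes.append(f"{a['raw']} sector(s) pending reallocation")
        if a["name"] == "Temperature_Celsius" and a["raw"] > max_temp_c:
            notes.append(f"temperature {a['raw']} C exceeds {max_temp_c} C")
        # A normalized value at or below a non-zero threshold means the
        # attribute has failed by the drive's own criteria.
        if a["thresh"] > 0 and a["value"] <= a["thresh"]:
            notes.append(f"{a['name']} at or below its failure threshold")
    return notes


# Attribute rows as quoted in this thread.
ROWS = """
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 3737
181 Program_Fail_Cnt_Total 0x0022 099 099 000 Old_age Always - 35325174
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 2855
194 Temperature_Celsius 0x0002 047 041 000 Old_age Always - 53 (Min/Max 15/59)
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 1
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 4222
"""

attrs = [parse_smart_line(row) for row in ROWS.strip().splitlines()]
for note in smart_warnings(attrs):
    print(note)
```

Raw values are vendor-specific, so large counters such as Raw_Read_Error_Rate are deliberately not flagged here; only the normalized value against the threshold is checked for them.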

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

