Personally, I would suggest the following:
- change the replication failure domain (CRUSH chooseleaf type) from the default, host, to OSD - see the command sketch below
- remove the OSDs from the host with all those down OSDs (note that they are down, not out, which makes it even stranger)
- let the single-node cluster stabilise; yes, performance will suck, but at least you will have the data in two copies on a single host... better that than nothing
- fix whatever issues you have on host OSD2
- add all the OSDs on OSD2 back in, and mark all the OSDs from OSD1 with weight 0 - this will make Ceph migrate all data away from host OSD1
- fix all the problems you've got on host OSD1
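To make the CRUSH change and the later drain of OSD1 concrete, here is a rough, untested sketch of the commands involved; the crush map file names are arbitrary, the osd ids are taken from your tree quoted below, and everything should be sanity-checked against your own setup before running it:

  # change the CRUSH failure domain from host to osd
  ceph osd getcrushmap -o crush.bin
  crushtool -d crush.bin -o crush.txt
  # edit crush.txt: in the replicated ruleset, change
  #   step chooseleaf firstn 0 type host
  # to
  #   step chooseleaf firstn 0 type osd
  crushtool -c crush.txt -o crush.new.bin
  ceph osd setcrushmap -i crush.new.bin

  # remove the down OSDs on host OSD2 (osd.3, osd.4, osd.5, osd.8 in the tree below)
  ceph osd crush remove osd.3
  ceph auth del osd.3
  ceph osd rm 3
  # repeat for osd.4, osd.5 and osd.8

  # later, once OSD2 is fixed and its OSDs are back in, drain host OSD1
  ceph osd crush reweight osd.0 0
  ceph osd crush reweight osd.1 0
  ceph osd crush reweight osd.2 0
  ceph osd crush reweight osd.6 0
  ceph osd crush reweight osd.7 0

With the failure domain at osd (and size=2), both replicas of a PG are allowed to land on host OSD1 while OSD2 is being repaired, which is what makes the single-host phase survivable.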
The reason I suggest this is that it seems you've got issues everywhere, and since you are running a production environment (at least it seems like that to me), data and downtime are the main priority.

> On 28 Aug 2017, at 11:58, Ronny Aasen <[email protected]> wrote:
>
> On 28. aug. 2017 08:01, hjcho616 wrote:
>> Hello!
>> I've been using ceph for a long time, mostly for network CephFS storage, even before the Argonaut release! It's been working very well for me. Yes, I had some power outages before and asked a few questions on this list before and got them resolved happily! Thank you all!
>> Not sure why, but we've been having quite a bit of power outages lately. Ceph appeared to be running OK with those going on... so I was pretty happy and didn't think much of it... till yesterday, when I started to move some videos to CephFS and ceph decided that it was full, although df showed only 54% utilization! Then I looked it up: some of the OSDs were down! (only 3 at that point!)
>> I am running a pretty simple ceph configuration... I have one machine running MDS and mon, named MDS1, and two OSD machines with 5 2TB HDDs and 1 SSD for journal, named OSD1 and OSD2.
>> At the time, I was running jewel 10.2.2. I looked at some of the downed OSDs' log files and googled some of the errors... they appeared to be tied to version 10.2.2, so I upgraded everything to 10.2.9. Well, that didn't solve my problems... =P While I was looking at some of this, there was another power outage! D'oh! I may need to invest in a UPS or something... Until this happened, all of the down OSDs were from OSD2, but this time OSD1 took a hit! It couldn't boot, because osd-0 was damaged... I tried xfs_repair -L /dev/sdb1 as suggested by the command line and was able to mount it again, phew, reboot... then /dev/sdb1 is no longer accessible! Noooo!!!
>> So this is what I have today! I am a bit concerned as half of the OSDs are down! and osd.0 doesn't look good at all...
>> # ceph osd tree
>> ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 16.24478 root default
>> -2  8.12239     host OSD1
>>  1  1.95250         osd.1       up  1.00000          1.00000
>>  0  1.95250         osd.0     down        0          1.00000
>>  7  0.31239         osd.7       up  1.00000          1.00000
>>  6  1.95250         osd.6       up  1.00000          1.00000
>>  2  1.95250         osd.2       up  1.00000          1.00000
>> -3  8.12239     host OSD2
>>  3  1.95250         osd.3     down        0          1.00000
>>  4  1.95250         osd.4     down        0          1.00000
>>  5  1.95250         osd.5     down        0          1.00000
>>  8  1.95250         osd.8     down        0          1.00000
>>  9  0.31239         osd.9       up  1.00000          1.00000
>> This looked a lot better before that last extra power outage... =( Can't mount it anymore!
>> # ceph health
>> HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 44 pgs backfill_toofull; 80 pgs backfill_wait; 122 pgs degraded; 6 pgs down; 8 pgs inconsistent; 6 pgs peering; 2 pgs recovering; 18 pgs recovery_wait; 16 pgs stale; 122 pgs stuck degraded; 6 pgs stuck inactive; 16 pgs stuck stale; 159 pgs stuck unclean; 102 pgs stuck undersized; 102 pgs undersized; 1 requests are blocked > 32 sec; recovery 1803466/4503980 objects degraded (40.042%); recovery 692976/4503980 objects misplaced (15.386%); recovery 147/2251990 unfound (0.007%); 1 near full osd(s); 54 scrub errors; mds cluster is degraded; no legacy OSD present but 'sortbitwise' flag is not set
>> Each of the OSDs is showing a different failure signature.
>> I've uploaded the OSD logs with debug osd = 20, debug filestore = 20, and debug ms = 20. You can find them in the links below. Let me know if there is a preferred way to share this!
>> https://drive.google.com/open?id=0By7YztAJNGUWQXItNzVMR281Snc (ceph-osd.3.log)
>> https://drive.google.com/open?id=0By7YztAJNGUWYmJBb3RvLVdSQWc (ceph-osd.4.log)
>> https://drive.google.com/open?id=0By7YztAJNGUWaXhRMlFOajN6M1k (ceph-osd.5.log)
>> https://drive.google.com/open?id=0By7YztAJNGUWdm9BWFM5a3ExOFE (ceph-osd.8.log)
>> So how does this look? Can this be fixed? =) If so, please let me know. I used to take backups, but since it grew so big I wasn't able to do so anymore... and I would like to get most of this back if I can. Please let me know if you need more info!
>> Thank you!
>> Regards,
>> Hong
>
> With only 2 OSD hosts, how are you doing replication? I assume you use size=2, and that is somewhat OK if you have min_size=2, but if you have min_size=1 it can quickly become a big problem of lost objects.
>
> With size=2, min_size=2 your data should be safely on 2 drives (if you can get one of them running again), but your cluster will block when there is an issue.
>
> If at all possible I would add a third OSD node to your cluster, so your OK PGs can replicate to it and you can work on the down OSDs without fear of losing additional working OSDs.
>
> Also, some of your logs contain lines like...
>
> failed to bind the UNIX domain socket to '/var/run/ceph/ceph-osd.3.asok': (17) File exists
>
> filestore(/var/lib/ceph/osd/ceph-3) lock_fsid failed to lock /var/lib/ceph/osd/ceph-3/fsid, is another ceph-osd still running? (11) Resource temporarily unavailable
>
> 7faf16e23800 -1 osd.3 0 OSD::pre_init: object store '/var/lib/ceph/osd/ceph-3' is currently in use. (Is ceph-osd already running?)
>
> 7faf16e23800 -1 ** ERROR: osd pre_init failed: (16) Device or resource busy
>
> This can indicate that you have a dead osd.3 process keeping the resources open and preventing a new osd from starting.
>
> Check with ps aux whether you can see any ceph processes. If you find something relating to your down OSDs, try stopping it normally, and if that fails, kill it manually before trying to restart the osd.
>
> Also check dmesg for messages relating to faulty hardware or the OOM killer. I have had experiences with the OOM killer where the osd node became unreliable until I rebooted the machine.
>
> Kind regards, and good luck
> Ronny Aasen

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
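For the stale-process checks Ronny describes above, something along these lines is what is meant; this is a rough sketch only, assuming osd.3 as the example, a systemd-based install (adjust the service handling if you are on sysvinit), and with <pid> standing in for whatever ps actually reports:

  # confirm what size/min_size the pools really have
  ceph osd dump | grep 'replicated size'

  # look for a leftover ceph-osd process still holding the fsid lock / admin socket
  ps aux | grep ceph-osd

  # try a clean stop first; only kill the old pid if the stop does nothing
  systemctl stop ceph-osd@3
  kill <pid>

  # then start the osd again and watch its log
  systemctl start ceph-osd@3

  # look for OOM-killer activity or disk errors
  dmesg | grep -iE 'oom|error'

If dmesg shows the OOM killer was involved, rebooting the OSD host before restarting the daemons is the safer route, as Ronny notes.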
