Hi,

--
First, let me start with the bonus... I migrated from hammer => jewel and
followed the migration instructions... but the migration instructions are
missing this:

  # chown -R ceph:ceph /var/log/ceph

I just discovered this was the reason I couldn't find any logs anywhere
about my current issue :/
--
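(In case it helps anyone else hitting this after a hammer => jewel
migration: the jewel daemons run as the "ceph" user, so I believe the fix
boils down to the two chowns below - default paths, adjust if yours differ -
plus a daemon restart so the log files get reopened.)

  chown -R ceph:ceph /var/lib/ceph      ## this one is in the migration notes
  chown -R ceph:ceph /var/log/ceph      ## this one is the missing bit
  systemctl restart ceph-osd.target     ## or per-daemon : ceph-osd@<id>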
Now, on to the actual issue. This is maybe the 3rd time this happens to
me... This time I'd like to try to understand what happens.

So: ceph-10.2.0-0.el7.x86_64 + CentOS 7.2 here. Ceph health was happy, but
any rbd operation was hanging - hence: ceph was hung, and so were the test
VMs running on it.

I placed my VM in an EC pool, on top of which I overlaid an RBD pool with
SSDs. The EC pool is defined as a 3+1 pool, with 5 hosts hosting the OSDs
(and the failure domain is set to host).

"ceph -w" wasn't displaying new status lines as usual, but "ceph health
detail" wasn't saying anything was wrong. After looking at one node, I found
that the ceph logs were empty on it, so I decided to restart the OSDs on
that one using:

  systemctl restart ceph-osd@*

After I did that, "ceph -w" came back to life, but told me there was a dead
MON - which I restarted too. I watched some kind of recovery happening, and
after a few seconds/minutes, I now see:

[root@ceph0 ~]# ceph health detail
HEALTH_WARN 4 pgs degraded; 3 pgs recovering; 1 pgs recovery_wait; 4 pgs stuck unclean; recovery 57/373846 objects degraded (0.015%); recovery 57/110920 unfound (0.051%)
pg 691.65 is stuck unclean for 310704.556119, current state active+recovery_wait+degraded, last acting [44,99,69,9]
pg 691.1e5 is stuck unclean for 493631.370697, current state active+recovering+degraded, last acting [77,43,20,99]
pg 691.12a is stuck unclean for 14521.475478, current state active+recovering+degraded, last acting [42,56,7,106]
pg 691.165 is stuck unclean for 14521.474525, current state active+recovering+degraded, last acting [21,71,24,117]
pg 691.165 is active+recovering+degraded, acting [21,71,24,117], 15 unfound
pg 691.12a is active+recovering+degraded, acting [42,56,7,106], 1 unfound
pg 691.1e5 is active+recovering+degraded, acting [77,43,20,99], 2 unfound
pg 691.65 is active+recovery_wait+degraded, acting [44,99,69,9], 39 unfound
recovery 57/373846 objects degraded (0.015%)
recovery 57/110920 unfound (0.051%)

Damn. Last time this happened, I was forced to declare the PGs lost in order
to get back to a "healthy" ceph, because ceph does not want to revert PGs in
EC pools. But one of the VMs then started hanging randomly on disk IOs...
That same VM is now down, and I can't remove its disk from rbd, it hangs at
99% - I could work around that by renaming the file and re-installing the VM
on a new disk, but anyway, I'd like to understand + fix + make sure this
does not happen again. We sometimes suffer power cuts here: if merely
restarting daemons can kill ceph data, I don't dare think what would happen
in case of a power cut...

Back to the unfound objects. I have no down OSD that would still be in the
cluster (only one is down - osd.46 - and I put it down myself, after setting
its weight to 0 last week).

I can query the PGs, but I don't understand what I see in there. For
instance:

# ceph pg 691.65 query
(...)
    "num_objects_missing": 0,
    "num_objects_degraded": 39,
    "num_objects_misplaced": 0,
    "num_objects_unfound": 39,
    "num_objects_dirty": 138,

And then for 2 peers I see:

    "state": "active+undersized+degraded",   ## undersized ???
(...)
    "num_objects_missing": 0,
    "num_objects_degraded": 138,
    "num_objects_misplaced": 138,
    "num_objects_unfound": 0,
    "num_objects_dirty": 138,
    "blocked_by": [],
    "up_primary": 44,
    "acting_primary": 44
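(Side note, for anyone digging into this with me: the next things I plan to
check against that query output are the usual suspects below - the OSD ids
are simply the acting set of pg 691.65 shown above.)

  ceph pg 691.65 query | grep -A 10 might_have_unfound   ## which OSDs is the primary still probing ?
  ceph osd tree | grep -E ' osd\.(44|99|69|9) '          ## is the whole acting set really up + in ?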
If I look at the "missing" objects, I can see something on some OSDs:

# ceph pg 691.165 list_missing
(...)
{
  "oid": {
    "oid": "rbd_data.8de32431bd7b7.0000000000000ea7",
    "key": "",
    "snapid": -2,
    "hash": 971513189,
    "max": 0,
    "pool": 691,
    "namespace": ""
  },
  "need": "26521'22595",
  "have": "25922'22575",
  "locations": []
}

All of the missing objects have this "need/have" discrepancy. I can see such
objects in a "691.165" directory on secondary OSDs, but I do not see any
691.165 directory on the primary OSD (44)... ? For instance:

[root@ceph0 ~]# ll /var/lib/ceph/osd/ceph-21/current/691.165s0_head/*8de32431bd7b7.0000000000000ea7*
-rw-r--r-- 1 ceph ceph 1399392 May 15 13:18 /var/lib/ceph/osd/ceph-21/current/691.165s0_head/rbd\udata.8de32431bd7b7.0000000000000ea7__head_39E81D65__2b3_5843_0
-rw-r--r-- 1 ceph ceph 1399392 May 27 11:07 /var/lib/ceph/osd/ceph-21/current/691.165s0_head/rbd\udata.8de32431bd7b7.0000000000000ea7__head_39E81D65__2b3_ffffffffffffffff_0

Even so: assuming I really did lose data on that OSD 44 (how??), I would
expect ceph to be able to reconstruct the missing data/PGs thanks to the
erasure codes / the replicas of the RBD data - but it looks like it's not
willing to??

I already know that telling ceph to forget about the lost PGs is not a good
idea, as it will cause the VMs using them to hang afterwards... and I'd
prefer to see ceph as a rock-solid solution that lets one recover from such
"usual" operations...?

If anyone has ideas, I'd be happy... Should I kill osd.44 for good and
recreate it?

Thanks

P.S.: I already tried:
  ceph tell osd.44 injectargs --debug-osd 0/5 --debug-filestore 0/5
or
  ceph tell osd.44 injectargs --debug-osd 20/20 --debug-filestore 20/20
but I tried this before I found the bonus at the start of this email...
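P.P.S.: before killing osd.44 for good, what I'm tempted to try first is
simply forcing that primary to re-peer, without declaring anything lost -
something along these lines (standard commands; osd.44 and pg 691.65 are
just the examples from above, and I honestly don't know yet whether it
helps):

  ceph osd set noout                ## avoid rebalancing while the OSD bounces
  systemctl restart ceph-osd@44     ## restart the primary so the PG re-peers
  ceph osd unset noout
  ceph pg 691.65 query              ## does the might_have_unfound section change ?
  ceph pg 691.65 list_missing       ## do the "locations" get filled in now ?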
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com