Hi Craig Lewis,
My pool has 300 TB of data, so I can't recreate a new pool and then copy the data over with "ceph cp pool" (it would take a very long time; a sketch of the copy commands follows the quoted thread below). I upgraded Ceph to Giant (0.86), but the error is still there :(( I think my problem is the "objects misplaced (0.320%)":

# ceph pg 23.96 query
    "num_objects_missing_on_primary": 0,
    "num_objects_degraded": 0,
    "num_objects_misplaced": 79,

cluster xxxxxx-xxxxx-xxxxx-xxxxx
 health HEALTH_WARN 225 pgs degraded; 2 pgs repair; 225 pgs stuck degraded; 263 pgs stuck unclean; 225 pgs stuck undersized; 225 pgs undersized; recovery 308759/54799506 objects degraded (0.563%); 175270/54799506 objects misplaced (0.320%); 1/130 in osds are down; flags noout,nodeep-scrub
 pgmap v28905830: 14973 pgs, 23 pools, 70255 GB data, 17838 kobjects
       206 TB used, 245 TB / 452 TB avail
       308759/54799506 objects degraded (0.563%); 175270/54799506 objects misplaced (0.320%)
           14708 active+clean
              38 active+remapped
             225 active+undersized+degraded
 client io 35068 kB/s rd, 71815 kB/s wr, 4956 op/s

Checking in the ceph log:

2014-10-28 15:33:59.733177 7f6a7f1ab700 5 osd.21 pg_epoch: 103718 pg[23.96( v 103713'171086 (103609'167229,103713'171086] local-les=103715 n=85 ec=25000 les/c 103715/103710 103714/103714/103236) [92,21,78] r=1 lpr=103714 pi=100280-103713/118 luod=0'0 crt=103713'171086 active] enter Started/ReplicaActive/RepNotRecovering

Then it logs many failed pushes, on many objects (e.g. c03fe096/rbd_data.5348922ae8944a.000000000000306b, ...):

2014-10-28 15:33:59.343435 7f6a7e1a9700 5 -- op tracker -- seq: 1793, time: 2014-10-28 15:33:59.343435, event: done, op: MOSDPGPush(23.96 103718 [PushOp(c03fe096/rbd_data.5348922ae8944a.000000000000306b/head//24, version: 103622'283374, data_included: [0~4194304], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(c03fe096/rbd_data.5348922ae8944a.000000000000306b/head//24@103622'283374, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(4120f096/rbd_data.7a63d32ae8944a.0000000000000083/head//24, version: 103679'295624, data_included: [0~4194304], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4120f096/rbd_data.7a63d32ae8944a.0000000000000083/head//24@103679'295624, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])

Thanks!
--
Tuan
HaNoi-VietNam

On 2014-10-28 01:35, Craig Lewis wrote:
> My experience is that once you hit this bug, those PGs are gone. I tried
> marking the primary OSD OUT, which caused the problem to move to the new
> primary OSD. Luckily for me, my affected PGs were covered by replication to
> the secondary cluster. I ended up deleting the whole pool and recreating it.
>
> Which pools are 7 and 23? It's possible that it's something that's easy to
> replace.
>
> On Fri, Oct 24, 2014 at 9:26 PM, Ta Ba Tuan <tua...@vccloud.vn> wrote:
>
>> Hi Craig, thanks for replying.
>> When I started that OSD, the Ceph log from "ceph -w" warned that pgs 7.9d8,
>> 23.596, 23.9c6, and 23.63 can't recover, as in the pasted log.
>>
>> Those pgs are in "active+degraded" state.
>> # ceph pg map 7.9d8
>> osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]
>>
>> (When I start osd.21, pg 7.9d8 and the three remaining pgs change to state
>> "active+recovering".) osd.21 is still down after the following logs:
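For reference, the pool-copy step mentioned above ("ceph cp pool") is presumably rados cppool, which copies every object from one pool into another, pre-created pool. A minimal sketch, assuming a hypothetical source pool "volumes" and destination pool "volumes-new" with an illustrative pg count (names and numbers are not from this thread):

# ceph osd pool create volumes-new 4096 4096     (create the destination pool first)
# rados cppool volumes volumes-new               (copy all objects, one by one)

On a 300 TB pool this is a very long, single-threaded copy, which is why repairing the existing pool is being attempted instead.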
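To answer Craig's question about which pools 7 and 23 are, the pool ID to name mapping can be read straight from the cluster; a quick sketch of commands that should show it (output shown elsewhere in this thread, not reproduced here):

# ceph osd lspools
# ceph df
# ceph pg dump_stuck unclean

ceph osd lspools prints each pool as "<id> <name>", ceph df shows per-pool usage, and ceph pg dump_stuck unclean lists the stuck PGs; the pool ID is the part of the PG ID before the dot (e.g. 23 in 23.96).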
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com