Hi,

Have you tried querying the state of some of the stuck or undersized pgs? Maybe some OSD daemons are not healthy and are blocking the recovery.
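When reading the query output, the `state` and `recovery_state` fields are the useful parts. A minimal sketch of pulling them out of the JSON, using an abbreviated, hypothetical excerpt of what `ceph pg 3.be query --format json` might return (a real cluster returns many more fields):

```python
import json

# Abbreviated, hypothetical excerpt of `ceph pg <pgid> query` JSON output;
# the field names below exist in real output, but the values are made up.
sample = '''
{
  "state": "active+remapped+wait_backfill",
  "up": [0, 4],
  "acting": [0, 4],
  "recovery_state": [
    {"name": "Started/Primary/Active",
     "enter_time": "2017-11-12 10:00:00.000000",
     "comment": "waiting for backfill reservation"}
  ]
}
'''

pg = json.loads(sample)
print("state:", pg["state"])
for step in pg["recovery_state"]:
    # The recovery_state entries often say why the pg is waiting
    print(step["name"], "-", step.get("comment", ""))
```

If the `recovery_state` comments mention waiting for reservations or a down/slow OSD, that points at the daemon blocking the backfill.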
ceph pg 3.be query
ceph pg 4.d4 query
ceph pg 4.8c query

http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-pg/

Cordialement / Best regards,

Sébastien VIGNERON
CRIANN,
Ingénieur / Engineer
Technopôle du Madrillet
745, avenue de l'Université
76800 Saint-Etienne du Rouvray - France
tél. +33 2 32 91 42 91
fax. +33 2 32 91 42 92
http://www.criann.fr
mailto:sebastien.vigne...@criann.fr
support: supp...@criann.fr

> Le 12 nov. 2017 à 10:59, gjprabu <gjpr...@zohocorp.com> a écrit :
>
> Hi Sebastien
>
> Thanks for your reply. Yes, there are undersized pgs and a recovery in
> process, because we added a new OSD after getting the "2 OSDs near full"
> warning. Yes, the newly added OSD is rebalancing the data.
>
> [root@intcfs-osd6 ~]# ceph osd df
> ID WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR  PGS
>  0 3.29749  1.00000 3376G 2875G  501G 85.15 1.26 165
>  1 3.26869  1.00000 3347G 1923G 1423G 57.46 0.85 152
>  2 3.27339  1.00000 3351G 1980G 1371G 59.08 0.88 161
>  3 3.24089  1.00000 3318G 2130G 1187G 64.21 0.95 168
>  4 3.24089  1.00000 3318G 2997G  320G 90.34 1.34 176
>  5 3.32669  1.00000 3406G 2466G  939G 72.42 1.07 165
>  6 3.27800  1.00000 3356G 1463G 1893G 43.60 0.65 166
>
> ceph osd crush rule dump
>
> [
>     {
>         "rule_id": 0,
>         "rule_name": "replicated_ruleset",
>         "ruleset": 0,
>         "type": 1,
>         "min_size": 1,
>         "max_size": 10,
>         "steps": [
>             {
>                 "op": "take",
>                 "item": -1,
>                 "item_name": "default"
>             },
>             {
>                 "op": "chooseleaf_firstn",
>                 "num": 0,
>                 "type": "host"
>             },
>             {
>                 "op": "emit"
>             }
>         ]
>     }
> ]
>
> ceph version 10.2.2 and ceph version 10.2.9
>
> ceph osd pool ls detail
>
> pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
> pool 3 'downloads_data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 250 pgp_num 250 last_change 39 flags hashpspool crash_replay_interval 45 stripe_width 0
> pool 4 'downloads_metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 250 pgp_num 250 last_change 36 flags hashpspool stripe_width 0
>
> ---- On Sun, 12 Nov 2017 15:04:02 +0530 Sébastien VIGNERON <sebastien.vigne...@criann.fr> wrote ----
>
> Hi,
>
> Can you share:
> - your placement rules: ceph osd crush rule dump
> - your CEPH version: ceph versions
> - your pools definitions: ceph osd pool ls detail
>
> With these we can determine if your pgs are stuck because of a
> misconfiguration or something else.
>
> You seem to have some undersized pgs and a recovery in process. Do your
> OSDs show some rebalancing of your data? Do your OSDs' use percentages
> change over time? (changes in "ceph osd df")
>
> Cordialement / Best regards,
> Sébastien VIGNERON
>
> Le 12 nov. 2017 à 10:04, gjprabu <gjpr...@zohocorp.com> a écrit :
>
> Hi Team,
>
> We have a ceph setup with 6 OSDs and we got an alert that 2 OSDs are
> near full. We faced issues like slow access to ceph from clients, so I
> added a 7th OSD, and 2 OSDs are still showing near full (osd.0 and
> osd.4). I have restarted the ceph service on osd.0 and osd.4. Kindly
> check the ceph osd status below and please provide us with solutions.
> # ceph health detail
> HEALTH_WARN 46 pgs backfill_wait; 1 pgs backfilling; 32 pgs degraded; 50 pgs stuck unclean; 32 pgs undersized; recovery 1098780/40253637 objects degraded (2.730%); recovery 3401433/40253637 objects misplaced (8.450%); 2 near full osd(s); mds0: Client integ-hm3 failing to respond to cache pressure; mds0: Client integ-hm8 failing to respond to cache pressure; mds0: Client integ-hm2 failing to respond to cache pressure; mds0: Client integ-hm9 failing to respond to cache pressure; mds0: Client integ-hm5 failing to respond to cache pressure; mds0: Client integ-hm9-bkp failing to respond to cache pressure; mds0: Client me-build1-bkp failing to respond to cache pressure
>
> pg 3.f6 is stuck unclean for 511223.069161, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 4.f6 is stuck unclean for 511232.770419, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 3.ec is stuck unclean for 510902.815668, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 3.eb is stuck unclean for 511285.576487, current state active+remapped+wait_backfill, last acting [3,0]
> pg 4.17 is stuck unclean for 511235.326709, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 4.2f is stuck unclean for 511232.356371, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 4.3d is stuck unclean for 511300.446982, current state active+remapped, last acting [3,0]
> pg 4.93 is stuck unclean for 511295.539229, current state active+undersized+degraded+remapped+wait_backfill, last acting [3]
> pg 3.47 is stuck unclean for 511288.104965, current state active+remapped+wait_backfill, last acting [3,0]
> pg 4.d5 is stuck unclean for 510916.509825, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 3.31 is stuck unclean for 511221.542878, current state active+remapped+wait_backfill, last acting [0,3]
> pg 3.62 is stuck unclean for 511221.551662, current state active+undersized+degraded+remapped+wait_backfill, last acting [4]
> pg 4.4d is stuck unclean for 511232.279602, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 4.48 is stuck unclean for 510911.095367, current state active+remapped+wait_backfill, last acting [5,4]
> pg 3.4f is stuck unclean for 511226.712285, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 3.78 is stuck unclean for 511221.531199, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 3.24 is stuck unclean for 510903.483324, current state active+remapped+backfilling, last acting [1,2]
> pg 4.8c is stuck unclean for 511231.668693, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 3.b4 is stuck unclean for 511222.612012, current state active+undersized+degraded+remapped+wait_backfill, last acting [0]
> pg 4.41 is stuck unclean for 511287.031264, current state active+remapped+wait_backfill, last acting [3,2]
> pg 3.d1 is stuck unclean for 510903.797329, current state active+remapped+wait_backfill, last acting [0,3]
> pg 3.7f is stuck unclean for 511222.929722, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 4.af is stuck unclean for 511262.494659, current state active+undersized+degraded+remapped, last acting [0]
> pg 3.66 is stuck unclean for 510903.296711, current state active+remapped+wait_backfill, last acting [3,0]
> pg 3.76 is stuck unclean for 511224.615144, current state active+undersized+degraded+remapped+wait_backfill, last acting [3]
> pg 4.57 is stuck unclean for 511234.514343, current state active+remapped, last acting [0,4]
> pg 3.69 is stuck unclean for 511224.672085, current state active+undersized+degraded+remapped+wait_backfill, last acting [4]
> pg 3.9a is stuck unclean for 510967.300000, current state active+remapped+wait_backfill, last acting [3,2]
> pg 4.50 is stuck unclean for 510903.825565, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 4.53 is stuck unclean for 510921.975268, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 3.e7 is stuck unclean for 511221.530592, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 4.6a is stuck unclean for 510911.284877, current state active+undersized+degraded+remapped+wait_backfill, last acting [0]
> pg 4.16 is stuck unclean for 511232.702762, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 3.2c is stuck unclean for 511222.443893, current state active+remapped+wait_backfill, last acting [2,3]
> pg 4.89 is stuck unclean for 511228.846614, current state active+undersized+degraded+remapped+wait_backfill, last acting [4]
> pg 4.39 is stuck unclean for 511239.544231, current state active+remapped+wait_backfill, last acting [3,2]
> pg 4.ce is stuck unclean for 511232.294586, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 3.91 is stuck unclean for 511232.341380, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 3.96 is stuck unclean for 510904.043900, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 4.c0 is stuck unclean for 510904.253281, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 4.9c is stuck unclean for 511237.612850, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 3.ab is stuck unclean for 510960.756324, current state active+remapped+wait_backfill, last acting [3,2]
> pg 4.aa is stuck unclean for 511229.307559, current state active+remapped+wait_backfill, last acting [0,3]
> pg 3.ad is stuck unclean for 510903.764157, current state active+remapped+wait_backfill, last acting [0,3]
> pg 3.b5 is stuck unclean for 511226.560774, current state active+undersized+degraded+remapped+wait_backfill, last acting [3]
> pg 4.58 is stuck unclean for 510919.273667, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 4.b9 is stuck unclean for 511232.760066, current state active+remapped+wait_backfill, last acting [5,4]
> pg 3.be is stuck unclean for 511224.422931, current state active+remapped+wait_backfill, last acting [0,4]
> pg 4.d4 is stuck unclean for 510962.810416, current state active+undersized+degraded+remapped+wait_backfill, last acting [3]
> pg 4.da is stuck unclean for 511259.506962, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 4.8c is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 3.7f is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 3.78 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 3.76 is active+undersized+degraded+remapped+wait_backfill, acting [3]
> pg 4.6a is active+undersized+degraded+remapped+wait_backfill, acting [0]
> pg 3.69 is active+undersized+degraded+remapped+wait_backfill, acting [4]
> pg 3.66 is active+remapped+wait_backfill, acting [3,0]
> pg 3.62 is active+undersized+degraded+remapped+wait_backfill, acting [4]
> pg 4.58 is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 4.50 is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 4.53 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 3.4f is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 4.48 is active+remapped+wait_backfill, acting [5,4]
> pg 4.4d is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 3.47 is active+remapped+wait_backfill, acting [3,0]
> pg 4.41 is active+remapped+wait_backfill, acting [3,2]
> pg 3.31 is active+remapped+wait_backfill, acting [0,3]
> pg 4.2f is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 3.24 is active+remapped+backfilling, acting [1,2]
> pg 4.17 is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 4.16 is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 3.2c is active+remapped+wait_backfill, acting [2,3]
> pg 4.39 is active+remapped+wait_backfill, acting [3,2]
> pg 4.89 is active+undersized+degraded+remapped+wait_backfill, acting [4]
> pg 3.91 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 4.93 is active+undersized+degraded+remapped+wait_backfill, acting [3]
> pg 3.96 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 3.9a is active+remapped+wait_backfill, acting [3,2]
> pg 4.9c is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 4.af is active+undersized+degraded+remapped, acting [0]
> pg 3.ab is active+remapped+wait_backfill, acting [3,2]
> pg 4.aa is active+remapped+wait_backfill, acting [0,3]
> pg 3.ad is active+remapped+wait_backfill, acting [0,3]
> pg 3.b4 is active+undersized+degraded+remapped+wait_backfill, acting [0]
> pg 3.b5 is active+undersized+degraded+remapped+wait_backfill, acting [3]
> pg 4.b9 is active+remapped+wait_backfill, acting [5,4]
> pg 3.be is active+remapped+wait_backfill, acting [0,4]
> pg 4.c0 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 4.ce is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 3.d1 is active+remapped+wait_backfill, acting [0,3]
> pg 4.d5 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 4.d4 is active+undersized+degraded+remapped+wait_backfill, acting [3]
> pg 4.da is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 3.e7 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 3.eb is active+remapped+wait_backfill, acting [3,0]
> pg 3.ec is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 4.f6 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 3.f6 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> recovery 1098780/40253637 objects degraded (2.730%)
> recovery 3401433/40253637 objects misplaced (8.450%)
> osd.0 is near full at 85%
> osd.4 is near full at 90%
> mds0: Client integ-hm3 failing to respond to cache pressure (client_id: 733998)
> mds0: Client integ-hm8 failing to respond to cache pressure (client_id: 843866)
> mds0: Client integ-hm2 failing to respond to cache pressure (client_id: 844939)
> mds0: Client integ-hm9 failing to respond to cache pressure (client_id: 845065)
> mds0: Client integ-hm5 failing to respond to cache pressure (client_id: 845068)
> mds0: Client integ-hm9-bkp failing to respond to cache pressure (client_id: 895898)
> mds0: Client me-build1-bkp failing to respond to cache pressure (client_id: 888666)
>
> hm ~]# ceph osd tree
> ID WEIGHT   TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 22.92604 root default
> -2  3.29749     host intcfs-osd1
>  0  3.29749         osd.0             up 1.00000          1.00000
> -3  3.26869     host intcfs-osd2
>  1  3.26869         osd.1             up 1.00000          1.00000
> -4  3.27339     host intcfs-osd3
>  2  3.27339         osd.2             up 1.00000          1.00000
> -5  3.24089     host intcfs-osd4
>  3  3.24089         osd.3             up 1.00000          1.00000
> -6  3.24089     host intcfs-osd5
>  4  3.24089         osd.4             up 1.00000          1.00000
> -7  3.32669     host intcfs-osd6
>  5  3.32669         osd.5             up 1.00000          1.00000
> -8  3.27800     host intcfs-osd7
>  6  3.27800         osd.6             up 1.00000          1.00000
>
> hm5 ~]# ceph osd df
> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL %USE  VAR  PGS
>  0 3.29749  1.00000  3376G  2874G  502G 85.13 1.26 165
>  1 3.26869  1.00000  3347G  1922G 1424G 57.44 0.85 152
>  2 3.27339  1.00000  3351G  2009G 1342G 59.95 0.89 162
>  3 3.24089  1.00000  3318G  2130G 1188G 64.19 0.95 168
>  4 3.24089  1.00000  3318G  2996G  321G 90.30 1.34 176
>  5 3.32669  1.00000  3406G  2465G  940G 72.39 1.07 165
>  6 3.27800  1.00000  3356G  1435G 1921G 42.76 0.63 166
>             TOTAL   23476G 15834G 7641G 67.45
> MIN/MAX VAR: 0.63/1.34 STDDEV: 15.29
>
> Regards
> Prabu GJ
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
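As a side note, the "near full" warnings and the recovery percentages in the health output can be recomputed directly from the figures above. A minimal sketch, assuming the Jewel default mon_osd_nearfull_ratio of 0.85 (not confirmed for this cluster):

```python
# Assumption: default Jewel nearfull threshold; the cluster may override it.
NEARFULL = 0.85

# (osd id, SIZE in GB, USE in GB) taken from the second `ceph osd df` listing
osds = [(0, 3376, 2874), (1, 3347, 1922), (2, 3351, 2009),
        (3, 3318, 2130), (4, 3318, 2996), (5, 3406, 2465), (6, 3356, 1435)]

# An OSD is flagged near full once USE/SIZE crosses the threshold
nearfull = [i for i, size, use in osds if use / size >= NEARFULL]
print("near full:", nearfull)

# Degraded/misplaced percentages are just object counts over total copies
degraded = 1098780 / 40253637 * 100
misplaced = 3401433 / 40253637 * 100
print(f"degraded {degraded:.3f}%, misplaced {misplaced:.3f}%")
```

This reproduces the HEALTH_WARN numbers: osd.0 (85.13%) and osd.4 (90.30%) are over the threshold, and the percentages match the 2.730% degraded / 8.450% misplaced lines, which is why adding capacity (or reweighting the full OSDs) is what clears the warning rather than restarting the daemons.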