Hi Fabio,

Did you resolve the issue? A bit late, I know, but did you try to restart osd.14? If 102 and 121 are fine, I would also try to crush reweight osd.14 to 0.
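A rough sketch of those two steps (assuming a systemd-based deployment; on upstart or sysvinit hosts the restart command differs):

# on the host carrying osd.14: restart the stuck daemon
systemctl restart ceph-osd@14

# if it keeps blocking io, drain it by setting its CRUSH weight to 0
ceph osd crush reweight osd.14 0

# watch recovery/backfill progress
ceph -s
ceph -w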
Greetings,
Mehmet

On 10 March 2019 19:26:57 CET, Fabio Abreu <fabioabreur...@gmail.com> wrote:

>Hi Darius,
>
>Thanks for your reply!
>
>This is happening after a disaster on a SATA storage node; osds 102 and 121 are up.
>
>The information below is the osd.14 log. Do you recommend marking it out of the cluster?
>
>2019-03-10 17:36:17.654134 7f1991163700  0 -- 172.16.184.90:6800/589935 >> :/0 pipe(0x555be7808800 sd=516 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720400).accept failed to getpeername (107) Transport endpoint is not connected
>2019-03-10 17:36:17.654660 7f1992d7f700  0 -- 172.16.184.90:6800/589935 >> :/0 pipe(0x555be773f400 sd=536 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720700).accept failed to getpeername (107) Transport endpoint is not connected
>2019-03-10 17:36:17.654720 7f1993a8c700  0 -- 172.16.184.90:6800/589935 >> 172.16.184.92:6801/1555502 pipe(0x555be7807400 sd=542 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720280).accept connect_seq 0 vs existing 0 state wait
>2019-03-10 17:36:17.654813 7f199095b700  0 -- 172.16.184.90:6800/589935 >> :/0 pipe(0x555be6d8e000 sd=537 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be671ff80).accept failed to getpeername (107) Transport endpoint is not connected
>2019-03-10 17:36:17.654847 7f1992476700  0 -- 172.16.184.90:6800/589935 >> 172.16.184.95:6840/1537112 pipe(0x555be773e000 sd=533 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be671fc80).accept connect_seq 0 vs existing 0 state wait
>2019-03-10 17:36:17.655252 7f1993486700  0 -- 172.16.184.90:6800/589935 >> 172.16.184.92:6832/1098862 pipe(0x555be779f400 sd=521 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6242d00).accept connect_seq 0 vs existing 0 state wait
>2019-03-10 17:36:17.655315 7f1993284700  0 -- 172.16.184.90:6800/589935 >> :/0 pipe(0x555be6d90800 sd=523 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720880).accept failed to getpeername (107) Transport endpoint is not connected
>2019-03-10 17:36:17.655814 7f1992173700  0 -- 172.16.184.90:6800/589935 >> 172.16.184.91:6833/316673 pipe(0x555be7740800 sd=527 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720580).accept connect_seq 0 vs existing 0 state wait
>
>Regards,
>Fabio Abreu
>
>On Sun, Mar 10, 2019 at 3:20 PM Darius Kasparavičius <daz...@gmail.com> wrote:
>
>> Hi,
>>
>> Check your osd.14 logs for information; it is currently stuck and not
>> providing io for replication. And what happened to OSDs 102 and 121?
>>
>> On Sun, Mar 10, 2019 at 7:44 PM Fabio Abreu <fabioabreur...@gmail.com> wrote:
>> >
>> > Hi everybody,
>> >
>> > I have a pg in down+peering state with blocked requests that are impacting my pg query, and I can't find the osd to apply the lost parameter to.
>> >
>> > http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/#placement-group-down-peering-failure
>> >
>> > Has anyone had the same scenario with a pg in the down state?
>> >
>> > Storage:
>> >
>> > 100 ops are blocked > 262.144 sec on osd.14
>> >
>> > root@monitor:~# ceph pg dump_stuck inactive
>> > ok
>> > pg_stat  state                  up            up_primary  acting  acting_primary
>> > 5.6e0    down+remapped+peering  [102,121,14]  102         [14]    14
>> >
>> > root@monitor:~# ceph -s
>> >     cluster xxx
>> >      health HEALTH_ERR
>> >             1 pgs are stuck inactive for more than 300 seconds
>> >             223 pgs backfill_wait
>> >             14 pgs backfilling
>> >             215 pgs degraded
>> >             1 pgs down
>> >             1 pgs peering
>> >             1 pgs recovering
>> >             53 pgs recovery_wait
>> >             199 pgs stuck degraded
>> >             1 pgs stuck inactive
>> >             278 pgs stuck unclean
>> >             162 pgs stuck undersized
>> >             162 pgs undersized
>> >             100 requests are blocked > 32 sec
>> >             recovery 2767660/317878237 objects degraded (0.871%)
>> >             recovery 7484106/317878237 objects misplaced (2.354%)
>> >             recovery 29/105009626 unfound
>> >
>> > --
>> > Regards,
>> > Fabio Abreu Reis
>> > http://fajlinux.com.br
>> > Tel : +55 21 98244-0161
>> > Skype : fabioabreureis
>
>--
>Best regards,
>Fabio Abreu Reis
>http://fajlinux.com.br
>Tel : +55 21 98244-0161
>Skype : fabioabreureis
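For the archives, the steps discussed in this thread map roughly onto the commands below. This is only a sketch of the standard ceph CLI calls, not a tested recovery procedure; "ceph osd lost" is destructive and should only be used when the OSD's data is truly unrecoverable.

# see which OSDs pg 5.6e0 is waiting on (look for "peering_blocked_by")
ceph pg 5.6e0 query

# take osd.14 out of data placement so the pg can remap
ceph osd out 14

# last resort, only if osd.14 cannot be brought back:
ceph osd lost 14 --yes-i-really-mean-it

# afterwards, handle any objects that remain unfound
ceph pg 5.6e0 mark_unfound_lost revert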