Hi Fabio,
Did you resolve the issue?

A bit late, I know, but did you try to restart osd.14? If 102 and 121 are
fine, I would also try to crush reweight osd.14 to 0.
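
Something along these lines should do it (untested, and assuming a
systemd-based deployment; adjust the unit name to however your OSDs are
started):

  # on the host that carries osd.14
  systemctl restart ceph-osd@14

  # if a restart does not help, drain it by setting its CRUSH weight to 0
  ceph osd crush reweight osd.14 0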

Greetings
Mehmet 

On 10 March 2019 19:26:57 CET, Fabio Abreu <fabioabreur...@gmail.com> wrote:
>Hi Darius,
>
>Thanks for your reply!
>
>This is happening after a disaster with a SATA storage node; OSDs 102
>and 121 are up.
>
>The osd.14 log is below. Do you recommend marking it out of this
>cluster?
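>
>(To be clear, by "marking out" I mean roughly the following, with 14
>being the OSD I suspect from the dump_stuck output:
>
>  ceph osd out 14
>
>so that the cluster backfills away from it.)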
>
>2019-03-10 17:36:17.654134 7f1991163700  0 -- 172.16.184.90:6800/589935 >> :/0 pipe(0x555be7808800 sd=516 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720400).accept failed to getpeername (107) Transport endpoint is not connected
>2019-03-10 17:36:17.654660 7f1992d7f700  0 -- 172.16.184.90:6800/589935 >> :/0 pipe(0x555be773f400 sd=536 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720700).accept failed to getpeername (107) Transport endpoint is not connected
>2019-03-10 17:36:17.654720 7f1993a8c700  0 -- 172.16.184.90:6800/589935 >> 172.16.184.92:6801/1555502 pipe(0x555be7807400 sd=542 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720280).accept connect_seq 0 vs existing 0 state wait
>2019-03-10 17:36:17.654813 7f199095b700  0 -- 172.16.184.90:6800/589935 >> :/0 pipe(0x555be6d8e000 sd=537 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be671ff80).accept failed to getpeername (107) Transport endpoint is not connected
>2019-03-10 17:36:17.654847 7f1992476700  0 -- 172.16.184.90:6800/589935 >> 172.16.184.95:6840/1537112 pipe(0x555be773e000 sd=533 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be671fc80).accept connect_seq 0 vs existing 0 state wait
>2019-03-10 17:36:17.655252 7f1993486700  0 -- 172.16.184.90:6800/589935 >> 172.16.184.92:6832/1098862 pipe(0x555be779f400 sd=521 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6242d00).accept connect_seq 0 vs existing 0 state wait
>2019-03-10 17:36:17.655315 7f1993284700  0 -- 172.16.184.90:6800/589935 >> :/0 pipe(0x555be6d90800 sd=523 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720880).accept failed to getpeername (107) Transport endpoint is not connected
>2019-03-10 17:36:17.655814 7f1992173700  0 -- 172.16.184.90:6800/589935 >> 172.16.184.91:6833/316673 pipe(0x555be7740800 sd=527 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720580).accept connect_seq 0 vs existing 0 state wait
>
>Regards,
>Fabio Abreu
>
>On Sun, Mar 10, 2019 at 3:20 PM Darius Kasparavičius <daz...@gmail.com>
>wrote:
>
>> Hi,
>>
>> Check your osd.14 logs for more information; it is currently stuck
>> and not providing I/O for replication. And what happened to OSDs 102
>> and 121?
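>>
>> Something like the following should show them (the path assumes the
>> default log location; the journalctl form only applies on systemd
>> hosts):
>>
>>   less /var/log/ceph/ceph-osd.14.log
>>   # or
>>   journalctl -u ceph-osd@14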
>>
>> On Sun, Mar 10, 2019 at 7:44 PM Fabio Abreu
><fabioabreur...@gmail.com>
>> wrote:
>> >
>> > Hi everybody,
>> >
>> > I have a PG in down+peering state with blocked requests that are
>> > impacting my pg query, and I can't find the OSD on which to apply
>> > the lost parameter.
>> >
>> > http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/#placement-group-down-peering-failure
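>> >
>> > Roughly what I have been trying, following that page (the blocked
>> > requests get in the way of the query, so I cannot tell which OSD to
>> > mark lost):
>> >
>> >   ceph pg 5.6e0 query
>> >   # and then, once the blocking OSD is known:
>> >   ceph osd lost <id> --yes-i-really-mean-it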
>> >
>> > Has anyone seen the same scenario, with a PG stuck in the down state?
>> >
>> > Storage:
>> >
>> > 100 ops are blocked > 262.144 sec on osd.14
>> >
>> > root@monitor:~# ceph pg dump_stuck inactive
>> > ok
>> > pg_stat state   up      up_primary      acting  acting_primary
>> > 5.6e0   down+remapped+peering   [102,121,14]    102     [14]    14
>> >
>> >
>> > root@monitor:~# ceph -s
>> >     cluster xxx
>> >      health HEALTH_ERR
>> >             1 pgs are stuck inactive for more than 300 seconds
>> >             223 pgs backfill_wait
>> >             14 pgs backfilling
>> >             215 pgs degraded
>> >             1 pgs down
>> >             1 pgs peering
>> >             1 pgs recovering
>> >             53 pgs recovery_wait
>> >             199 pgs stuck degraded
>> >             1 pgs stuck inactive
>> >             278 pgs stuck unclean
>> >             162 pgs stuck undersized
>> >             162 pgs undersized
>> >             100 requests are blocked > 32 sec
>> >             recovery 2767660/317878237 objects degraded (0.871%)
>> >             recovery 7484106/317878237 objects misplaced (2.354%)
>> >             recovery 29/105009626 unfound
>> >
>> >
>> >
>> >
>> > --
>> > Regards,
>> > Fabio Abreu Reis
>> > http://fajlinux.com.br
>> > Tel : +55 21 98244-0161
>> > Skype : fabioabreureis
>>
>
>
>-- 
>Regards,
>Fabio Abreu Reis
>http://fajlinux.com.br
>Tel: +55 21 98244-0161
>Skype: fabioabreureis
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
