So what I see there is this for osd.307:
"empty": 1,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 0,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
}
last_epoch_started is 0 and empty is 1. The other OSDs are reporting
last_epoch_started 16806 and empty 0.
I noticed that too and was wondering why it never completed recovery and joined.
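For what it's worth, the comparison above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the name find_lagging_peers are my own assumptions, not the real `ceph pg query` schema. Only the "empty" and "last_epoch_started" field names come from the output above, and the two healthy entries are just examples taken from the acting set.

```python
def find_lagging_peers(peers):
    """Return OSD names whose PG info looks empty or behind the others.

    `peers` maps an OSD name to a dict holding the "empty" and
    "last_epoch_started" fields (hypothetical layout, not the real schema).
    """
    # Take the highest last_epoch_started seen as the reference point.
    newest = max(info["last_epoch_started"] for info in peers.values())
    lagging = []
    for osd, info in sorted(peers.items()):
        if info["empty"] == 1 or info["last_epoch_started"] < newest:
            lagging.append(osd)
    return lagging


# Illustrative values: osd.307 as reported above, the others as healthy peers.
peers = {
    "osd.307": {"empty": 1, "last_epoch_started": 0},
    "osd.1391": {"empty": 0, "last_epoch_started": 16806},
    "osd.240": {"empty": 0, "last_epoch_started": 16806},
}
print(find_lagging_peers(peers))
```

With these values only osd.307 is flagged, which matches what the query output shows.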
> If you stop osd.307 and maybe mark it as out, does that help?
No, I see the same thing I saw when I took 595 out:
[root@ceph-mon1 ~]# ceph pg map 1.323
osdmap e22392 pg 1.323 (1.323) -> up [985,1391,240,127,937,362,267,320,7,634,716] acting [2147483647,1391,240,127,937,362,267,320,7,634,716]
Another OSD gets chosen as the primary but never becomes acting on its own.
Another 11 PGs are reporting being undersized and having ITEM_NONE in their
acting sets as well.
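In case it helps anyone checking the other PGs, here is a rough Python sketch for spotting ITEM_NONE in an acting set. It assumes the `ceph pg map` output format shown above; the helper names parse_pg_map and missing_shards are made up for this example. 2147483647 is 0x7fffffff, the placeholder CRUSH uses when no OSD is assigned to a slot.

```python
import re

# CRUSH uses 0x7fffffff (2147483647) as the "no OSD" placeholder, ITEM_NONE.
ITEM_NONE = 0x7FFFFFFF


def parse_pg_map(text):
    """Extract the up and acting sets from `ceph pg map` output.

    Line wraps (such as those added by mail clients) are ignored.
    """
    flat = " ".join(text.split())
    match = re.search(r"up \[([\d,]*)\] acting \[([\d,]*)\]", flat)
    if match is None:
        raise ValueError("unrecognised pg map output")
    up, acting = (
        [int(x) for x in group.split(",") if x] for group in match.groups()
    )
    return up, acting


def missing_shards(acting):
    """Return the positions in the acting set that have no OSD assigned."""
    return [i for i, osd in enumerate(acting) if osd == ITEM_NONE]


sample = """osdmap e22392 pg 1.323 (1.323) -> up
[985,1391,240,127,937,362,267,320,7,634,716] acting
[2147483647,1391,240,127,937,362,267,320,7,634,716]"""

up, acting = parse_pg_map(sample)
print(up[0], missing_shards(acting))
```

For the output above this reports OSD 985 as the up primary and position 0 of the acting set as unfilled.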
> ________________________________________
> From: Wido den Hollander [[email protected]]
> Sent: 22 February 2017 12:18
> To: Vasilakakos, George (STFC,RAL,SC); [email protected]
> Subject: RE: [ceph-users] PG stuck peering after host reboot
>
> > Op 21 februari 2017 om 15:35 schreef [email protected]:
> >
> >
> > I have noticed something odd with the ceph-objectstore-tool command:
> >
> > It always reports PG X not found even on healthy OSDs/PGs. The 'list' op
> > works on both healthy and unhealthy PGs.
> >
>
> Are you sure you are supplying the correct PG ID?
>
> I just tested with (Jewel 10.2.5):
>
> $ ceph pg ls-by-osd 5
> $ systemctl stop ceph-osd@5
> $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --op info --pgid 10.d0
> $ systemctl start ceph-osd@5
>
> Can you double-check that?
>
> It's weird that the PG can't be found on those OSDs by the tool.
>
> Wido
>
>
> > ________________________________________
> > From: ceph-users [[email protected]] on behalf of
> > [email protected] [[email protected]]
> > Sent: 21 February 2017 10:17
> > To: [email protected]; [email protected]; [email protected]
> > Subject: Re: [ceph-users] PG stuck peering after host reboot
> >
> > > Can you for the sake of redundancy post your sequence of commands you
> > > executed and their output?
> >
> > [root@ceph-sn852 ~]# systemctl stop ceph-osd@307
> > [root@ceph-sn852 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-307 --op info --pgid 1.323
> > PG '1.323' not found
> > [root@ceph-sn852 ~]# systemctl start ceph-osd@307
> >
> > I did the same thing for 307 (new up but not acting primary) and all the
> > OSDs in the original set (including 595). The output was the exact same. I
> > don't have the whole session log handy from all those sessions but here's a
> > sample from one that's easy to pick out:
> >
> > [root@ceph-sn832 ~]# systemctl stop ceph-osd@7
> > [root@ceph-sn832 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op info --pgid 1.323
> > PG '1.323' not found
> > [root@ceph-sn832 ~]# systemctl start ceph-osd@7
> > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/
> > 0.18_head/ 11.1c8s5_TEMP/ 13.3b_head/ 1.74s1_TEMP/
> > 2.256s6_head/ 2.c3s10_TEMP/ 3.b9s4_head/
> > 0.18_TEMP/ 1.16s1_head/ 13.3b_TEMP/ 1.8bs9_head/
> > 2.256s6_TEMP/ 2.c4s3_head/ 3.b9s4_TEMP/
> > 1.106s10_head/ 1.16s1_TEMP/ 1.3a6s0_head/ 1.8bs9_TEMP/
> > 2.2d5s2_head/ 2.c4s3_TEMP/ 4.34s10_head/
> > 1.106s10_TEMP/ 1.274s5_head/ 1.3a6s0_TEMP/ 2.174s10_head/
> > 2.2d5s2_TEMP/ 2.dbs7_head/ 4.34s10_TEMP/
> > 11.12as10_head/ 1.274s5_TEMP/ 1.3e4s9_head/ 2.174s10_TEMP/
> > 2.340s8_head/ 2.dbs7_TEMP/ commit_op_seq
> > 11.12as10_TEMP/ 1.2ds8_head/ 1.3e4s9_TEMP/ 2.1c1s10_head/
> > 2.340s8_TEMP/ 3.159s3_head/ meta/
> > 11.148s2_head/ 1.2ds8_TEMP/ 14.1a_head/ 2.1c1s10_TEMP/
> > 2.36es10_head/ 3.159s3_TEMP/ nosnap
> > 11.148s2_TEMP/ 1.323s8_head/ 14.1a_TEMP/ 2.1d0s6_head/
> > 2.36es10_TEMP/ 3.170s1_head/ omap/
> > 11.165s6_head/ 1.323s8_TEMP/ 1.6fs9_head/ 2.1d0s6_TEMP/
> > 2.3d3s10_head/ 3.170s1_TEMP/
> > 11.165s6_TEMP/ 13.32_head/ 1.6fs9_TEMP/ 2.1efs2_head/
> > 2.3d3s10_TEMP/ 3.1aas5_head/
> > 11.1c8s5_head/ 13.32_TEMP/ 1.74s1_head/ 2.1efs2_TEMP/
> > 2.c3s10_head/ 3.1aas5_TEMP/
> > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/1.323s8_
> > 1.323s8_head/ 1.323s8_TEMP/
> > [root@ceph-sn832 ~]# ll
> > /var/lib/ceph/osd/ceph-7/current/1.323s8_head/DIR_3/DIR_2/DIR_
> > DIR_3/ DIR_7/ DIR_B/ DIR_F/
> > [root@ceph-sn832 ~]# ll
> > /var/lib/ceph/osd/ceph-7/current/1.323s8_head/DIR_3/DIR_2/DIR_3/DIR_
> > DIR_0/ DIR_1/ DIR_2/ DIR_3/ DIR_4/ DIR_5/ DIR_6/ DIR_7/ DIR_8/ DIR_9/
> > DIR_A/ DIR_B/ DIR_C/ DIR_D/ DIR_E/ DIR_F/
> > [root@ceph-sn832 ~]# ll
> > /var/lib/ceph/osd/ceph-7/current/1.323s8_head/DIR_3/DIR_2/DIR_3/DIR_1/
> > total 271276
> > -rw-r--r--. 1 ceph ceph 8388608 Feb 3 22:07
> > datadisk\srucio\sdata16\u13TeV\s11\sad\sDAOD\uTOPQ4.09383728.\u000436.pool.root.1.0000000000000001__head_2BA91323__1_ffffffffffffffff_8
> >
> > > If you run a find in the data directory of the OSD, does that PG show up?
> >
> > OSDs 595 (used to be 0), 1391(1), 240(2), 7(7, the one that started this)
> > have a 1.323sX_head directory. OSD 307 does not.
> > I have not checked the other OSDs in the PG yet.
> >
> > Wido
> >
> > >
> > > Best regards,
> > >
> > > George
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com