I'll power the cluster up today or tomorrow and take another look, Dan, but
the initial problem is that many of the PGs can't be queried: the requests
time out. I don't know whether it's only the stale PGs, only the unknown
PGs, or both that can't be queried, but I'll investigate whether something
is wrong with the mgr. I normally have several mgrs running.
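
For the record, here is roughly what I plan to check once the cluster is
powered back up. These are the stock ceph CLI commands as I understand
them (the PG id is a placeholder):

```shell
# Confirm a mgr is active: 'unknown' PGs often just mean the mgr
# has no fresh PG statistics to report.
ceph mgr stat

# Confirm every OSD actually came back up after the power-on.
ceph osd stat
ceph osd tree down    # show only down OSDs, if any

# List the stuck PGs, then query one; if the query times out, the
# acting primary OSD for that PG is likely not serving requests.
ceph pg dump_stuck
ceph pg <pgid> query
```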

Thanks for the advice on ignore_history. I'll avoid it for now.

On Fri, Feb 5, 2021 at 6:52 AM Dan van der Ster <d...@vanderster.com> wrote:

> Eeek! Don't run `osd_find_best_info_ignore_history_les = true` -- that
> can lead to data loss in ways you don't expect.
>
>
> Are you sure all OSDs are up?
>
> Query a PG to find out why it is unknown: `ceph pg <id> query`. Feel
> free to share that output.
>
> In fact, the 'unknown' state means the MGR doesn't know the state of
> the PG -- is your MGR running correctly now?
>
> -- Dan
>
>
>
>
> On Fri, Feb 5, 2021 at 4:49 PM Jeremy Austin <jhaus...@gmail.com> wrote:
> >
> > I was in the middle of a rebalance on a small test cluster with about
> > 1% of PGs degraded, and shut the cluster entirely down for maintenance.
> >
> > On startup, many PGs are entirely unknown, and most are stale. In
> > fact, most PGs can't be queried! No mon failures. Would OSD logs tell
> > me why the PGs aren't even moving to an inactive state?
> >
> > I'm not concerned about data loss due to the shutdown (all activity to
> > the cluster had been stopped), so should I be setting
> > "osd_find_best_info_ignore_history_les = true" on some or all of the
> > OSDs?
> >
> > Thank you,
> >
> > --
> > Jeremy Austin
> > jhaus...@gmail.com
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Jeremy Austin
jhaus...@gmail.com
