[ceph-users] Re: Fwd: Lots of OSDs crashlooping (DRAFT - feedback?)

2022-01-25 Thread Benjamin Staffin
the memory can balloon for some unknown reason.
> The devs have asked a couple times for dumps of those logs replaying
> huge-memory causing pglogs.
>
> In this case -- Benjamin's issue -- I'm trying to understand if this
> is related to:
> * a huge pg log -- would need t

[ceph-users] Re: Lots of OSDs crashlooping (DRAFT - feedback?)

2022-01-24 Thread Benjamin Staffin
oh jeez, sorry about the subject line - I forgot to change it after asking a coworker to review the message. This is not a draft.

On Mon, Jan 24, 2022 at 6:44 PM Benjamin Staffin wrote:
> I have a cluster where 46 out of 120 OSDs have begun crash looping with
> the same stack trace (see

[ceph-users] Fwd: Lots of OSDs crashlooping (DRAFT - feedback?)

2022-01-24 Thread Benjamin Staffin
I have a cluster where 46 out of 120 OSDs have begun crash looping with the same stack trace (see pasted output below). The cluster is in a very bad state with this many OSDs down, unsurprisingly. The day before this problem showed up, the k8s cluster was under extreme memory pressure and a lot o