[ceph-users] Re: ceph_leadership_team_meeting_s18e06.mkv

2023-09-08 Thread Rok Jaklič
We do not use containers. Anything special for debugging, or should we try something from the previous email? - Enable profiling (Mark Nelson) - Try Bloomberg's Python mem profiler (Matthew Leonard) Profiling means the instructions from https://docs.ceph.com/
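
A minimal sketch of what "enable profiling" usually involves, following the memory-profiling instructions on docs.ceph.com: it relies on the daemon being built with tcmalloc, the daemon name "osd.0" is only a placeholder, and the mgr's Python-side allocations would need a Python tool (such as the Bloomberg profiler mentioned above) instead.

   # TCMalloc heap profiling on a single daemon (placeholder osd.0)
   ceph tell osd.0 heap start_profiler   # begin collecting heap samples
   ceph tell osd.0 heap dump             # write a heap profile alongside the daemon's logs
   ceph tell osd.0 heap stats            # print current heap usage
   ceph tell osd.0 heap stop_profiler    # stop collecting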

[ceph-users] MGR Memory Leak in Restful

2023-09-08 Thread Chris Palmer
I first posted this on 17 April but did not get any response (although IIRC a number of other posts referred to it). Seeing as MGR OOM is being discussed at the moment I am re-posting. These clusters are not containerized. Is this being tracked/fixed or not? Thanks, Chris -

[ceph-users] Re: ceph_leadership_team_meeting_s18e06.mkv

2023-09-08 Thread Loïc Tortay
On 07/09/2023 21:33, Mark Nelson wrote: Hi Rok, We're still trying to catch what's causing the memory growth, so it's hard to guess at which releases are affected. We know it's happening intermittently on a live Pacific cluster at least. If you have the ability to catch it while it's happening
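
One way to "catch it while it's happening", sketched here as an assumption rather than the thread's actual procedure: periodically log the resident set size of the affected daemon so the growth curve can be correlated with cluster activity. The process name, interval, and log file below are all placeholders.

   # log ceph-mgr RSS (in KiB) once a minute to spot the growth in real time
   while true; do
       echo "$(date -Is) $(ps -C ceph-mgr -o rss=)" >> /tmp/mgr-rss.log
       sleep 60
   done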

[ceph-users] Re: MGR Memory Leak in Restful

2023-09-08 Thread David Orman
Hi, I do not believe this is actively being worked on, but there is a tracker open; if you can submit an update, it may help attract attention/develop a fix: https://tracker.ceph.com/issues/59580 David

[ceph-users] Re: ceph_leadership_team_meeting_s18e06.mkv

2023-09-08 Thread David Orman
I would suggest updating: https://tracker.ceph.com/issues/59580 We did notice it with 16.2.13 as well, after upgrading from 16.2.10, so likely in between those two releases. David

[ceph-users] Unhappy Cluster

2023-09-08 Thread Dave S
Hi Everyone, I've been fighting with a ceph cluster that we have recently physically relocated and lost 2 OSDs during the ensuing power down and relocation. After powering everything back on we have: 3 incomplete, 3 remapped+incomplete. And indeed we have 2 OSDs that died
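
A sketch of the usual first-look commands for incomplete PGs after OSD loss; the PG id is only an example, and nothing here is specific to Dave's cluster.

   ceph health detail        # lists the incomplete PGs and the blocking OSDs
   ceph osd tree             # confirm which OSDs are down/out
   ceph pg ls incomplete     # enumerate PGs currently in the incomplete state
   ceph pg 2.1a query        # per-PG detail, e.g. what is blocking peering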

[ceph-users] Re: Unhappy Cluster

2023-09-08 Thread Alexander E. Patrakov
Hello Dave, I think your data is still intact. Nautilus, indeed, had issues when recovering erasure-coded pools. You can try temporarily setting min_size to 4. This bug has been fixed in Octopus or later releases. From the release notes at https://docs.ceph.com/en/latest/releases/octopus/: Ceph w
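
A sketch of the temporary workaround Alexander describes, with assumed values: the pool name is a placeholder, and min_size 5 is what a k=4,m=2 erasure-coded pool would normally use, so check and restore the original value once recovery finishes.

   ceph osd pool get mypool min_size     # note the current value (e.g. 5)
   ceph osd pool set mypool min_size 4   # let the PGs recover with only k shards present
   # ...wait for recovery to complete, then put it back:
   ceph osd pool set mypool min_size 5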

[ceph-users] Re: Unhappy Cluster

2023-09-08 Thread Dave S
OOH Thanks! I'll certainly give that a try.

[ceph-users] Re: Unhappy Cluster

2023-09-08 Thread Dave S
Thanks Alexander! That did it. -Dave