[ceph-users] Re: ceph-mon rocksdb write latency

2022-01-11 Thread Karsten Nielsen
On 11-01-2022 09:36, Anthony D'Atri wrote: Our hosts run all NVMe Which drives, specifically? And how many OSDs per? How many PGs per OSD?
There are 3 types of devices:
* HPE NS204i-p Gen10+ Boot Controller - stores the /var/lib/ceph folder
* HPE 7.68TB NVMe x4 RI SFF SC U.3 SSD - We have 3
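
A quick way to answer the "how many PGs per OSD" question, as a generic sketch (not taken from the original mail), is the PGS column of the OSD utilization dump:

    # per-OSD PG counts are in the PGS column; the tree view also shows OSDs per host
    ceph osd df tree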

[ceph-users] Re: OSD META usage growing without bounds

2022-01-11 Thread Igor Fedotov
Hi Frank, you might want to collect a couple of perf dumps for the OSD in question at e.g. a one-hour interval, and inspect which counters are growing in the bluefs section. "log_bytes" is of particular interest... Thanks, Igor On 1/10/2022 2:25 PM, Frank Schilder wrote: Hi, I'm observing a strang
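
As a minimal sketch of the suggested procedure (the OSD id is only a placeholder, and jq is assumed to be available on the OSD host):

    # snapshot the bluefs counters for the OSD in question
    ceph daemon osd.12 perf dump | jq '.bluefs' > bluefs-$(date +%s).json
    # repeat roughly an hour later, then compare the two snapshots and
    # watch which counters keep growing -- log_bytes in particular
    diff bluefs-*.json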

[ceph-users] Re: OSD META usage growing without bounds

2022-01-11 Thread Igor Fedotov
Frank, btw - are you aware of https://tracker.ceph.com/issues/45903 ? I can see it was rejected for mimic for whatever reason. Hence I presume that might be pretty relevant to your case... Thanks, Igor On 1/11/2022 2:45 PM, Frank Schilder wrote: Hi Igor, thanks for your reply. To avoid f

[ceph-users] Re: OSD META usage growing without bounds

2022-01-11 Thread Igor Fedotov
And here is an overview from the PR (https://github.com/ceph/ceph/pull/35473) which looks to some degree in line with your initial points (cluster in an idle state for a long period): "Original problem stemmed from BlueFS inability to replay log, which was caused by BlueFS previously wrote rep

[ceph-users] Re: v16.2.7 Pacific released

2022-01-11 Thread Dan van der Ster
Hi, Yes it's confusing -- the release notes are normally only published in master, which is shown as "latest", and are rarely backported to a release branch. The notes you're looking for are here: https://docs.ceph.com/en/latest/releases/pacific/#v16-2-7-pacific Zac is in cc -- maybe we can make

[ceph-users] Re: v16.2.7 Pacific released

2022-01-11 Thread Gregory Farnum
On Tue, Jan 11, 2022 at 5:29 AM Dan van der Ster wrote: > > Hi, > > Yes it's confusing -- the release notes are normally only published in > master, which is shown as "latest", and are rarely backported to a > release branch. > The notes you're looking for are here: > https://docs.ceph.com/en/late

[ceph-users] Re: Grafana version

2022-01-11 Thread Alfonso Martinez Hidalgo
Hi Jeremy, Thanks for the heads up! I cannot open the provided links. AFAIK you can set a custom grafana image by running: ceph config set mgr mgr/cephadm/container_image_grafana and then re-deploying the service. Please see: https://docs.ceph.com/en/pacific/cephadm/services/monitoring/#using-
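
As a sketch of the two steps (the image reference below is only an example; use whichever Grafana image you need):

    # tell cephadm which Grafana container image to deploy
    ceph config set mgr mgr/cephadm/container_image_grafana quay.io/ceph/ceph-grafana:8.3.5
    # redeploy the service so the new image is picked up
    ceph orch redeploy grafana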

[ceph-users] Re: How to troubleshoot monitor node

2022-01-11 Thread Andre Tann
Hi 胡 玮文, On 10.01.22 19:27, 胡 玮文 wrote: So this cluster is deployed with cephadm. Please use systemctl status ceph-b61400fe-6e25-11ec-b322-896f8c260566@mon.mon01.service OK, this gives another picture: root@mon01:~# systemctl status ceph-b61400fe-6e25-11ec-b322-896f8c260566@mon.mon01.servi
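
With cephadm the unit name follows the ceph-<fsid>@<daemon>.service pattern, so the monitor's log can be followed with journalctl on the mon host (a sketch using the fsid from this thread):

    journalctl -u ceph-b61400fe-6e25-11ec-b322-896f8c260566@mon.mon01.service -f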

[ceph-users] Re: RGW with keystone and dns-style buckets

2022-01-11 Thread Ansgar Jazdzewski
Hi folks, I got it to work using haproxy: I just put some stuff into the frontend to rewrite the URL from domain-style with '_' to path-style with ':'
acl dnsstyle_buckets hdr_end(host) -i .object.domain
capture request header User-Agent len 256
capture request header Host len 128
http-request set
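
The full rewrite rules are cut off above; as a rough sketch of the same idea (the object.domain names, certificate path and backend name are only placeholders), a frontend that turns DNS-style bucket requests into path-style ones could look like:

    frontend rgw
        bind *:443 ssl crt /etc/haproxy/certs/object.domain.pem
        acl dnsstyle_buckets hdr_end(host) -i .object.domain
        # prepend the bucket name (the host label before .object.domain) to the path,
        # then reset the Host header so RGW sees a plain path-style request
        http-request set-path /%[req.hdr(host),field(1,.)]%[path] if dnsstyle_buckets
        http-request set-header Host object.domain if dnsstyle_buckets
        default_backend rgw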

[ceph-users] Re: Infinite Dashboard 404 Loop On Failed SAML Authentication

2022-01-11 Thread Ernesto Puerta
Hi Edward, I tried to reproduce the issue (with Keycloak instead of Shibboleth) and I couldn't. After logging in with user credentials that only exist in the SSO service, I end up in the Dashboard's /auth/saml2 URL with the following error message: {"is_authenticated": false, "errors": ["invalid

[ceph-users] Re: Infinite Dashboard 404 Loop On Failed SAML Authentication

2022-01-11 Thread Edward R Huyer
Hmm, ok. It might be specific to Shib. I’ll investigate more. Thank you for checking. -- Edward Huyer

[ceph-users] Re: Grafana version

2022-01-11 Thread Ernesto Puerta
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-28148 states that this only happens to the Enterprise edition of Grafana, while the default version deployed by Cephadm is the community one. Kind Regards, Ernesto On Tue, Jan 4, 2022 at 4:14 AM Jeremy Hansen wrote: > I’m running 16.2.7 P

[ceph-users] Re: Infinite Dashboard 404 Loop On Failed SAML Authentication

2022-01-11 Thread Edward R Huyer
Actually, one other question occurred to me: Was your testing environment bare metal or a cephadm containerized install? It shouldn't matter, and I don't know that it does matter, but my environment is containerized. -- Edward Huyer

[ceph-users] Re: OSDs use 200GB RAM and crash

2022-01-11 Thread Dan van der Ster
Hi, It sounds like https://tracker.ceph.com/issues/53729 -- Dan On Tue., Jan. 11, 2022, 18:32 Konstantin Larin, wrote: > Hi all, > > We have a problem with our 3 node all-in-one cluster (15.2.15). > > There are 16 OSDs on each node, 16 HDDs for data and 4 SSDs for DB. > > At some point 2 node

[ceph-users] Re: OSDs use 200GB RAM and crash

2022-01-11 Thread Marc
I think not even a week ago someone posted what looks like the same issue. How is your situation different from that one? > > Hi all, > > We have a problem with our 3 node all-in-one cluster (15.2.15). > > There are 16 OSDs on each node, 16 HDDs for data and 4 SSDs for DB. > > At some point 2 nod

[ceph-users] Re: OSDs use 200GB RAM and crash

2022-01-11 Thread Lee
We had the exact same issue last week; in the end, unless the dataset can fit in memory, it will never boot. To be honest, this bug seems to be seen by quite a few; in our case it happened after a pg_num change on a pool. In the end I had to manually export the PGs from the OSD, add them back in
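
The exact commands aren't in the truncated message; as a rough sketch (OSD ids, PG id and file paths are made up), exporting a PG from a stopped OSD and importing it elsewhere with ceph-objectstore-tool looks roughly like:

    # with the source OSD stopped, export one PG to a file
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 2.1a --op export --file /mnt/backup/pg-2.1a.export
    # with the target OSD stopped, import it again
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
        --op import --file /mnt/backup/pg-2.1a.export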

[ceph-users] Re: OSDs use 200GB RAM and crash

2022-01-11 Thread Dan van der Ster
Hi Konstantin, How many pglog entries did you have before and after trimming? Could you please also grab a log with debug_osd=20 and debug_ms=1 just before the crash? You can add all that to the tracker so the devs can try to get to the bottom of this. Best, Dan On Tue., Jan. 11, 2022, 19:13
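
As a sketch of how to gather that (the OSD id is a placeholder; pglog lengths show up as log_size in the pg stats):

    # per-PG log lengths
    ceph pg dump --format json | jq '.pg_map.pg_stats[] | {pgid, log_size}'
    # raise the debug levels centrally so the next start of the crashing OSD is captured
    ceph config set osd.53 debug_osd 20
    ceph config set osd.53 debug_ms 1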

[ceph-users] Re: OSDs use 200GB RAM and crash

2022-01-11 Thread David Yang
Hi, I have also encountered this problem before. I did not do any other operations, just added an SSD as large as possible to create a swap partition. At the peak, while the OSDs were being restored, a storage node used up 2T of swap. Then, after the OSDs boot back to normal, the memory will be released and return to
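
The swap trick itself is straightforward; as a sketch under the assumption that a spare device or enough free filesystem space is available (the device and file names are only examples):

    # dedicate a spare SSD partition as swap
    mkswap /dev/nvme1n1p1
    swapon /dev/nvme1n1p1
    # or, lacking a spare partition, use a (slower) swap file
    fallocate -l 200G /var/swapfile
    chmod 600 /var/swapfile
    mkswap /var/swapfile
    swapon /var/swapfile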

[ceph-users] Re: OSDs use 200GB RAM and crash

2022-01-11 Thread Marius Leustean
Had the same issue after a pg_num increase. Indeed the convenient solution was to add the needed memory (either a Swap partition or physical RAM). Things will get back to normal after the initial start, you won’t have to keep that extra ram into your storage nodes. This is a really annoying issue