[ceph-users] Re: Ceph symbols for v15_2_0 in pacific libceph-common

2025-01-16 Thread Bill Scales
Hi,

Nothing to worry about here – you are using the correct symbols. The v15_2_0 in 
symbols like ceph::buffer::v15_2_0::ptr::copy_out is an API version, not the 
code version. There have not been any API changes to ceph::buffer for several 
years, so it still carries v15_2_0 even in the latest squid release.
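
If you want to double-check on your side, one way (just a sketch – the library 
path below is the one from your yum output, adjust as needed) is to list the 
symbols the pacific library actually exports:

  nm -D --defined-only /usr/lib64/ceph/libceph-common.so.2 | c++filt | grep 'ceph::buffer::v15_2_0' | head

Every buffer symbol will carry the v15_2_0 namespace there, regardless of the 
installed Ceph release.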

Cheers,

Bill.
bill_sca...@uk.ibm.com
IBM Distinguished Engineer, IBM Storage


From: Frank Schilder 
Date: Wednesday, 15 January 2025 at 18:33
To: ceph-users@ceph.io 
Subject: [EXTERNAL] [ceph-users] Ceph symbols for v15_2_0 in pacific 
libceph-common
Hi all,

during debugging of an MDS problem we observed something that looks odd. The 
output of perf top seems to show symbols from v15 (octopus) on a pacific (v16) 
installation:

  23.56%  ceph-mds  [.] std::_Rb_tree
   7.02%  libceph-common.so.2   [.] ceph::buffer::v15_2_0::ptr::copy_out
   4.99%  ceph-mds  [.] std::_Hashtable::copy
   2.53%  ceph-mds  [.] std::_Rb_tree::operator+=
   1.68%  ceph-mds  [.] MDCache::populate_mydir
   1.56%  ceph-mds  [.] std::_Rb_tree, std::_S
   1.38%  [kernel]  [k] clear_page_erms
   1.06%  [kernel]  [k] native_irq_return_iret

The pacific packages are installed from download.ceph.com and all package 
candidates claim to be v16:

# yum provides "/*/libceph-common.so.2"
Last metadata expiration check: 0:45:57 ago on Wed 15 Jan 2025 12:14:09 PM CET.
librados2-2:16.2.15-0.el8.x86_64 : RADOS distributed object store client library
Repo: @System
Matched from:
Filename: /usr/lib64/ceph/libceph-common.so.2

librados2-2:16.2.15-0.el8.x86_64 : RADOS distributed object store client library
Repo: ceph
Matched from:
Filename: /usr/lib64/ceph/libceph-common.so.2

Why do we see v15 symbols here or am I interpreting the symbol name 
ceph::buffer::v15_2_0::list::iterator_impl::copy incorrectly?

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: Building C, IBM Hursley Office, Hursley Park Road, 
Winchester, Hampshire SO21 2JN
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-16 Thread Frank Schilder
The MDS was up overnight and it started showing CPU load again. I added a 
screenshot to the imgur post 
(https://imgur.com/a/mds-hung-purge-stale-snap-data-after-populating-cache-RF7ExSP).
Unfortunately, it's only the messenger threads. The MDS seems to be idling.
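
For reference, when the MDS looks idle like this, the admin socket can at least 
show whether any requests are actually stuck (a rough sketch only – the daemon 
name below is a placeholder, use your own MDS name on its host):

  ceph daemon mds.ceph-mds-01 ops                 # requests currently in flight
  ceph daemon mds.ceph-mds-01 dump_blocked_ops    # ops currently marked as blocked
  ceph daemon mds.ceph-mds-01 dump_historic_ops   # recently completed slow requests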

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Many misplaced PG's, full OSD's and a good amount of manual intervention to keep my Ceph cluster alive.

2025-01-16 Thread Janne Johansson
On Thu, 16 Jan 2025 at 00:08, Bruno Gomes Pessanha <
bruno.pessa...@gmail.com> wrote:

> Hi everyone. Yes. All the tips definitely helped! Now I have more free
> space in the pools, the number of misplaced PG's decreased a lot, and the
> std deviation of OSD usage is lower. The storage looks way healthier now.
> Thanks a bunch!
>
> I'm only confused by the number of misplaced PG's which never goes
> below 5%. Every time it hits 5% it goes up and down like shown in this
> quite interesting graph:
> [image: image.png]
>
> Any idea why that might be?
>
> I had the impression that it might be related to the autobalancer that
> kicks in and pg's are misplaced again. Or am I missing something?
>

Yes, this seems to be the balancer keeping 5% of your PGs moving to the
"correct" places. If you ran an upmap remapper (or had PGs hindered by
backfill_toofull), the balancer might have a long list of PGs it wants to
move, but it "only" does at most 5% at a time.

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: More objects misplaced than exist?

2025-01-16 Thread Andre Tann

Hi Anthony,

answering also to the list...


On 16.01.25 at 15:52, Anthony D'Atri wrote:

When I see anomalous status my first thought is to manually failover the mgr
I stopped the active mgr, another took over, but still the status is the 
same:


  data:
    volumes: 1/1 healthy
    pools:   4 pools, 2081 pgs
    objects: 3.53M objects, 12 TiB
    usage:   36 TiB used, 226 TiB / 262 TiB avail
    pgs: 17669564/10601739 objects misplaced (166.667%)
 2081 active+clean+remapped

What I still don't understand is: why does Ceph report 100% usage for 
all pools, which is nonsense? Can that be related to the confusing 
numbers of misplaced objects?


   root@pve01:~# ceph df
   --- RAW STORAGE ---
   CLASS SIZE    AVAIL    USED  RAW USED %RAW USED
   ssd    262 TiB  226 TiB  36 TiB    36 TiB  13.58
   TOTAL  262 TiB  226 TiB  36 TiB    36 TiB  13.58

   --- POOLS ---
   POOL ID   PGS   STORED OBJECTS USED %USED  MAX AVAIL
   .mgr  1 1  7.6 MiB 3   15 MiB  100.00    0 B
   ReplicationPool   2  1024  8.0 TiB 2.11M   24 TiB  100.00    0 B
   cephfs_data   7  1024  3.9 TiB 1.43M   12 TiB  100.00    0 B
   cephfs_metadata   8    32  268 MiB 143  804 MiB  100.00    0 B

--
Andre Tann
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Modify or override ceph_default_alerts.yml

2025-01-16 Thread Redouane Kachach
Hi Eugen,

Not sure if that will work or not (I didn't try it myself) but there's an
option to configure the ceph alerts path in cephadm:

Option(
'prometheus_alerts_path',
type='str',
*default='/etc/prometheus/ceph/ceph_default_alerts.yml'*,
desc='location of alerts to include in prometheus deployments',
),

The file */etc/prometheus/ceph/ceph_default_alerts.yml* comes with the ceph
container, but you can adjust the above path variable to have the container
read another file of your choice (passing the corresponding mount).

As I said I didn't test the above... but sounds like an option.
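
A minimal sketch of what that could look like (again untested; the alternative 
path is just an example, and you still need to get that file into the container, 
e.g. via an extra mount):

  ceph config set mgr mgr/cephadm/prometheus_alerts_path /etc/prometheus/ceph/my_alerts.yml
  ceph orch reconfig prometheus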

Best,
Redo.

On Thu, Jan 16, 2025 at 3:26 PM Eugen Block  wrote:

> Hi Redo,
>
> I've been looking into the templates and have a question. Maybe you
> could help clarify. I understand that I can create custom alerts and
> inject them with:
>
> ceph config-key set
> mgr/cephadm/services/prometheus/alerting/custom_alerts.yml -i
> custom_alerts.yml
>
> It works when I want additional alerts, okay.
>
> But this way I cannot override the original alert (let's stay with
> "CephPGImbalance" as an example). I can create my own alert as
> described above (I don't even have to rename it), let's say 3%
> deviation in a test cluster, but it would show up in *addition* to the
> original 30% deviation. And although this command works as well
> (trying to override the defaults):
>
> ceph config-key set
> mgr/cephadm/services/prometheus/alerting/ceph_alerts.yml -i
> ceph_alerts.yml
>
> The default 30% value is not overridden. So the question is, how to
> actually change the original alert other than the workaround we
> already discussed here? Or am I misunderstanding something here?
>
> Thanks!
> Eugen
>
> Quoting Redouane Kachach :
>
> > Just FYI: cephadm does support providing/using a custom template (see the
> > docs on [1]). For example using the following cmd you can override the
> > prometheus template:
> >
> >> ceph config-key set mgr/cephadm/services/prometheus/prometheus.yml
> 
> >
> > After changing the template you have to reconfigure the service in order
> to
> > redeploy the daemons with your new config by:
> >
> >> ceph orch reconfig prometheus
> >
> > Then you can go to the corresponding directory
> > on /var/lib/ceph///... to see if the container
> has
> > got the new config.
> >
> >
> > *Note:* In general most of the templates have some variables and they are
> > used to dynamically generate the configuration files. So be careful when
> > changing the template. I'd recommend
> > using the current one as base (you can see where to find them in the
> docs)
> > and then modify it to add your custom config but without altering the
> > dynamic parts of the template.
> >
> > [1]
> https://docs.ceph.com/en/reef/cephadm/services/monitoring/#option-names
> >
> > Best,
> > Redo.
> >
> >
> > On Tue, Jan 14, 2025 at 8:45 AM Eugen Block  wrote:
> >
> >> Ah, I checked on a newer test cluster (Squid) and now I see what you
> >> mean. The alert is shown per OSD in the dashboard, if you open the
> >> dropdown you see which daemons are affected. I think it works a bit
> >> different in Pacific (that's what the customer is still running) when
> >> I last had to modify this. How many OSDs do you have? I noticed that
> >> it takes a few seconds for prometheus to clear the warning with only 3
> >> OSDs in my lab cluster. Maybe you could share a screenshot (with
> >> redacted sensitive data) showing the alerts? And the status of the
> >> affected OSDs as well.
> >>
> >>
>> Quoting "Devin A. Bougie" :
> >>
> >> > Hi Eugen,
> >> >
> >> > No, as far as I can tell I only have one prometheus service running.
> >> >
> >> > ———
> >> >
> >> > [root@cephman2 ~]# ceph orch ls prometheus --export
> >> >
> >> > service_type: prometheus
> >> >
> >> > service_name: prometheus
> >> >
> >> > placement:
> >> >
> >> >   count: 1
> >> >
> >> >   label: _admin
> >> >
> >> >
> >> > [root@cephman2 ~]# ceph orch ps --daemon-type prometheus
> >> >
> >> > NAME HOST PORTS   STATUS
> >> > REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID
> >> > CONTAINER ID
> >> >
> >> > prometheus.cephman2  cephman2.classe.cornell.edu  *:9095  running
> >> > (12h) 4m ago   3w 350M-  2.43.0   a07b618ecd1d
> >> > 5a8d88682c28
> >> >
> >> > ———
> >> >
> >> > Anything else I can check or do?
> >> >
> >> > Thanks,
> >> > Devin
> >> >
> >> > On Jan 13, 2025, at 6:39 PM, Eugen Block  wrote:
> >> >
> >> > Do you have two Prometheus instances? Maybe you could share
> >> > ceph orch ls prometheus --export
> >> >
> >> > Or alternatively:
> >> > ceph orch ps --daemon-type prometheus
> >> >
> >> > You can use two instances for HA, but then you need to change the
> >> > threshold for both, of course.
> >> >
> >> > Zitat von "Devin A. Bougie"
> >> > mailto:devin.bou...@cornell.edu>>:
> >> >
> >> > Thanks, Eugen!  Just incase you have any more suggestions, this
> >> > still is

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-16 Thread Frank Schilder
I think I finally found the moment where everything goes downhill. Please take 
a look at this comment: 
https://tracker.ceph.com/issues/69547?next_issue_id=69546#note-4 . This looks a 
lot like a timeout, but I have no clue what to look for. Any hint is greatly 
appreciated.

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Re: Cephadm: Specifying RGW Certs & Keys By Filepath

2025-01-16 Thread Alex Hussein-Kershaw (HE/HIM)
Oh, actually I spoke too soon. That does work, but it also exposes plain HTTP 
on port 80. 🙁

  beast port=80 ssl_port=7480 ssl_certificate=/etc/ssl/certs/server.crt 
ssl_private_key=/etc/ssl/private/server.key


From: Alex Hussein-Kershaw (HE/HIM)
Sent: Thursday, January 16, 2025 5:59 PM
To: Redouane Kachach
Cc: ceph-users
Subject: Re: [EXTERNAL] Re: [ceph-users] Cephadm: Specifying RGW Certs & Keys 
By Filepath

Amazing. How did I miss that.

Dropping "ssl: true" and adding "ssl_port=1234" to the rgw_frontend_extra_args 
values has me sorted.
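
For the archives, the spec section then ends up roughly like this (a sketch based 
on the spec earlier in the thread – note the later follow-up in this thread about 
beast still exposing plain HTTP on port 80 with this approach):

spec:
  rgw_realm: geored_realm
  rgw_zone: siteA
  rgw_zonegroup: geored_zg
  rgw_frontend_extra_args:
  - "ssl_port=7480"
  - "ssl_certificate=/etc/ssl/certs/server.crt"
  - "ssl_private_key=/etc/ssl/private/server.key"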

Many thanks!



From: Redouane Kachach
Sent: Thursday, January 16, 2025 4:39 PM
To: Alex Hussein-Kershaw (HE/HIM)
Cc: ceph-users
Subject: [EXTERNAL] Re: [ceph-users] Cephadm: Specifying RGW Certs & Keys By 
Filepath

You are getting the double option because "ssl: true" ... try to disable ssl 
since you are passing the arguments and certificates by hand!

Another option is to have cephadm generate the certificates for you by setting 
the `generate_cert` field in the spec to true. But I'm not sure if that works 
for your environment or not ...
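
For the second option, the spec would look roughly like this (a sketch only – 
I haven't verified generate_cert in your setup; no bind-mounted certificates or 
rgw_frontend_extra_args needed in that case):

service_type: rgw
service_id: '25069123'
placement:
  hosts:
  - raynor-sc-1
spec:
  rgw_realm: geored_realm
  rgw_zone: siteA
  rgw_zonegroup: geored_zg
  rgw_frontend_port: 7480
  ssl: true
  generate_cert: true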

Best,
Redo.

On Thu, Jan 16, 2025 at 4:37 PM Alex Hussein-Kershaw (HE/HIM) 
mailto:alex...@microsoft.com>> wrote:
Hi Folks,

Looking for some advice on RGW service specs and Cephadm. I've read the docs 
here:
RGW Service — Ceph Documentation <https://docs.ceph.com/en/reef/cephadm/services/rgw/>

Using a service spec I can deploy a RGW:

service_type: rgw
service_id: '25069123'
service_name: rgw.25069123
placement:
  hosts:
  - raynor-sc-1
extra_container_args:
- -v
- /etc/pki:/etc/pki:ro
- -v
- /etc/ssl/:/etc/ssl:ro
spec:
  rgw_frontend_port: 7480
  rgw_realm: geored_realm
  rgw_zone: siteA
  rgw_zonegroup: geored_zg
  ssl: true
  rgw_frontend_extra_args:
  - "ssl_certificate=/etc/ssl/certs/server.crt"
  - "ssl_private_key=/etc/ssl/private/server.key"

But it fails to start because my "rgw_frontends" config ends up as this. Notice 
the two entries of "ssl_certificate".

beast ssl_port=7480 ssl_certificate=config://rgw/cert/rgw.25069123 
ssl_certificate=/etc/ssl/certs/server.crt 
ssl_private_key=/etc/ssl/private/server.key

That causes it to fail to start:

debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 deferred set uid:gid to 
167:167 (ceph:ceph)
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 ceph version 19.2.0 
(16063ff2022298c9300e49a547a16ffda59baf13) squid (stable), process radosgw, pid 
7
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework: beast
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key: 
ssl_port, val: 7480
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key: 
ssl_certificate, val: config://rgw/cert/rgw.25069123
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key: 
ssl_certificate, val: /etc/ssl/certs/server.crt
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key: 
ssl_private_key, val: /etc/ssl/private/server.key
debug 2025-01-16T15:26:54.520+ 7f65bcfdc840 -1 LDAP not started since no 
server URIs were provided in the configuration.
debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework: beast
debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework conf key: 
ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework conf key: 
ssl_private_key, val: config://rgw/cert/$realm/$zone.key
debug 2025-01-16T15:26:54.597+ 7f65bcfdc840  0 starting handler: beast
debug 2025-01-16T15:26:54.601+ 7f65bcfdc840 -1 ssl_certificate was not 
found: rgw/cert/rgw.25069123
debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 no ssl_certificate 
configured for ssl_port
debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 ERROR: failed initializing 
frontend
debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 ERROR:  initialize frontend 
fail, r = 22

Correcting that config manually:

$ ceph config set client.rgw.25069123.raynor-sc-1.qiatgr  rgw_frontends "beast 
ssl_port=7480 ssl_certificate=/etc/ssl/certs/server.crt 
ssl_private_key=/etc/ssl/private/server.key"

Then allows the RGW to start:

debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 deferred set uid:gid to 
167:167 (ceph:ceph)
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 ceph version 19.2.0 
(16063ff2022298c9300e49a547a16ffda59baf13) squid (stable), process radosgw, pid 
7
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework: beast
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework conf key: 
ssl_port, val: 7480
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework conf key: 
ssl_certificate, val: /etc/ssl/certs/server.crt
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework conf key: 
ssl_private_key, val: /etc/ssl/private/server.key
debug 2025-01-16T15:28:17.767+ 7fba6e85a840 -1 LDAP not started since no 
server URIs were provided in the configuration.
debug 2025-01-16T15:28:17.781+ 7fb

[ceph-users] Re: Cephadm: Specifying RGW Certs & Keys By Filepath

2025-01-16 Thread Alex Hussein-Kershaw (HE/HIM)
I had a look at the code and came to the conclusion that this isn't possible 
currently, but I think I can make a small code change to support this.
I've raised a tracker: Bug #69567: Cephadm: Specifying RGW Certs & Keys By 
Filepath - Orchestrator - Ceph (https://tracker.ceph.com/issues/69567), and will 
submit a PR.


From: Alex Hussein-Kershaw (HE/HIM)
Sent: Thursday, January 16, 2025 3:36 PM
To: ceph-users
Subject: Cephadm: Specifying RGW Certs & Keys By Filepath

Hi Folks,

Looking for some advice on RGW service specs and Cephadm. I've read the docs 
here:
RGW Service — Ceph Documentation <https://docs.ceph.com/en/reef/cephadm/services/rgw/>

Using a service spec I can deploy a RGW:

service_type: rgw
service_id: '25069123'
service_name: rgw.25069123
placement:
  hosts:
  - raynor-sc-1
extra_container_args:
- -v
- /etc/pki:/etc/pki:ro
- -v
- /etc/ssl/:/etc/ssl:ro
spec:
  rgw_frontend_port: 7480
  rgw_realm: geored_realm
  rgw_zone: siteA
  rgw_zonegroup: geored_zg
  ssl: true
  rgw_frontend_extra_args:
  - "ssl_certificate=/etc/ssl/certs/server.crt"
  - "ssl_private_key=/etc/ssl/private/server.key"

But it fails to start because my "rgw_frontends" config ends up as this. Notice 
the two entries of "ssl_certificate".

  beast ssl_port=7480 ssl_certificate=config://rgw/cert/rgw.25069123 
ssl_certificate=/etc/ssl/certs/server.crt 
ssl_private_key=/etc/ssl/private/server.key

That causes it to fail to start:

debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 deferred set uid:gid to 
167:167 (ceph:ceph)
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 ceph version 19.2.0 
(16063ff2022298c9300e49a547a16ffda59baf13) squid (stable), process radosgw, pid 
7
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework: beast
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key: 
ssl_port, val: 7480
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key: 
ssl_certificate, val: config://rgw/cert/rgw.25069123
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key: 
ssl_certificate, val: /etc/ssl/certs/server.crt
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key: 
ssl_private_key, val: /etc/ssl/private/server.key
debug 2025-01-16T15:26:54.520+ 7f65bcfdc840 -1 LDAP not started since no 
server URIs were provided in the configuration.
debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework: beast
debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework conf key: 
ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework conf key: 
ssl_private_key, val: config://rgw/cert/$realm/$zone.key
debug 2025-01-16T15:26:54.597+ 7f65bcfdc840  0 starting handler: beast
debug 2025-01-16T15:26:54.601+ 7f65bcfdc840 -1 ssl_certificate was not 
found: rgw/cert/rgw.25069123
debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 no ssl_certificate 
configured for ssl_port
debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 ERROR: failed initializing 
frontend
debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 ERROR:  initialize frontend 
fail, r = 22

Correcting that config manually:

  $ ceph config set client.rgw.25069123.raynor-sc-1.qiatgr  rgw_frontends 
"beast ssl_port=7480 ssl_certificate=/etc/ssl/certs/server.crt 
ssl_private_key=/etc/ssl/private/server.key"

Then allows the RGW to start:

debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 deferred set uid:gid to 
167:167 (ceph:ceph)
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 ceph version 19.2.0 
(16063ff2022298c9300e49a547a16ffda59baf13) squid (stable), process radosgw, pid 
7
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework: beast
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework conf key: 
ssl_port, val: 7480
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework conf key: 
ssl_certificate, val: /etc/ssl/certs/server.crt
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework conf key: 
ssl_private_key, val: /etc/ssl/private/server.key
debug 2025-01-16T15:28:17.767+ 7fba6e85a840 -1 LDAP not started since no 
server URIs were provided in the configuration.
debug 2025-01-16T15:28:17.781+ 7fba6e85a840  0 framework: beast
debug 2025-01-16T15:28:17.781+ 7fba6e85a840  0 framework conf key: 
ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
debug 2025-01-16T15:28:17.781+ 7fba6e85a840  0 framework conf key: 
ssl_private_key, val: config://rgw/cert/$realm/$zone.key
debug 2025-01-16T15:28:17.860+ 7fba6e85a840  0 starting handler: beast
debug 2025-01-16T15:28:17.908+ 7fba6e85a840  0 set uid:gid to 167:167 
(ceph:ceph)
debug 2025-01-16T15:28:17.912+ 7fba6e85a840  1 mgrc service_daemon_register 
rgw.643295 metadata {arch=x86_64,ceph_release=squid,ceph_version=ceph version 
19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid 
(stable),ceph_version_short=19.2.0,container_hostname=raynor-sc-1,container_image=ceph/squid:v19.2.0,cpu

[ceph-users] Re: Cephadm: Specifying RGW Certs & Keys By Filepath

2025-01-16 Thread Redouane Kachach
You are getting the double option because "ssl: true" ... try to disable
ssl since you are passing the arguments and certificates by hand!

Another option is to have cephadm generate the certificates for you by
setting the `generate_cert` field in the spec to true. But I'm not sure if
that works for your environment or not ...

Best,
Redo.

On Thu, Jan 16, 2025 at 4:37 PM Alex Hussein-Kershaw (HE/HIM) <
alex...@microsoft.com> wrote:

> Hi Folks,
>
> Looking for some advice on RGW service specs and Cephadm. I've read the
> docs here:
> RGW Service — Ceph Documentation<
> https://docs.ceph.com/en/reef/cephadm/services/rgw/>
>
> Using a service spec I can deploy a RGW:
>
> service_type: rgw
> service_id: '25069123'
> service_name: rgw.25069123
> placement:
>   hosts:
>   - raynor-sc-1
> extra_container_args:
> - -v
> - /etc/pki:/etc/pki:ro
> - -v
> - /etc/ssl/:/etc/ssl:ro
> spec:
>   rgw_frontend_port: 7480
>   rgw_realm: geored_realm
>   rgw_zone: siteA
>   rgw_zonegroup: geored_zg
>   ssl: true
>   rgw_frontend_extra_args:
>   - "ssl_certificate=/etc/ssl/certs/server.crt"
>   - "ssl_private_key=/etc/ssl/private/server.key"
>
> But it fails to start because my "rgw_frontends" config ends up as this.
> Notice the two entries of "ssl_certificate".
>
> beast ssl_port=7480 ssl_certificate=config://rgw/cert/rgw.25069123
> ssl_certificate=/etc/ssl/certs/server.crt
> ssl_private_key=/etc/ssl/private/server.key
>
> That causes it to fail to start:
>
> debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 deferred set uid:gid to
> 167:167 (ceph:ceph)
> debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 ceph version 19.2.0
> (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable), process radosgw,
> pid 7
> debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework: beast
> debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key:
> ssl_port, val: 7480
> debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key:
> ssl_certificate, val: config://rgw/cert/rgw.25069123
> debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key:
> ssl_certificate, val: /etc/ssl/certs/server.crt
> debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key:
> ssl_private_key, val: /etc/ssl/private/server.key
> debug 2025-01-16T15:26:54.520+ 7f65bcfdc840 -1 LDAP not started since
> no server URIs were provided in the configuration.
> debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework: beast
> debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework conf key:
> ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
> debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework conf key:
> ssl_private_key, val: config://rgw/cert/$realm/$zone.key
> debug 2025-01-16T15:26:54.597+ 7f65bcfdc840  0 starting handler: beast
> debug 2025-01-16T15:26:54.601+ 7f65bcfdc840 -1 ssl_certificate was not
> found: rgw/cert/rgw.25069123
> debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 no ssl_certificate
> configured for ssl_port
> debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 ERROR: failed
> initializing frontend
> debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 ERROR:  initialize
> frontend fail, r = 22
>
> Correcting that config manually:
>
> $ ceph config set client.rgw.25069123.raynor-sc-1.qiatgr  rgw_frontends
> "beast ssl_port=7480 ssl_certificate=/etc/ssl/certs/server.crt
> ssl_private_key=/etc/ssl/private/server.key"
>
> Then allows the RGW to start:
>
> debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 deferred set uid:gid to
> 167:167 (ceph:ceph)
> debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 ceph version 19.2.0
> (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable), process radosgw,
> pid 7
> debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework: beast
> debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework conf key:
> ssl_port, val: 7480
> debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework conf key:
> ssl_certificate, val: /etc/ssl/certs/server.crt
> debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework conf key:
> ssl_private_key, val: /etc/ssl/private/server.key
> debug 2025-01-16T15:28:17.767+ 7fba6e85a840 -1 LDAP not started since
> no server URIs were provided in the configuration.
> debug 2025-01-16T15:28:17.781+ 7fba6e85a840  0 framework: beast
> debug 2025-01-16T15:28:17.781+ 7fba6e85a840  0 framework conf key:
> ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
> debug 2025-01-16T15:28:17.781+ 7fba6e85a840  0 framework conf key:
> ssl_private_key, val: config://rgw/cert/$realm/$zone.key
> debug 2025-01-16T15:28:17.860+ 7fba6e85a840  0 starting handler: beast
> debug 2025-01-16T15:28:17.908+ 7fba6e85a840  0 set uid:gid to 167:167
> (ceph:ceph)
> debug 2025-01-16T15:28:17.912+ 7fba6e85a840  1 mgrc
> service_daemon_register rgw.643295 metadata
> {arch=x86_64,ceph_release=squid,ceph_version=ceph version 19.2.0
> (16063ff2022298c9300e49a547a16ffda59baf13) squid
> (stable),ceph_ver

[ceph-users] Re: [EXTERNAL] Re: Cephadm: Specifying RGW Certs & Keys By Filepath

2025-01-16 Thread Alex Hussein-Kershaw (HE/HIM)
Amazing. How did I miss that.

Dropping "ssl: true" and adding "ssl_port=1234" to the rgw_frontend_extra_args 
values has me sorted.

Many thanks!



From: Redouane Kachach
Sent: Thursday, January 16, 2025 4:39 PM
To: Alex Hussein-Kershaw (HE/HIM)
Cc: ceph-users
Subject: [EXTERNAL] Re: [ceph-users] Cephadm: Specifying RGW Certs & Keys By 
Filepath

You are getting the double option because "ssl: true" ... try to disable ssl 
since you are passing the arguments and certificates by hand!

Another option is to have cephadm generate the certificates for you by setting 
the `generate_cert` field in the spec to true. But I'm not sure if that works 
for your environment or not ...

Best,
Redo.

On Thu, Jan 16, 2025 at 4:37 PM Alex Hussein-Kershaw (HE/HIM) 
mailto:alex...@microsoft.com>> wrote:
Hi Folks,

Looking for some advice on RGW service specs and Cephadm. I've read the docs 
here:
RGW Service — Ceph Documentation <https://docs.ceph.com/en/reef/cephadm/services/rgw/>

Using a service spec I can deploy a RGW:

service_type: rgw
service_id: '25069123'
service_name: rgw.25069123
placement:
  hosts:
  - raynor-sc-1
extra_container_args:
- -v
- /etc/pki:/etc/pki:ro
- -v
- /etc/ssl/:/etc/ssl:ro
spec:
  rgw_frontend_port: 7480
  rgw_realm: geored_realm
  rgw_zone: siteA
  rgw_zonegroup: geored_zg
  ssl: true
  rgw_frontend_extra_args:
  - "ssl_certificate=/etc/ssl/certs/server.crt"
  - "ssl_private_key=/etc/ssl/private/server.key"

But it fails to start because my "rgw_frontends" config ends up as this. Notice 
the two entries of "ssl_certificate".

beast ssl_port=7480 ssl_certificate=config://rgw/cert/rgw.25069123 
ssl_certificate=/etc/ssl/certs/server.crt 
ssl_private_key=/etc/ssl/private/server.key

That causes it to fail to start:

debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 deferred set uid:gid to 
167:167 (ceph:ceph)
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 ceph version 19.2.0 
(16063ff2022298c9300e49a547a16ffda59baf13) squid (stable), process radosgw, pid 
7
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework: beast
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key: 
ssl_port, val: 7480
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key: 
ssl_certificate, val: config://rgw/cert/rgw.25069123
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key: 
ssl_certificate, val: /etc/ssl/certs/server.crt
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key: 
ssl_private_key, val: /etc/ssl/private/server.key
debug 2025-01-16T15:26:54.520+ 7f65bcfdc840 -1 LDAP not started since no 
server URIs were provided in the configuration.
debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework: beast
debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework conf key: 
ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework conf key: 
ssl_private_key, val: config://rgw/cert/$realm/$zone.key
debug 2025-01-16T15:26:54.597+ 7f65bcfdc840  0 starting handler: beast
debug 2025-01-16T15:26:54.601+ 7f65bcfdc840 -1 ssl_certificate was not 
found: rgw/cert/rgw.25069123
debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 no ssl_certificate 
configured for ssl_port
debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 ERROR: failed initializing 
frontend
debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 ERROR:  initialize frontend 
fail, r = 22

Correcting that config manually:

$ ceph config set client.rgw.25069123.raynor-sc-1.qiatgr  rgw_frontends "beast 
ssl_port=7480 ssl_certificate=/etc/ssl/certs/server.crt 
ssl_private_key=/etc/ssl/private/server.key"

Then allows the RGW to start:

debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 deferred set uid:gid to 
167:167 (ceph:ceph)
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 ceph version 19.2.0 
(16063ff2022298c9300e49a547a16ffda59baf13) squid (stable), process radosgw, pid 
7
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework: beast
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework conf key: 
ssl_port, val: 7480
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework conf key: 
ssl_certificate, val: /etc/ssl/certs/server.crt
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework conf key: 
ssl_private_key, val: /etc/ssl/private/server.key
debug 2025-01-16T15:28:17.767+ 7fba6e85a840 -1 LDAP not started since no 
server URIs were provided in the configuration.
debug 2025-01-16T15:28:17.781+ 7fba6e85a840  0 framework: beast
debug 2025-01-16T15:28:17.781+ 7fba6e85a840  0 framework conf key: 
ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
debug 2025-01-16T15:28:17.781+ 7fba6e85a840  0 framework conf key: 
ssl_private_key, val: config://rgw/cert/$realm/$zone.key
debug 2025-01-16T15:28:17.860+ 7fba6e85a840  0 starting handler: beast
debug 2025-01-16T15:28:17.908+ 7fba6e85a840  0 set uid:gid to 167:167 
(ceph:ceph)
debug 20

[ceph-users] Re: Modify or override ceph_default_alerts.yml

2025-01-16 Thread Eugen Block

Hi Redo,

I've been looking into the templates and have a question. Maybe you  
could help clarify. I understand that I can create custom alerts and  
inject them with:


ceph config-key set  
mgr/cephadm/services/prometheus/alerting/custom_alerts.yml -i  
custom_alerts.yml


It works when I want additional alerts, okay.
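
For reference, such a custom_alerts.yml is just a regular Prometheus rules file. 
A minimal sketch (the expression below is only a placeholder – in practice I copy 
the stock CephPGImbalance expression from ceph_default_alerts.yml and only change 
the threshold):

groups:
  - name: custom.rules
    rules:
      - alert: CephPGImbalance
        expr: |
          abs(((ceph_osd_numpg > 0) - on (job) group_left
          avg(ceph_osd_numpg > 0) by (job)) /
          on (job) group_left avg(ceph_osd_numpg > 0) by (job)) > 0.03
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "OSD PG count deviates more than 3% from the average"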

But this way I cannot override the original alert (let's stay with  
"CephPGImbalance" as an example). I can create my own alert as  
described above (I don't even have to rename it), let's say 3%  
deviation in a test cluster, but it would show up in *addition* to the  
original 30% deviation. And although this command works as well  
(trying to override the defaults):


ceph config-key set  
mgr/cephadm/services/prometheus/alerting/ceph_alerts.yml -i  
ceph_alerts.yml


The default 30% value is not overridden. So the question is, how to  
actually change the original alert other than the workaround we  
already discussed here? Or am I misunderstanding something here?


Thanks!
Eugen

Quoting Redouane Kachach :


Just FYI: cephadm does support providing/using a custom template (see the
docs on [1]). For example using the following cmd you can override the
prometheus template:


ceph config-key set mgr/cephadm/services/prometheus/prometheus.yml 


After changing the template you have to reconfigure the service in order to
redeploy the daemons with your new config by:


ceph orch reconfig prometheus


Then you can go to the corresponding directory
on /var/lib/ceph///... to see if the container has
got the new config.


*Note:* In general most of the templates have some variables and they are
used to dynamically generate the configuration files. So be careful when
changing the template. I'd recommend
using the current one as base (you can see where to find them in the docs)
and then modify it to add your custom config but without altering the
dynamic parts of the template.

[1] https://docs.ceph.com/en/reef/cephadm/services/monitoring/#option-names

Best,
Redo.


On Tue, Jan 14, 2025 at 8:45 AM Eugen Block  wrote:


Ah, I checked on a newer test cluster (Squid) and now I see what you
mean. The alert is shown per OSD in the dashboard, if you open the
dropdown you see which daemons are affected. I think it works a bit
different in Pacific (that's what the customer is still running) when
I last had to modify this. How many OSDs do you have? I noticed that
it takes a few seconds for prometheus to clear the warning with only 3
OSDs in my lab cluster. Maybe you could share a screenshot (with
redacted sensitive data) showing the alerts? And the status of the
affected OSDs as well.


Zitat von "Devin A. Bougie" :

> Hi Eugen,
>
> No, as far as I can tell I only have one prometheus service running.
>
> ———
>
> [root@cephman2 ~]# ceph orch ls prometheus --export
>
> service_type: prometheus
>
> service_name: prometheus
>
> placement:
>
>   count: 1
>
>   label: _admin
>
>
> [root@cephman2 ~]# ceph orch ps --daemon-type prometheus
>
> NAME HOST PORTS   STATUS
> REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID
> CONTAINER ID
>
> prometheus.cephman2  cephman2.classe.cornell.edu  *:9095  running
> (12h) 4m ago   3w 350M-  2.43.0   a07b618ecd1d
> 5a8d88682c28
>
> ———
>
> Anything else I can check or do?
>
> Thanks,
> Devin
>
> On Jan 13, 2025, at 6:39 PM, Eugen Block  wrote:
>
> Do you have two Prometheus instances? Maybe you could share
> ceph orch ls prometheus --export
>
> Or alternatively:
> ceph orch ps --daemon-type prometheus
>
> You can use two instances for HA, but then you need to change the
> threshold for both, of course.
>
> Zitat von "Devin A. Bougie"
> mailto:devin.bou...@cornell.edu>>:
>
> Thanks, Eugen!  Just in case you have any more suggestions, this
> still isn't quite working for us.
>
> Perhaps one clue is that in the Alerts view of the cephadm
> dashboard, every alert is listed twice.  We see two CephPGImbalance
> alerts, both set to 30% after redeploying the service.  If I then
> follow your procedure, one of the alerts updates to 50% as
> configured, but the other stays at 30.  Is it normal to see each
> alert listed twice, or did I somehow make a mess of things when
> trying to change the default alerts?
>
> No problem if it’s not an obvious answer, we can live with and
> ignore the spurious CephPGImbalance alerts.
>
> Thanks again,
> Devin
>
> On Jan 7, 2025, at 2:14 AM, Eugen Block  wrote:
>
> Hi,
>
> sure thing, here's the diff how I changed it to 50% deviation instead of
30%:
>
> ---snip---
> diff -u
>
/var/lib/ceph/{FSID}/prometheus.host1/etc/prometheus/alerting/ceph_alerts.yml

>
/var/lib/ceph/{FSID}/prometheus.host1/etc/prometheus/alerting/ceph_alerts.yml.dist
> ---
>
/var/lib/ceph/{FSID}/prometheus.host1/etc/prometheus/alerting/ceph_alerts.yml
  2024-12-17 10:03:23.540179209
> +0100
> +++
>
/var/lib/ceph/{FSID}/prometheus.host1/etc/prometheus/alerting/ceph_alerts.yml.dist
 2024-12-17 10:03:00.380883413
> +

[ceph-users] More objects misplaced than exist?

2025-01-16 Thread Andre Tann

Hi all,

# ceph -w
...
volumes: 1/1 healthy
pools:   4 pools, 2081 pgs
objects: 3.72M objects, 12 TiB
usage:   36 TiB used, 226 TiB / 262 TiB avail
pgs: 18590764/11154459 objects misplaced (166.667%)
 2081 active+clean+remapped

How can more objects be misplaced than exist in the pools?

Also, Ceph reports 100% usage for all pools, but at the same time it shows 
that only 35 out of 250 TB are used. So there is something weird here.


Any ideas what could be wrong here?

--
Andre Tann
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Slow initial boot of OSDs in large cluster with unclean state

2025-01-16 Thread Thomas Byrne - STFC UKRI
Hi Frédéric,

We've had an internal discussion, and we would love to share our experience as 
a case study. If you still think this would be of interest, please let us know 
what we need to do.

We have had 5 monitors on this cluster from about 2018 I think. I actually did 
a quick investigation in November '24 into OSD performance with our production 
cluster and workload with and without co-located RocksDB. The short answer was 
it made surprisingly little difference in average IOPS hitting the HDD under 
normal running, but as you say, I'm sure there are times where the HDD IOPS are 
a limiting factor. I'm happy to share my (fairly rough) report on the work with 
people if there is interest.

As I said elsewhere, I actually noted the slow OSDmap download/initial boot 
when adding fully SSD OSDs to this cluster, which I expected to be a lot 
faster, hence why I started looking into it.

Thanks,
Tom


From: Frédéric Nass 
Sent: Thursday, January 9, 2025 13:32
To: Byrne, Thomas (STFC,RAL,SC) 
Cc: Wesley Dillingham ; ceph-users 
Subject: Re: [ceph-users] Re: Slow initial boot of OSDs in large cluster with 
unclean state
 
Hi Tom,

Great talk there!

Since your cluster must be one of the largest in the world, it would be nice to 
share your experience with the community as a case study [1]. The Ceph project 
is looking for contributors right now.
If interested, let me know and we'll see how we can organize that.

I couldn't find how many MONs you're running in that big cluster. Hopefully 5 
MONs.

You said OSDs have collocated WAL/DBs on HDDs. Have you tried running OSDs with 
WAL/DBs on NVMes?

I'm wondering about the influence of WAL/DBs collocated on HDDs on OSD creation 
time, OSD startup time, peering and osdmap updates, and the role it might play 
regarding flapping, when DB IOs compete with client IOs, even with 100% 
active+clean PGs.
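
For illustration, a cephadm OSD spec that puts WAL/DBs on flash could look 
roughly like this (a sketch only – the device selectors are placeholders, and 
the non-cephadm equivalent would be ceph-volume lvm batch with --db-devices):

service_type: osd
service_id: hdd-with-flash-db
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0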

Cheers,
Frédéric.

[1] https://ceph.io/en/discover/case-studies/

- On 8 Jan 25, at 16:10, Thomas Byrne, STFC UKRI tom.by...@stfc.ac.uk 
wrote:

> Hi Frédéric,
>
> All of our recent OSD crashes can be attributed to genuine hardware issues 
> (i.e.
> failed IO due to unreadable sectors). For reference I've had a look and it
> looks like we've had a handful of drive failures on this cluster in the past
> month, with no other significant flapping. I was trying to say that it
> doesn't take many drive failures, combined with the balancer running, to
> result in a persistent level of OSDMap churn.
>
> Storage nodes are all some derivative of a 24 bay, 2U chassis (e.g 760XD2)
> Single 25Gig connection, no jumboframes
> HDDs range from 12-20TB SAS HDDs depending on year purchased, with collocated
> WAL/DBs on the HDDs.
> All BlueStore OSDs
> Mons have dedicated flash devices for their stores
>
> The workload is radosstriper access to EC pools, so very limited metadata
> requirements (hence the lack of flash for OSDs). More info on the workload
> details can be seen on an very old talk of mine from a Ceph day [1].
>
> [1] https://indico.cern.ch/event/765214/contributions/3517140/
>
> 
> From: Frédéric Nass 
> Sent: Wednesday, January 8, 2025 12:59
> To: Byrne, Thomas (STFC,RAL,SC) 
> Cc: Wesley Dillingham ; ceph-users 
> 
> Subject: Re: [ceph-users] Re: Slow initial boot of OSDs in large cluster with
> unclean state
> 
>
> Hi Tom,
>
> Could you describe this cluster from a hardware perspective? Network speed and
> MTU size, HDD type and capacity, whether OSDs have their WAL/DB on SSD/NVMe or
> if they're collocated, whether MONs are using HDDs or SSDs/NVMe, what 
> workloads
> this cluster is handling?
>
> You mentioned OSD flapping. This phenomenon should no longer occur today on 
> any
> cluster, or very rarely, only in cases of actual hardware failure or when
> hardware is undersized relative to the workloads. All your OSDs are using
> Bluestore, correct?
>
> Regards,
> Frédéric.
>
> - On 8 Jan 25, at 12:29, Thomas Byrne - STFC UKRI tom.by...@stfc.ac.uk
> wrote:
>
>> Hi Wes,
>>
>> It works out at about five new osdmaps a minute, which is about normal for 
>> this
>> cluster's state changes as far as I can tell. It'll drop down to 2-3
>> maps/minute during quiet periods, but the combination of the upmap balancer
>> making changes and occasional OSD flaps or crashes due to hardware issues is
>> enough to cause a fairly reliable rate of osdmap churn.
>>
>> This churn is something that we are working on understanding, and reducing
>> where possible, now that we know it is becoming a pain point for us.
>>
>> Thanks,
>> Tom
>>
>> 
>> From: Wesley Dillingham 
>> Sent: Tuesday, January 7, 2025 18:41
>> To: Byrne, Thomas (STFC,RAL,SC) 
>> Cc: ceph-users@ceph.io 
>> Subject: Re: [ceph-users] Slow initial boot of OSDs in large cluster with
>> unclean state
>>
>> It went from normal osdmap range 500-1000 maps to 30,000 maps in 5 days? That
>> seems like excessive accumulatio

[ceph-users] Re: issue with new AWS cli when upload: MissingContentLength

2025-01-16 Thread Christian Rohmann
I added Matt as CC as he is the one who implemented the new checksum 
capabilities I reference below ...



On 16.01.25 8:26 AM, Szabo, Istvan (Agoda) wrote:

Amazon released a new version of their CLI today 
(https://github.com/aws/aws-cli/tags) and it seems to break our stuff with the 
following error when a PUT object happens:

bash-4.2# /usr/local/bin/aws --endpoint=https://endpoint --no-verify-ssl s3 cp 
online.txt s3://bucket/
upload failed: ./online.txt to s3://bucket/online.txt An error occurred 
(MissingContentLength) when calling the PutObject operation: Unknown

I wonder if there is anything on the Ceph side that I can do to eliminate this?
The Amazon workaround response_checksum_validation = when_required did not work.
https://github.com/aws/aws-cli/issues/9214


I had issues using their mountpoint-s3 client 
(https://github.com/awslabs/mountpoint-s3). This was caused by missing 
checksum features in RGW, see
https://tracker.ceph.com/issues/63153 -> 
https://github.com/ceph/ceph/pull/54856

While merged, I suppose these have not made it into a release yet.

Clients using the AWS Go SDK v2 had issues because it utilizes the newer 
features. See e.g. Terraform: 
https://github.com/hashicorp/terraform/issues/34086


Looking at the changelog of the recent aws-cli release ...

 * https://github.com/aws/aws-cli/blob/2.23.0/CHANGELOG.rst
 * https://github.com/aws/aws-cli/blob/1.37.0/CHANGELOG.rst

it seems there is a lot more checksumming now being done.


Whatever new feature / checksum you find to cause this incompatibility 
should go as a test case into the s3-tests 
(https://github.com/ceph/s3-tests) to ensure this feature is tested from 
then on.
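
As a client-side stopgap (an assumption on my part, I have not verified this 
against RGW): the same release also introduced a request_checksum_calculation 
setting next to response_checksum_validation, so something like this in 
~/.aws/config might keep the CLI from sending the new checksum trailers on a 
plain PutObject:

  [default]
  request_checksum_calculation = when_required
  response_checksum_validation = when_required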



Regards

Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephadm: Specifying RGW Certs & Keys By Filepath

2025-01-16 Thread Alex Hussein-Kershaw (HE/HIM)
Hi Folks,

Looking for some advice on RGW service specs and Cephadm. I've read the docs 
here:
RGW Service — Ceph Documentation <https://docs.ceph.com/en/reef/cephadm/services/rgw/>

Using a service spec I can deploy a RGW:

service_type: rgw
service_id: '25069123'
service_name: rgw.25069123
placement:
  hosts:
  - raynor-sc-1
extra_container_args:
- -v
- /etc/pki:/etc/pki:ro
- -v
- /etc/ssl/:/etc/ssl:ro
spec:
  rgw_frontend_port: 7480
  rgw_realm: geored_realm
  rgw_zone: siteA
  rgw_zonegroup: geored_zg
  ssl: true
  rgw_frontend_extra_args:
  - "ssl_certificate=/etc/ssl/certs/server.crt"
  - "ssl_private_key=/etc/ssl/private/server.key"

But it fails to start because my "rgw_frontends" config ends up as this. Notice 
the two entries of "ssl_certificate".

  beast ssl_port=7480 ssl_certificate=config://rgw/cert/rgw.25069123 
ssl_certificate=/etc/ssl/certs/server.crt 
ssl_private_key=/etc/ssl/private/server.key

That causes it to fail to start:

debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 deferred set uid:gid to 
167:167 (ceph:ceph)
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 ceph version 19.2.0 
(16063ff2022298c9300e49a547a16ffda59baf13) squid (stable), process radosgw, pid 
7
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework: beast
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key: 
ssl_port, val: 7480
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key: 
ssl_certificate, val: config://rgw/cert/rgw.25069123
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key: 
ssl_certificate, val: /etc/ssl/certs/server.crt
debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key: 
ssl_private_key, val: /etc/ssl/private/server.key
debug 2025-01-16T15:26:54.520+ 7f65bcfdc840 -1 LDAP not started since no 
server URIs were provided in the configuration.
debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework: beast
debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework conf key: 
ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework conf key: 
ssl_private_key, val: config://rgw/cert/$realm/$zone.key
debug 2025-01-16T15:26:54.597+ 7f65bcfdc840  0 starting handler: beast
debug 2025-01-16T15:26:54.601+ 7f65bcfdc840 -1 ssl_certificate was not 
found: rgw/cert/rgw.25069123
debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 no ssl_certificate 
configured for ssl_port
debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 ERROR: failed initializing 
frontend
debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 ERROR:  initialize frontend 
fail, r = 22

Correcting that config manually:

  $ ceph config set client.rgw.25069123.raynor-sc-1.qiatgr  rgw_frontends 
"beast ssl_port=7480 ssl_certificate=/etc/ssl/certs/server.crt 
ssl_private_key=/etc/ssl/private/server.key"

Then allows the RGW to start:

debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 deferred set uid:gid to 
167:167 (ceph:ceph)
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 ceph version 19.2.0 
(16063ff2022298c9300e49a547a16ffda59baf13) squid (stable), process radosgw, pid 
7
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework: beast
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework conf key: 
ssl_port, val: 7480
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework conf key: 
ssl_certificate, val: /etc/ssl/certs/server.crt
debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework conf key: 
ssl_private_key, val: /etc/ssl/private/server.key
debug 2025-01-16T15:28:17.767+ 7fba6e85a840 -1 LDAP not started since no 
server URIs were provided in the configuration.
debug 2025-01-16T15:28:17.781+ 7fba6e85a840  0 framework: beast
debug 2025-01-16T15:28:17.781+ 7fba6e85a840  0 framework conf key: 
ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
debug 2025-01-16T15:28:17.781+ 7fba6e85a840  0 framework conf key: 
ssl_private_key, val: config://rgw/cert/$realm/$zone.key
debug 2025-01-16T15:28:17.860+ 7fba6e85a840  0 starting handler: beast
debug 2025-01-16T15:28:17.908+ 7fba6e85a840  0 set uid:gid to 167:167 
(ceph:ceph)
debug 2025-01-16T15:28:17.912+ 7fba6e85a840  1 mgrc service_daemon_register 
rgw.643295 metadata {arch=x86_64,ceph_release=squid,ceph_version=ceph version 
19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid 
(stable),ceph_version_short=19.2.0,container_hostname=raynor-sc-1,container_image=ceph/squid:v19.2.0,cpu=Intel(R)
 Xeon(R) CPU E5-2683 v3 @ 2.00GHz,distro=centos,distro_description=CentOS 
Stream 9,distro_version=9,frontend_config#0=beast ssl_port=7480 
ssl_certificate=/etc/ssl/certs/server.crt 
ssl_private_key=/etc/ssl/private/server.key,frontend_type#0=beast,hostname=raynor-sc-1,id=25069123.raynor-sc-1.qiatgr,kernel_description=#1
 SMP Thu Dec 19 20:58:14 EST 
2024,kernel_version=5.4.288-1.el8.elrepo.x86_64,mem_swap_kb=0,mem_total_kb=12251748,num_handles=1,os=Linux,pid=7,realm_id=c99973d7-27cb-4abd-8

[ceph-users] Re: More objects misplaced than exist?

2025-01-16 Thread Andre Tann

On 16.01.25 at 16:53, Andre Tann wrote:


    --- POOLS ---
    POOL ID   PGS   STORED OBJECTS USED %USED  MAX AVAIL
    .mgr  1 1  7.6 MiB 3   15 MiB  100.00    0 B
    ReplicationPool   2  1024  8.0 TiB 2.11M   24 TiB  100.00    0 B
    cephfs_data   7  1024  3.9 TiB 1.43M   12 TiB  100.00    0 B
    cephfs_metadata   8    32  268 MiB 143  804 MiB  100.00    0 B


For the record:

When moving OSDs to a new CRUSH location, I forgot to set root=default. 
But the crush rule takes the default bucket in its first step, 
so it could no longer find the OSDs.
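
For anyone hitting the same thing, the move that keeps everything under the 
default root looks roughly like this (host and OSD id are placeholders):

  ceph osd crush move osd.12 root=default host=pve01
  # or, if a whole host bucket ended up outside the root:
  ceph osd crush move pve01 root=default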



--
Andre Tann
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: squid 19.2.1 RC QE validation status

2025-01-16 Thread Guillaume ABRIOUX
Hello,

The Rook CI must be failing because a `ceph-bluestore-tool` backport [1] is 
missing.
This backport was merged ~6 hours ago.

[1] https://github.com/ceph/ceph/pull/60543

Regards,

--
Guillaume Abrioux
Software Engineer

From: Travis Nielsen 
Sent: Wednesday, 15 January 2025 20:32
To: Laura Flores 
Cc: Yuri Weinstein ; dev ; ceph-users 

Subject: [EXTERNAL] [ceph-users] Re: squid 19.2.1 RC QE validation status

When running the Rook CI against the latest squid devel image, we are
seeing issues creating OSDs, investigating with Guillaume...
https://github.com/rook/rook/issues/15282 

Travis

On Wed, Jan 15, 2025 at 7:57 AM Laura Flores  wrote:

> The Gibba cluster has been upgraded.
>
> On Wed, Jan 15, 2025 at 7:27 AM Christian Rohmann <
> christian.rohm...@inovex.de> wrote:
>
> > Hey Adam,
> >
> > On 11.01.25 12:52 AM, Adam Emerson wrote:
> > > On 10/01/2025, Yuri Weinstein wrote:
> > >> This PR https://github.com/ceph/ceph/pull/61306   was cherry-picked
> > >> Adam, pls see the run for the Build 4
> > >>
> > >> Laura, Adam approves rgw, we are ready for gibba and LRC/sepia
> upgrades.
> > >
> > > I hereby approve the RGW run. Thanks and sorry for the last minute fix.
> >
> >
> > Is the  broken rgw_s3_auth_order https://tracker.ceph.com/issues/68393 
> > not relevant enough for the release then?
> > There is a PR open https://github.com/ceph/ceph/pull/61162 
> >
> > Also there are some desperate comments about this breaking / hindering
> > multi-site sync (https://tracker.ceph.com/issues/69183#note-4  )
> > which I totally agree with.
> >
> >
> >
> > Regards
> >
> >
> > Christian
> >
> >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>
> --
>
> Laura Flores
>
> She/Her/Hers
>
> Software Engineer, Ceph Storage 
>
> Chicago, IL
>
> lflo...@ibm.com | lflo...@redhat.com 
> M: +17087388804
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: squid 19.2.1 RC QE validation status

2025-01-16 Thread Yuri Weinstein
Does this have to be cherry-picked to 19.2.1?

What tests are to be rerun if yes?

On Thu, Jan 16, 2025 at 11:21 AM Guillaume ABRIOUX  wrote:
>
> Hello,
>
> The rook ci must be failing because a `ceph-bluestore-tool` backport [1] is 
> missing.
> This backport was merged ~6 hours ago.
>
> [1] https://github.com/ceph/ceph/pull/60543
>
> Regards,
>
> --
> Guillaume Abrioux
> Software Engineer
> 
> From: Travis Nielsen 
> Sent: Wednesday, 15 January 2025 20:32
> To: Laura Flores 
> Cc: Yuri Weinstein ; dev ; ceph-users 
> 
> Subject: [EXTERNAL] [ceph-users] Re: squid 19.2.1 RC QE validation status
>
> When running the Rook CI against the latest squid devel image, we are
> seeing issues creating OSDs, investigating with Guillaume...
> https://github.com/rook/rook/issues/15282
>
> Travis
>
> On Wed, Jan 15, 2025 at 7:57 AM Laura Flores  wrote:
>
> > The Gibba cluster has been upgraded.
> >
> > On Wed, Jan 15, 2025 at 7:27 AM Christian Rohmann <
> > christian.rohm...@inovex.de> wrote:
> >
> > > Hey Adam,
> > >
> > > On 11.01.25 12:52 AM, Adam Emerson wrote:
> > > > On 10/01/2025, Yuri Weinstein wrote:
> > > >> This PR https://github.com/ceph/ceph/pull/61306 was cherry-picked
> > > >> Adam, pls see the run for the Build 4
> > > >>
> > > >> Laura, Adam approves rgw, we are ready for gibba and LRC/sepia
> > upgrades.
> > > >
> > > > I hereby approve the RGW run. Thanks and sorry for the last minute fix.
> > >
> > >
> > > Is the broken rgw_s3_auth_order https://tracker.ceph.com/issues/68393
> > > not relevant enough for the release then?
> > > There is a PR open https://github.com/ceph/ceph/pull/61162
> > >
> > > Also there are some desperate comments about this breaking / hindering
> > > multi-site sync (https://tracker.ceph.com/issues/69183#note-4)
> > > which I totally agree with.
> > >
> > >
> > >
> > > Regards
> > >
> > >
> > > Christian
> > >
> > >
> > >
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> >
> >
> > --
> >
> > Laura Flores
> >
> > She/Her/Hers
> >
> > Software Engineer, Ceph Storage 
> >
> > Chicago, IL
> >
> > lflo...@ibm.com | lflo...@redhat.com 
> > M: +17087388804
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-16 Thread Bailey Allison

Frank,

Are you able to share an up-to-date ceph config dump and a ceph daemon 
mds.X perf dump | grep strays from the cluster?
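
i.e. something along these lines (the MDS name is a placeholder, run the perf 
dump on the host where that daemon lives):

  ceph config dump > config_dump.txt
  ceph daemon mds.cephfs.host1.abcdef perf dump | grep strays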


We're just getting through our comically long Ceph outage, so I'd like 
to be able to share the love here hahahaha


Regards,

Bailey Allison
Service Team Lead
45Drives, Ltd.
866-594-7199 x868

On 1/16/25 13:27, Frank Schilder wrote:

I think I finally found the moment where everything goes downhill. Please take 
a look at this comment: 
https://tracker.ceph.com/issues/69547?next_issue_id=69546#note-4 . This looks a 
lot like a timeout, but I have no clue what to look for. Any hint is greatly 
appreciated.

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: squid 19.2.1 RC QE validation status

2025-01-16 Thread Travis Nielsen
I confirmed that the Rook CI is now passing with the latest squid devel
image that was pushed a couple hours ago, including this fix.
It is a blocker for properly starting OSDs, at least for Rook. Guillaume
was also able to repro the related issue outside Rook.
So yes, please it needs to be included in 19.2.1.

Thanks,
Travis

On Thu, Jan 16, 2025 at 1:29 PM Yuri Weinstein  wrote:

> Does this have to be cherry-picked to 19.2.1?
>
> What tests are to be rerun if yes?
>
> On Thu, Jan 16, 2025 at 11:21 AM Guillaume ABRIOUX 
> wrote:
> >
> > Hello,
> >
> > The rook ci must be failing because a `ceph-bluestore-tool` backport [1]
> is missing.
> > This backport was merged ~6 hours ago.
> >
> > [1] https://github.com/ceph/ceph/pull/60543
> >
> > Regards,
> >
> > --
> > Guillaume Abrioux
> > Software Engineer
> > 
> > De : Travis Nielsen 
> > Envoyé : mercredi 15 janvier 2025 20:32
> > À : Laura Flores 
> > Cc : Yuri Weinstein ; dev ;
> ceph-users 
> > Objet : [EXTERNAL] [ceph-users] Re: squid 19.2.1 RC QE validation status
> >
> > When running the Rook CI against the latest squid devel image, we are
> > seeing issues creating OSDs, investigating with Guillaume...
> >
> https://github.com/rook/rook/issues/15282
> >
> > Travis
> >
> > On Wed, Jan 15, 2025 at 7:57 AM Laura Flores  wrote:
> >
> > > The Gibba cluster has been upgraded.
> > >
> > > On Wed, Jan 15, 2025 at 7:27 AM Christian Rohmann <
> > > christian.rohm...@inovex.de> wrote:
> > >
> > > > Hey Adam,
> > > >
> > > > On 11.01.25 12:52 AM, Adam Emerson wrote:
> > > > > On 10/01/2025, Yuri Weinstein wrote:
> > > > >> This PR
> https://github.com/ceph/ceph/pull/61306
> was cherry-picked
> > > > >> Adam, pls see the run for the Build 4
> > > > >>
> > > > >> Laura, Adam approves rgw, we are ready for gibba and LRC/sepia
> > > upgrades.
> > > > >
> > > > > I hereby approve the RGW run. Thanks and sorry for the last minute
> fix.
> > > >
> > > >
> > > > Is the broken rgw_s3_auth_order
> https://tracker.ceph.com/issues/68393
> > > > not relevant enough for the release then?
> > > > There is a PR open
> https://github.com/ceph/ceph/pull/61162
> > > >
> > > > Also there are some desperate comments about this breaking /
> hindering
> > > > multi-site sync (
> https://tracker.ceph.com/issues/69183#note-4
> )
> > > > which I totally agree with.
> > > >
> > > >
> > > >
> > > > Regards
> > > >
> > > >
> > > > Christian
> > > >
> > > >
> > > >
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > >
> > >
> > >
> > > --
> > >
> > > Laura Flores
> > >
> > > She/Her/Hers
> > >
> > > Software Engineer, Ceph Storage <https://ceph.io>
> > >
> > > Chicago, IL
> > >
> > > lflo...@ibm.com | lflo...@redhat.com 
> > > M: +17087388804
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> > Unless otherwise stated above:
> >
> > Compagnie IBM France
> > Siège Social : 17, avenue de l'Europe, 92275 Bois-Colombes Cedex
> > RCS Nanterre 552 118 465
> > Forme Sociale : S.A.S.
> > Capital Social : 664 614 175,50 €
> > SIRET : 552 118 465 03644 - Code NAF 6203Z
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: squid 19.2.1 RC QE validation status

2025-01-16 Thread Travis Nielsen
Although, I am not clear on the difference between the squid branch (where we
started seeing this issue last week) and the 19.2.1 branch, so Guillaume or the
RADOS team should confirm for sure.

On Thu, Jan 16, 2025 at 1:38 PM Travis Nielsen  wrote:

> I confirmed that the Rook CI is now passing with the latest squid devel
> image that was pushed a couple hours ago, including this fix.
> It is a blocker for properly starting OSDs, at least for Rook. Guillaume
> was also able to repro the related issue outside Rook.
> So yes, please it needs to be included in 19.2.1.
>
> Thanks,
> Travis
>
> On Thu, Jan 16, 2025 at 1:29 PM Yuri Weinstein 
> wrote:
>
>> Does this have to be cherry-picked to 19.2.1?
>>
>> What tests are to be rerun if yes?
>>
>> On Thu, Jan 16, 2025 at 11:21 AM Guillaume ABRIOUX 
>> wrote:
>> >
>> > Hello,
>> >
>> > The rook ci must be failing because a `ceph-bluestore-tool` backport
>> [1] is missing.
>> > This backport was merged ~6 hours ago.
>> >
>> > [1] https://github.com/ceph/ceph/pull/60543
>> >
>> > Regards,
>> >
>> > --
>> > Guillaume Abrioux
>> > Software Engineer
>> > 
>> > De : Travis Nielsen 
>> > Envoyé : mercredi 15 janvier 2025 20:32
>> > À : Laura Flores 
>> > Cc : Yuri Weinstein ; dev ;
>> ceph-users 
>> > Objet : [EXTERNAL] [ceph-users] Re: squid 19.2.1 RC QE validation status
>> >
>> > When running the Rook CI against the latest squid devel image, we are
>> > seeing issues creating OSDs, investigating with Guillaume...
>> >
>> https://github.com/rook/rook/issues/15282
>> >
>> > Travis
>> >
>> > On Wed, Jan 15, 2025 at 7:57 AM Laura Flores 
>> wrote:
>> >
>> > > The Gibba cluster has been upgraded.
>> > >
>> > > On Wed, Jan 15, 2025 at 7:27 AM Christian Rohmann <
>> > > christian.rohm...@inovex.de> wrote:
>> > >
>> > > > Hey Adam,
>> > > >
>> > > > On 11.01.25 12:52 AM, Adam Emerson wrote:
>> > > > > On 10/01/2025, Yuri Weinstein wrote:
>> > > > >> This PR
>> https://github.com/ceph/ceph/pull/61306
>> was cherry-picked
>> > > > >> Adam, pls see the run for the Build 4
>> > > > >>
>> > > > >> Laura, Adam approves rgw, we are ready for gibba and LRC/sepia
>> > > upgrades.
>> > > > >
>> > > > > I hereby approve the RGW run. Thanks and sorry for the last
>> minute fix.
>> > > >
>> > > >
>> > > > Is the broken rgw_s3_auth_order
>> https://tracker.ceph.com/issues/68393
>> > > > not relevant enough for the release then?
>> > > > There is a PR open
>> https://github.com/ceph/ceph/pull/61162
>> > > >
>> > > > Also there are some desperate comments about this breaking /
>> hindering
>> > > > multi-site sync (
>> https://tracker.ceph.com/issues/69183#note-4
>> )
>> > > > which I totally agree with.
>> > > >
>> > > >
>> > > >
>> > > > Regards
>> > > >
>> > > >
>> > > > Christian
>> > > >
>> > > >
>> > > >
>> > > > ___
>> > > > ceph-users mailing list -- ceph-users@ceph.io
>> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
>> > > >
>> > >
>> > >
>> > > --
>> > >
>> > > Laura Flores
>> > >
>> > > She/Her/Hers
>> > >
>> > > Software Engineer, Ceph Storage <https://ceph.io>
>> > >
>> > > Chicago, IL
>> > >
>> > > lflo...@ibm.com | lflo...@redhat.com 
>> > > M: +17087388804
>> > > ___
>> > > ceph-users mailing list -- ceph-users@ceph.io
>> > > To unsubscribe send an email to ceph-users-le...@ceph.io
>> > >
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send a

[ceph-users] Re: [EXTERNAL] Re: Cephadm: Specifying RGW Certs & Keys By Filepath

2025-01-16 Thread Redouane Kachach
That's strange... in the code I can see that when rgw_frontend_port is set it
is used, so I can't see why you get port=80 ... can you please post your spec?



On Thu, Jan 16, 2025 at 7:09 PM Alex Hussein-Kershaw (HE/HIM) <
alex...@microsoft.com> wrote:

> Oh, actually I spoke too soon. That does work, but it also exposes plain
> HTTP on port 80. 🙁
>
> beast port=80 ssl_port=7480 ssl_certificate=/etc/ssl/certs/server.crt
> ssl_private_key=/etc/ssl/private/server.key
>
> --
> *From:* Alex Hussein-Kershaw (HE/HIM)
> *Sent:* Thursday, January 16, 2025 5:59 PM
> *To:* Redouane Kachach
> *Cc:* ceph-users
> *Subject:* Re: [EXTERNAL] Re: [ceph-users] Cephadm: Specifying RGW Certs
> & Keys By Filepath
>
> Amazing. How did I miss that.
>
> Dropping "ssl: true" and adding "ssl_port=1234" to the
> rgw_frontend_extra_args values has me sorted.
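>
> For completeness, the spec section now looks roughly like this (a sketch
> trimmed to the relevant bits; port and paths are from my environment, with
> "ssl: true" and rgw_frontend_port simply removed):
>
> spec:
>   rgw_realm: geored_realm
>   rgw_zone: siteA
>   rgw_zonegroup: geored_zg
>   rgw_frontend_extra_args:
>   - "ssl_port=7480"
>   - "ssl_certificate=/etc/ssl/certs/server.crt"
>   - "ssl_private_key=/etc/ssl/private/server.key"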
>
> Many thanks!
>
>
> --
> *From:* Redouane Kachach
> *Sent:* Thursday, January 16, 2025 4:39 PM
> *To:* Alex Hussein-Kershaw (HE/HIM)
> *Cc:* ceph-users
> *Subject:* [EXTERNAL] Re: [ceph-users] Cephadm: Specifying RGW Certs &
> Keys By Filepath
>
> You are getting the duplicated ssl_certificate option because of "ssl: true" ... try
> disabling ssl since you are passing the arguments and certificates by hand!
>
> Another option is to have cephadm generate the certificates for you by
> setting the `generate_cert` field in the spec to true. But I'm not sure if
> that works for your environment or not ...
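>
> A minimal sketch of that alternative, assuming `generate_cert` is supported
> by your cephadm release (the handmade ssl_certificate/ssl_private_key extra
> args would then go away, everything else stays as in your spec):
>
> spec:
>   rgw_frontend_port: 7480
>   ssl: true
>   generate_cert: true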
>
> Best,
> Redo.
>
> On Thu, Jan 16, 2025 at 4:37 PM Alex Hussein-Kershaw (HE/HIM) <
> alex...@microsoft.com> wrote:
>
> Hi Folks,
>
> Looking for some advice on RGW service specs and Cephadm. I've read the
> docs here:
> RGW Service — Ceph Documentation <https://docs.ceph.com/en/reef/cephadm/services/rgw/>
>
> Using a service spec I can deploy a RGW:
>
> service_type: rgw
> service_id: '25069123'
> service_name: rgw.25069123
> placement:
>   hosts:
>   - raynor-sc-1
> extra_container_args:
> - -v
> - /etc/pki:/etc/pki:ro
> - -v
> - /etc/ssl/:/etc/ssl:ro
> spec:
>   rgw_frontend_port: 7480
>   rgw_realm: geored_realm
>   rgw_zone: siteA
>   rgw_zonegroup: geored_zg
>   ssl: true
>   rgw_frontend_extra_args:
>   - "ssl_certificate=/etc/ssl/certs/server.crt"
>   - "ssl_private_key=/etc/ssl/private/server.key"
>
> But it fails to start because my "rgw_frontends" config ends up like this.
> Notice the two "ssl_certificate" entries.
>
> beast ssl_port=7480 ssl_certificate=config://rgw/cert/rgw.25069123
> ssl_certificate=/etc/ssl/certs/server.crt
> ssl_private_key=/etc/ssl/private/server.key
>
> That causes it to fail to start:
>
> debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 deferred set uid:gid to
> 167:167 (ceph:ceph)
> debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 ceph version 19.2.0
> (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable), process radosgw,
> pid 7
> debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework: beast
> debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key:
> ssl_port, val: 7480
> debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key:
> ssl_certificate, val: config://rgw/cert/rgw.25069123
> debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key:
> ssl_certificate, val: /etc/ssl/certs/server.crt
> debug 2025-01-16T15:26:54.219+ 7f65bcfdc840  0 framework conf key:
> ssl_private_key, val: /etc/ssl/private/server.key
> debug 2025-01-16T15:26:54.520+ 7f65bcfdc840 -1 LDAP not started since
> no server URIs were provided in the configuration.
> debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework: beast
> debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework conf key:
> ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
> debug 2025-01-16T15:26:54.531+ 7f65bcfdc840  0 framework conf key:
> ssl_private_key, val: config://rgw/cert/$realm/$zone.key
> debug 2025-01-16T15:26:54.597+ 7f65bcfdc840  0 starting handler: beast
> debug 2025-01-16T15:26:54.601+ 7f65bcfdc840 -1 ssl_certificate was not
> found: rgw/cert/rgw.25069123
> debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 no ssl_certificate
> configured for ssl_port
> debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 ERROR: failed
> initializing frontend
> debug 2025-01-16T15:26:54.602+ 7f65bcfdc840 -1 ERROR:  initialize
> frontend fail, r = 22
>
> Correcting that config manually:
>
> $ ceph config set client.rgw.25069123.raynor-sc-1.qiatgr  rgw_frontends
> "beast ssl_port=7480 ssl_certificate=/etc/ssl/certs/server.crt
> ssl_private_key=/etc/ssl/private/server.key"
>
> Then allows the RGW to start:
>
> debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 deferred set uid:gid to
> 167:167 (ceph:ceph)
> debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 ceph version 19.2.0
> (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable), process radosgw,
> pid 7
> debug 2025-01-16T15:28:17.391+ 7fba6e85a840  0 framework: beast
> debug 2025-01-16T15:28:17.391+ 7fba6e85a840