What I still don't understand: if the request to get the EC2 credentials from Keystone ran into a 429, why did it work the first time? How does the Keystone authentication for radosgw work under the hood? Is there documentation I can read up on?
I would have thought it was something like:

1. Authenticate with Keystone using a special set of credentials (rgw_keystone_admin_user)
2. Fetch the EC2 credentials for the provided access key
3. Cache those credentials for as long as the token lifetime Keystone reported
4. Do the normal S3 authentication with the cached credentials

I would have thought that these tokens live in the memory of the radosgw daemon and that every rgw daemon keeps track of them on its own. But as I write this down I wonder: what happens to the cached credentials if the EC2 credentials are invalidated by the end user?

btw: This is what our keystone team told me:

>Tobias is top notch, he knows both keystone and ceph, he has some comments on the recent keystone CVE we patched: launchpad.net/bugs/2119646
>What he says makes sense. We already have the new policy that comes with the OSSA patch on /v3/s3tokens and you have it.
>If you want you can also tell him that we fixed the issue not by granting admin but by granting /v3/users/<user>/OS-EC2/<credential> via a custom role policy.
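To write down how I picture that custom-role fix (purely my own sketch; the role name, the policy file path and the rule string are my assumptions, not what our keystone team actually deployed):

# create a custom role and assign it to the RadosGW service user in its project
openstack role create rgw_credential_reader
openstack role add --user <rgw_keystone_admin_user> --project services rgw_credential_reader

# then, in Keystone's policy override file (e.g. /etc/keystone/policy.yaml),
# allow that role to read other users' EC2 credentials in addition to the default rule:
#   "identity:ec2_get_credential": "rule:admin_required or role:rgw_credential_reader or user_id:%(target.credential.user_id)s"

If it is roughly that, it would explain why RadosGW can now fetch and cache the EC2 credential secret, and why the 429s stopped, without the user ever getting full admin.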
On Thu, 6 Nov 2025 at 15:45, Boris <[email protected]> wrote:

> Hi Tobias,
>
> thanks a lot for the in-depth explanation. The keystone team fixed
> something yesterday regarding the mentioned bug and now the 429s are gone.
> We still have no clue why rgw worked at all and I am still trying to
> understand it.
>
> Do you attend the Ceph Days in Berlin next week?
>
>
> On Thu, 6 Nov 2025 at 08:26, Tobias Urdin - Binero <[email protected]> wrote:
>
>> Hello Boris,
>>
>> Then that is probably your issue: ask the team maintaining OpenStack
>> Keystone to check the logs for requests to that API endpoint failing with
>> 403 on every request from RadosGW, similar to this:
>>
>> "GET /v3/users/<user>/credentials/OS-EC2/<credential> HTTP/1.1" 403 140 "-" "-"
>>
>> What this means is that your authentication will work, but because RadosGW
>> cannot retrieve the EC2 credential secret it will not populate the cache,
>> and you will do authentication against Keystone on each request.
>>
>> ---
>>
>> Let me try to clear things up a bit, hopefully. RadosGW needs to perform
>> these API requests against Keystone:
>>
>> /v3/auth/tokens – No policy enforced on who can talk to this API
>> (rgw_keystone_admin_user does not need any special role). Patches and
>> backports [1] have been done to simply drop the admin token usage in this
>> API request.
>>
>> /v3/s3tokens – No policy until this week due to the OSSA-2025-002 [2]
>> security issue; this endpoint will now be enforced in future releases
>> (including for stable releases!) to require the admin or service role for
>> the rgw_keystone_admin_user in Keystone [3].
>>
>> /v3/users/<user>/OS-EC2/<credential> – Policy enforcement to retrieve
>> _other_ people's EC2 credentials says this must have the admin role (see
>> identity:ec2_get_credential in [4]). I'm working on a proposal in Keystone
>> [5] to make the policy allow both admin and service roles. My proposal in
>> [5] also includes the same change for identity:get_credential due to a
>> pending PR [6] that might change this API request.
>>
>> If my proposal [5] is merged this would allow us to remove the admin role
>> from the configured rgw_keystone_admin_user user and only use the `service`
>> role; the service role also has some elevated permissions and can do some
>> damage, but it's at least not a complete admin on the entire cloud.
>>
>> Hope this helps.
>>
>> /Tobias
>>
>> [1] https://github.com/ceph/ceph/pull/60515
>> [2] https://security.openstack.org/ossa/OSSA-2025-002.html
>> [3] https://review.opendev.org/c/openstack/keystone/+/966069
>> [4] https://docs.openstack.org/keystone/latest/configuration/policy.html
>> [5] https://review.opendev.org/c/openstack/keystone/+/966189
>> [6] https://github.com/ceph/ceph/pull/63283
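(For my own understanding of the above: this is what I think the two Keystone calls look like for one EC2/S3 authentication. The URL, IDs and request bodies below are placeholders and my guesses, not captured traffic.)

# 1) validate the S3 signature via the s3tokens API; RadosGW sends the access key,
#    the string-to-sign and the signature, and Keystone answers with token/project data
curl -s -X POST https://auth.tld/v3/s3tokens \
  -H 'Content-Type: application/json' \
  -d '{"credentials": {"access": "<ec2_access_key>", "token": "<base64 string-to-sign>", "signature": "<request_signature>"}}'

# 2) fetch the EC2 credential secret so it can be cached locally; this is the request
#    Tobias mentions above, the one that needs the admin (or, per his proposal, service) role
curl -s -H 'X-Auth-Token: <token of rgw_keystone_admin_user>' \
  https://auth.tld/v3/users/<user_id>/credentials/OS-EC2/<ec2_access_key>

If step 2 fails with 403, step 1 can still succeed, which would match what Tobias describes: requests work, but nothing gets cached and every request goes back to Keystone.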
>> On 5 Nov 2025, at 17:31, Boris <[email protected]> wrote:
>>
>> Hi Tobias,
>>
>> I just pumped up the rgw_debug to 20 and generated new output:
>> https://pastebin.com/PcSUSWGY
>> I hope that I redacted all the sensitive data. :)
>>
>> 3 requests to list all my buckets in <10 seconds.
>> The 1st request showed me my buckets, the 2nd request resulted in a 500
>> error, and the 3rd showed me my buckets again.
>>
>> To me this currently looks like I get a "429 Too Many Requests" from
>> Keystone on all three requests that I made, but I would have expected to
>> see this error only on the 2nd request.
>> Also weird are lines 104-109. I have no idea how the content of the
>> /etc/hosts file made it into the log.
>>
>> The Keystone user that we have in "rgw_keystone_admin_user" is not a
>> Keystone admin. The people that maintain Keystone just told me: "The user
>> doesn't have admin and we would not grant it."
>> The "rgw_s3_auth_order" is default. We didn't touch it. "sts, external, local"
>>
>>
>> On Wed, 5 Nov 2025 at 16:32, Tobias Urdin - Binero <[email protected]> wrote:
>>
>>> Hello Boris,
>>>
>>> What roles are assigned to the Keystone user configured in
>>> rgw_keystone_admin_user? It needs the admin role in order to be allowed
>>> the /v3/users/<user_id>/credentials/OS-EC2/<access_key> API request.
>>>
>>> openstack role assignment list --names --user <rgw_keystone_admin_user value>
>>>
>>> Apart from that, I don't understand the "2nd request failed" part, as
>>> that seems to be from the LocalEngine and is not related to Keystone. If
>>> you have the default value for rgw_s3_auth_order, the only thing I can
>>> think of is that there is a bug or you're missing some patch like [1]
>>> [2], but that's just a guess.
>>>
>>> /Tobias
>>>
>>> [1] https://github.com/ceph/ceph/pull/53846
>>> [2] https://github.com/ceph/ceph/pull/53680
>>>
>>>
>>> On 4 Nov 2025, at 11:32, Boris <[email protected]> wrote:
>>>
>>> I've created an upstream ticket:
>>> https://tracker.ceph.com/issues/73709
>>>
>>> On Mon, 3 Nov 2025 at 17:13, Boris <[email protected]> wrote:
>>>
>>> yes, via ceph orch.
>>>
>>> ---
>>> service_type: rgw
>>> service_id: eu-central-lz
>>> service_name: rgw.eu-central-lz
>>> placement:
>>>   count_per_host: 1
>>>   label: rgw
>>> spec:
>>>   config:
>>>     debug_rgw: 0
>>>     rgw_dns_name: s3.eu-central-lz.tld
>>>     rgw_dns_s3website_name: s3-website.eu-central-lz.tld
>>>     rgw_keystone_token_cache_size: 100000
>>>     rgw_thread_pool_size: 512
>>>   rgw_frontend_port: 7480
>>>   rgw_frontend_type: beast
>>>   rgw_realm: ovh
>>>   rgw_zone: eu-central-lz
>>>   rgw_zonegroup: eu-central-lz
>>>
>>> On Mon, 3 Nov 2025 at 17:09, Anthony D'Atri <[email protected]> wrote:
>>>
>>> How is your RGW service deployed? ceph orch? Something else?
>>>
>>> On Nov 3, 2025, at 10:56 AM, Boris <[email protected]> wrote:
>>>
>>> Hi Anthony,
>>> here are the config values we've set, or their defaults. There is no
>>> rgw_keystone_token_cache_ttl (neither in the documentation, nor can I set
>>> it via ceph config set client.rgw rgw_keystone_token_cache_ttl 3600):
>>>
>>> ~# ceph config show-with-defaults rgw.rgw1 | grep rgw_keystone | column -t
>>> rgw_keystone_accepted_admin_roles                                  default
>>> rgw_keystone_accepted_roles                   objectstore_operator mon
>>> rgw_keystone_admin_domain                     default              mon
>>> rgw_keystone_admin_password                   yyyyyyyy             mon
>>> rgw_keystone_admin_password_path                                   default
>>> rgw_keystone_admin_project                    services             mon
>>> rgw_keystone_admin_tenant                                          default
>>> rgw_keystone_admin_token                                           default
>>> rgw_keystone_admin_token_path                                      default
>>> rgw_keystone_admin_user                       xxxxxxx              mon
>>> rgw_keystone_api_version                      3                    mon
>>> rgw_keystone_barbican_domain                                       default
>>> rgw_keystone_barbican_password                                     default
>>> rgw_keystone_barbican_project                                      default
>>> rgw_keystone_barbican_tenant                                       default
>>> rgw_keystone_barbican_user                                         default
>>> rgw_keystone_expired_token_cache_expiration   3600                 default
>>> rgw_keystone_implicit_tenants                 false                default
>>> rgw_keystone_service_token_accepted_roles     admin                default
>>> rgw_keystone_service_token_enabled            false                default
>>> rgw_keystone_token_cache_size                 100000               mon      <-- I've set this to test if it solves the problem, but this is the default value
>>> rgw_keystone_url                              https://auth.tld/    mon
>>> rgw_keystone_verify_ssl                       true                 default
>>>
>>> On Mon, 3 Nov 2025 at 16:40, Anthony D'Atri <[email protected]> wrote:
>>>
>>> Check the values of rgw_keystone_token_cache_size and
>>> rgw_keystone_token_cache_ttl and other rgw_keystone options.
>>>
>>> I've seen at least one deployment tool that disabled Keystone caching
>>> for dev purposes but leaked that into the release code, which deployed
>>> RGW with Rook with a configmap override.
>>>
>>> On Nov 3, 2025, at 9:52 AM, Boris <[email protected]> wrote:
>>>
>>> Hi,
>>> I am currently debugging a problem where the radosgw Keystone token cache
>>> seems not to work properly, or at all. I tried to debug it and attached
>>> the rgw debug log set to 10. I've truncated it to only show the part from
>>> "No stored secret string, cache miss" until the request is done.
>>>
>>> The failed request hits a rate limit on Keystone, which currently handles
>>> around 2k answered requests per minute.
>>> Any ideas what I did wrong?
>>>
>>> * All requests were done within 10 seconds and were only an ls to show buckets.
>>> * This particular RGW only took my requests during testing.
>>> * We didn't set any timeouts or special cache configs in ceph
>>> * system time is correct
>>>
>>>
>>> First request worked instantly:
>>>
>>> req 8122732607072897744 0.106001295s s3:list_buckets No stored secret string, cache miss
>>> [4.0K blob data]
>>> req 8122732607072897744 0.315003842s s3:list_buckets s3 keystone: validated token: 8144848695793469:user-9XGYcbFNUVTQ expires: 1762266594
>>> req 8122732607072897744 0.315003842s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee$a13f0472be744104ad1f64bb2855cdee : hit (negative entry)
>>> req 8122732607072897744 0.315003842s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee : hit (requested=0x13, cached=0x13)
>>> req 8122732607072897744 0.315003842s s3:list_buckets normalizing buckets and tenants
>>> req 8122732607072897744 0.315003842s s->object=<NULL> s->bucket=
>>> req 8122732607072897744 0.315003842s s3:list_buckets init permissions
>>> req 8122732607072897744 0.315003842s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee : hit (requested=0x13, cached=0x13)
>>> req 8122732607072897744 0.315003842s s3:list_buckets recalculating target
>>> req 8122732607072897744 0.315003842s s3:list_buckets reading permissions
>>> req 8122732607072897744 0.315003842s s3:list_buckets init op
>>> req 8122732607072897744 0.315003842s s3:list_buckets verifying op mask
>>> req 8122732607072897744 0.315003842s s3:list_buckets verifying op permissions
>>> req 8122732607072897744 0.315003842s s3:list_buckets verifying op params
>>> req 8122732607072897744 0.315003842s s3:list_buckets pre-executing
>>> req 8122732607072897744 0.315003842s s3:list_buckets check rate limiting
>>> req 8122732607072897744 0.315003842s s3:list_buckets executing
>>> req 8122732607072897744 0.315003842s s3:list_buckets completing
>>> req 8122732607072897744 0.315003842s cache get: name=eu-central-lz.rgw.log++script.postrequest. : hit (negative entry)
>>> req 8122732607072897744 0.315003842s s3:list_buckets op status=0
>>> req 8122732607072897744 0.315003842s s3:list_buckets http status=200
>>> ====== req done req=0x74659e51b6f0 op status=0 http_status=200 latency=0.315003842s ======
>>>
>>> 2nd request failed
>>>
>>> req 10422983006485317789 0.061000749s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.keys+05917cf2ee9d4fdea8baf6a3348ca33a : hit (negative entry)
>>> req 10422983006485317789 0.061000749s s3:list_buckets error reading user info, uid=05917cf2ee9d4fdea8baf6a3348ca33a can't authenticate
>>> req 10422983006485317789 0.061000749s s3:list_buckets Failed the auth strategy, reason=-5
>>> failed to authorize request
>>> WARNING: set_req_state_err err_no=5 resorting to 500
>>> req 10422983006485317789 0.061000749s cache get: name=eu-central-lz.rgw.log++script.postrequest. : hit (negative entry)
>>> req 10422983006485317789 0.061000749s s3:list_buckets op status=0
>>> req 10422983006485317789 0.061000749s s3:list_buckets http status=500
>>> ====== req done req=0x74659e51b6f0 op status=0 http_status=500 latency=0.061000749s ======
>>>
>>> 3rd request went through again
>>>
>>> req 13123970335019889535 0.000000000s s3:list_buckets No stored secret string, cache miss
>>> [250B blob data]
>>> req 13123970335019889535 0.204002500s s3:list_buckets s3 keystone: validated token: 8144848695793469:user-9XGYcbFNUVTQ expires: 1762266602
>>> req 13123970335019889535 0.204002500s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee$a13f0472be744104ad1f64bb2855cdee : hit (negative entry)
>>> req 13123970335019889535 0.204002500s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee : hit (requested=0x13, cached=0x13)
>>> req 13123970335019889535 0.204002500s s3:list_buckets normalizing buckets and tenants
>>> req 13123970335019889535 0.204002500s s->object=<NULL> s->bucket=
>>> req 13123970335019889535 0.204002500s s3:list_buckets init permissions
>>> req 13123970335019889535 0.204002500s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee : hit (requested=0x13, cached=0x13)
>>> req 13123970335019889535 0.204002500s s3:list_buckets recalculating target
>>> req 13123970335019889535 0.204002500s s3:list_buckets reading permissions
>>> req 13123970335019889535 0.204002500s s3:list_buckets init op
>>> req 13123970335019889535 0.204002500s s3:list_buckets verifying op mask
>>> req 13123970335019889535 0.204002500s s3:list_buckets verifying op permissions
>>> req 13123970335019889535 0.204002500s s3:list_buckets verifying op params
>>> req 13123970335019889535 0.204002500s s3:list_buckets pre-executing
>>> req 13123970335019889535 0.204002500s s3:list_buckets check rate limiting
>>> req 13123970335019889535 0.204002500s s3:list_buckets executing
>>> req 13123970335019889535 0.204002500s s3:list_buckets completing
>>> req 13123970335019889535 0.204002500s cache get: name=eu-central-lz.rgw.log++script.postrequest. : hit (negative entry)
>>> req 13123970335019889535 0.204002500s s3:list_buckets op status=0
>>> req 13123970335019889535 0.204002500s s3:list_buckets http status=200
>>> ====== req done req=0x74659e51b6f0 op status=0 http_status=200 latency=0.204002500s ======
--
The self-help group "UTF-8 problems" will, as an exception, meet in the large hall this time.
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
