Hello Boris,

What you're missing in your flow is the AWS Signature Version 4 (SigV4) algorithm used for S3; it's worth looking it up to understand it better. A minimal sketch of the signing step follows.
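[Editor's note: for reference, this is roughly what the client-side signing looks like, following the published AWS SigV4 algorithm. The construction of the "string to sign" (a digest of the canonical request) is elided, and the date/region values are only illustrative.]

    import hashlib
    import hmac

    def hmac_sha256(key: bytes, msg: str) -> bytes:
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    def sigv4_signature(secret_key: str, date: str, region: str,
                        string_to_sign: str) -> str:
        # SigV4 derives a signing key by chaining HMAC-SHA256 over the
        # date, region, service and a fixed terminator, then signs the
        # "string to sign" (a digest of the canonical request) with it.
        k_date = hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)  # e.g. "20251107"
        k_region = hmac_sha256(k_date, region)
        k_service = hmac_sha256(k_region, "s3")
        k_signing = hmac_sha256(k_service, "aws4_request")
        return hmac.new(k_signing, string_to_sign.encode("utf-8"),
                        hashlib.sha256).hexdigest()

The resulting signature travels in the Authorization header together with the access key; only a party that knows the secret key can reproduce it.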
I will try to walk through the flow, but will grossly oversimplify it so that it's easier to follow in bullet-list form.

1. The client prepares an HTTP request, computes a hash of multiple things contained in the request combined with the secret key (giving us: the signature) and adds that to the headers with the access key.

2. RadosGW receives the request and sends the signature to Keystone's /v3/s3tokens endpoint, which verifies that the access key exists and that the signature is the same when computed with the secret key stored in Keystone. The request is now authenticated, Keystone says it's OK, but RadosGW has a problem: it doesn't know the secret key, so it cannot cache anything and must talk to Keystone on each request, so...

3. RadosGW does a request against Keystone's /v3/users/<user>/OS-EC2/<credential> endpoint. It knows the user because step #2 returns a Keystone token containing that info, and the credential is the access key. If this request is successful, RadosGW now has the secret key and adds it to the secret cache.

On the next request, RadosGW looks up the secret key in the secret cache, keyed by the access key. It now has the secret key and can perform step #2 internally by computing the signature itself and verifying the request (a rough sketch of this flow follows after this message).

Now if #3 fails, you end up in the state you were in: RadosGW does a request against Keystone for every S3 request and cannot populate the secret cache. This is a good example of a chicken-and-egg problem with caching. We could fail the request because we could not cache the secret, but the request is already authenticated, and perhaps the Keystone error was temporary and we can try again on the next iteration; or should we drop the request even though we could perform it? It's a matter of semantics how we handle the result of the attempt to populate the secret cache. Both scenarios are valid, so maybe a config option :) Personally I would prefer to reject the request, but that is not what's done today.

When the request in #3 fails, RadosGW should log an "s3 keystone: secret fetching error: %s" error in its log if the config option debug_rgw >= 2.

Hope that helps.

/Tobias

On 7 Nov 2025, at 10:07, Boris <[email protected]> wrote:

What I still don't understand: if the request to get the EC2 credentials from Keystone ran into a 429, why did it work the first time? How does the Keystone authentication for RadosGW work under the hood? Is there documentation I can read up on?

I would have thought it was something like:
1. Authenticate with Keystone using a special set of credentials (rgw_keystone_admin_user)
2. Fetch the EC2 credentials for the provided access key
3. Save those credentials for the duration Keystone gave as the token lifetime
4. Do the normal S3 authentication with the cached credentials

I would have thought that these tokens live in the memory of the RadosGW daemon and that every RGW daemon keeps track on its own. But as I write this down I wonder: what happens to the cached credentials if the EC2 credentials are invalidated by the end user?

By the way, this is what our Keystone team told me:

> Tobias is top notch, he knows both Keystone and Ceph, he has some comments on
> the recent Keystone CVE we patched: https://launchpad.net/bugs/2119646
> What he says makes sense. We already have the new policy that comes with the
> OSSA patch on /v3/s3tokens and you have it.
> If you want you can also tell him that we fixed the issue not by granting
> admin but by granting /v3/users/<user>/OS-EC2/<credential> via a custom role
> policy.
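[Editor's note: to make the walkthrough above concrete, here is a pseudo-Python sketch of the RadosGW-side flow. All names here (authenticate, secret_cache, keystone_s3tokens, keystone_get_ec2_secret) are invented for illustration and do not correspond to the actual C++ implementation.]

    import hashlib
    import hmac

    class KeystoneError(Exception):
        """Any failure talking to Keystone (e.g. a 429 or 403)."""

    def keystone_s3tokens(access_key, signature, string_to_sign):
        """Placeholder for POST /v3/s3tokens; returns the user id on success."""
        raise NotImplementedError

    def keystone_get_ec2_secret(user_id, access_key):
        """Placeholder for GET /v3/users/<user>/OS-EC2/<credential>."""
        raise NotImplementedError

    def compute_signature(secret: str, string_to_sign: str) -> str:
        # Stand-in for the full SigV4 computation sketched earlier.
        return hmac.new(secret.encode(), string_to_sign.encode(),
                        hashlib.sha256).hexdigest()

    secret_cache: dict[str, str] = {}  # access key -> EC2 secret key

    def authenticate(access_key, signature, string_to_sign) -> bool:
        secret = secret_cache.get(access_key)
        if secret is not None:
            # Cache hit: step #2 is performed internally, no Keystone call.
            return hmac.compare_digest(
                compute_signature(secret, string_to_sign), signature)

        # Cache miss, step #2: Keystone checks that the access key exists
        # and recomputes the signature with its stored secret key.
        try:
            user_id = keystone_s3tokens(access_key, signature, string_to_sign)
        except KeystoneError:
            return False  # could not authenticate at all

        # Step #3: fetch the EC2 secret so the next request hits the cache.
        try:
            secret_cache[access_key] = keystone_get_ec2_secret(user_id, access_key)
        except KeystoneError:
            # The chicken-and-egg case: the request is already authenticated,
            # but the cache stays empty, so every request keeps going to
            # Keystone. Today the request is let through; rejecting here is
            # the stricter alternative discussed above.
            pass
        return True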
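[Editor's note: the "custom role policy" fix the Keystone team describes can be pictured as a policy override on the rule guarding retrieval of other users' EC2 credentials, identity:ec2_get_credential (see the Keystone policy reference cited later in the thread). A minimal sketch only, with rgw_service as an assumed role name; this is not their exact policy:]

    # /etc/keystone/policy.yaml (illustrative override only)
    "identity:ec2_get_credential": "rule:admin_required or role:rgw_service"

The custom role would then be created and granted to the rgw_keystone_admin_user, along these lines (the "services" project matches the rgw_keystone_admin_project shown later in the thread):

    openstack role create rgw_service
    openstack role add --user <rgw_keystone_admin_user> --project services rgw_service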
On Thu, 6 Nov 2025 at 15:45, Boris <[email protected]> wrote:

Hi Tobias,

thanks a lot for the in-depth explanation. The Keystone team fixed something yesterday regarding the mentioned bug and now the 429s are gone. We still have no clue why RGW worked at all, and I'm still trying to understand it.

Do you attend the Ceph Days in Berlin next week?

On Thu, 6 Nov 2025 at 08:26, Tobias Urdin - Binero <[email protected]> wrote:

Hello Boris,

Then that is probably your issue. Ask the team maintaining OpenStack Keystone to check the logs for requests to that API endpoint failing with 403 on every request from RadosGW, similar to this:

"GET /v3/users/<user>/credentials/OS-EC2/<credential> HTTP/1.1" 403 140 "-" "-"

What this means is that your authentication will work, but because RadosGW cannot retrieve the EC2 credential secret it will not populate the cache, and you will do authentication against Keystone on each request.

---

Let me try to clear things up a bit, hopefully. RadosGW needs to perform these API requests against Keystone:

/v3/auth/tokens – No policy enforced on who can talk to this API (rgw_keystone_admin_user does not need any special role). Patches and backports [1] have been done to simply drop the admin token usage in this API request.

/v3/s3tokens – No policy until this week; due to the OSSA-2025-002 [2] security issue this endpoint will now be enforced in future releases (including stable releases!) to require the admin or service role for the rgw_keystone_admin_user in Keystone [3].

/v3/users/<user>/OS-EC2/<credential> – Policy enforcement for retrieving _other_ people's EC2 credentials says this must have the admin role (see identity:ec2_get_credential in [4]).

I'm working on a proposal in Keystone [5] to make the policy allow both the admin and service roles. My proposal in [5] also includes the same change for identity:get_credential due to a pending PR [6] that might change this API request.

If my proposal [5] is merged, this would allow us to remove the admin role from the configured rgw_keystone_admin_user user and only use the `service` role. The service role also has some elevated permissions and can do some damage, but it's at least not a complete admin on the entire cloud.

Hope this helps.

/Tobias

[1] https://github.com/ceph/ceph/pull/60515
[2] https://security.openstack.org/ossa/OSSA-2025-002.html
[3] https://review.opendev.org/c/openstack/keystone/+/966069
[4] https://docs.openstack.org/keystone/latest/configuration/policy.html
[5] https://review.opendev.org/c/openstack/keystone/+/966189
[6] https://github.com/ceph/ceph/pull/63283

On 5 Nov 2025, at 17:31, Boris <[email protected]> wrote:

Hi Tobias,

I just pumped up the RGW debug level to 20 and generated new output: https://pastebin.com/PcSUSWGY
I hope that I redacted all the sensitive data. :)

Three requests to list all my buckets in under 10 seconds. The 1st request showed me my buckets, the 2nd request resulted in a 500 error, and the 3rd showed me my buckets again. To me this currently looks like I get a "429 Too Many Requests" from Keystone on all three requests that I made, and I would have expected to see this error only on the 2nd request.

Weird are also lines 104-109: I have no idea how the content of the /etc/hosts file made it into the log.

The Keystone user that we have in "rgw_keystone_admin_user" is not a Keystone admin. The people that maintain Keystone just told me: "The user doesn't have admin and we would not grant it."
The "rgw_s3_auth_order" is default. We didn't touch it. "sts, external, local" Am Mi., 5. Nov. 2025 um 16:32 Uhr schrieb Tobias Urdin - Binero <[email protected]<mailto:[email protected]>>: Hello Boris, What roles is assigned to the Keystone user configured in rgw_keystone_admin_user? It needs the admin role in order to be allowed the /v3/users/<user_id>/credentials/OS-EC2/<access_key> API request. openstack role assignment list —names —user <rgw_keystone_admin_user value> A part from that I don’t understand the “2nd request failed” part as that seems to be from the LocalEngine and is not related to Keystone, if you have the default value for rgw_s3_auth_order the only thing I can think off is that there is a bug or you’re missing some patch like [1] [2] but that’s just a guess. /Tobias [1] https://github.com/ceph/ceph/pull/53846 [2] https://github.com/ceph/ceph/pull/53680 On 4 Nov 2025, at 11:32, Boris <[email protected]<mailto:[email protected]>> wrote: I've created an upstream ticket https://eur05.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftracker.ceph.com%2Fissues%2F73709&data=05%7C02%7Ctobias.urdin%40binero.com%7C17fa249f3ee94151d37a08de1b8d8e9f%7C89d97f28180f459da0e585855aa63f6c%7C0%7C0%7C638978491958438817%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=MVedgbK0xyCFJY%2FuA%2FskKoY1VwBv6ikMrfVCjT9f%2Bro%3D&reserved=0<https://tracker.ceph.com/issues/73709> Am Mo., 3. Nov. 2025 um 17:13 Uhr schrieb Boris <[email protected]<mailto:[email protected]>>: yes, via ceph orch. --- service_type: rgw service_id: eu-central-lz service_name: rgw.eu-central-lz placement: count_per_host: 1 label: rgw spec: config: debug_rgw: 0 rgw_dns_name: s3.eu-central-lz.tld rgw_dns_s3website_name: s3-website.eu-central-lz.tld rgw_keystone_token_cache_size: 100000 rgw_thread_pool_size: 512 rgw_frontend_port: 7480 rgw_frontend_type: beast rgw_realm: ovh rgw_zone: eu-central-lz rgw_zonegroup: eu-central-lz Am Mo., 3. Nov. 2025 um 17:09 Uhr schrieb Anthony D'Atri < [email protected]<mailto:[email protected]>>: How is your RGW service deployed? ceph orch? Something else? On Nov 3, 2025, at 10:56 AM, Boris <[email protected]<mailto:[email protected]>> wrote: Hi Anthony, here are the config values we've set or with their defaults. 
There is no rgw_keystone_token_cache_ttl (neither in the documentation, nor can I set it via ceph config set client.rgw rgw_keystone_token_cache_ttl 3600):

~# ceph config show-with-defaults rgw.rgw1 | grep rgw_keystone | column -t
rgw_keystone_accepted_admin_roles                                        default
rgw_keystone_accepted_roles                  objectstore_operator        mon
rgw_keystone_admin_domain                    default                     mon
rgw_keystone_admin_password                  yyyyyyyy                    mon
rgw_keystone_admin_password_path                                         default
rgw_keystone_admin_project                   services                    mon
rgw_keystone_admin_tenant                                                default
rgw_keystone_admin_token                                                 default
rgw_keystone_admin_token_path                                            default
rgw_keystone_admin_user                      xxxxxxx                     mon
rgw_keystone_api_version                     3                           mon
rgw_keystone_barbican_domain                                             default
rgw_keystone_barbican_password                                           default
rgw_keystone_barbican_project                                            default
rgw_keystone_barbican_tenant                                             default
rgw_keystone_barbican_user                                               default
rgw_keystone_expired_token_cache_expiration  3600                        default
rgw_keystone_implicit_tenants                false                       default
rgw_keystone_service_token_accepted_roles    admin                       default
rgw_keystone_service_token_enabled           false                       default
rgw_keystone_token_cache_size                100000                      mon  <-- I've set this to test if it solves the problem, but this is the default value
rgw_keystone_url                             https://auth.tld/           mon
rgw_keystone_verify_ssl                      true                        default

On Mon, 3 Nov 2025 at 16:40, Anthony D'Atri <[email protected]> wrote:

Check the values of rgw_keystone_token_cache_size and rgw_keystone_token_cache_ttl and the other rgw_keystone options. I've seen at least one deployment tool that disabled Keystone caching for dev purposes but leaked that into the release code, which deployed RGW with Rook with a configmap override.

On Nov 3, 2025, at 9:52 AM, Boris <[email protected]> wrote:

Hi,

I am currently debugging a problem where the RadosGW Keystone token cache seems not to work properly, or at all. I tried to debug it and attached the log with debug_rgw set to 10, truncated to only show the part from "No stored secret string, cache miss" until the request is done. The failed request hits a rate limit on Keystone, which currently handles around 2k requests per minute.

Any ideas what I did wrong?

* All requests were done within 10 seconds and were only an ls to show buckets.
* This particular RGW only took my requests during testing.
* We didn't set any timeouts or special cache configs in Ceph.
* System time is correct.

First request worked instantly:

req 8122732607072897744 0.106001295s s3:list_buckets No stored secret string, cache miss
[4.0K blob data]
req 8122732607072897744 0.315003842s s3:list_buckets s3 keystone: validated token: 8144848695793469:user-9XGYcbFNUVTQ expires: 1762266594
req 8122732607072897744 0.315003842s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee$a13f0472be744104ad1f64bb2855cdee : hit (negative entry)
req 8122732607072897744 0.315003842s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee : hit (requested=0x13, cached=0x13)
req 8122732607072897744 0.315003842s s3:list_buckets normalizing buckets and tenants
req 8122732607072897744 0.315003842s s->object=<NULL> s->bucket=
req 8122732607072897744 0.315003842s s3:list_buckets init permissions
req 8122732607072897744 0.315003842s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee : hit (requested=0x13, cached=0x13)
req 8122732607072897744 0.315003842s s3:list_buckets recalculating target
req 8122732607072897744 0.315003842s s3:list_buckets reading permissions
req 8122732607072897744 0.315003842s s3:list_buckets init op
req 8122732607072897744 0.315003842s s3:list_buckets verifying op mask
req 8122732607072897744 0.315003842s s3:list_buckets verifying op permissions
req 8122732607072897744 0.315003842s s3:list_buckets verifying op params
req 8122732607072897744 0.315003842s s3:list_buckets pre-executing
req 8122732607072897744 0.315003842s s3:list_buckets check rate limiting
req 8122732607072897744 0.315003842s s3:list_buckets executing
req 8122732607072897744 0.315003842s s3:list_buckets completing
req 8122732607072897744 0.315003842s cache get: name=eu-central-lz.rgw.log++script.postrequest. : hit (negative entry)
req 8122732607072897744 0.315003842s s3:list_buckets op status=0
req 8122732607072897744 0.315003842s s3:list_buckets http status=200
====== req done req=0x74659e51b6f0 op status=0 http_status=200 latency=0.315003842s ======

2nd request failed:

req 10422983006485317789 0.061000749s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.keys+05917cf2ee9d4fdea8baf6a3348ca33a : hit (negative entry)
req 10422983006485317789 0.061000749s s3:list_buckets error reading user info, uid=05917cf2ee9d4fdea8baf6a3348ca33a can't authenticate
req 10422983006485317789 0.061000749s s3:list_buckets Failed the auth strategy, reason=-5
failed to authorize request
WARNING: set_req_state_err err_no=5 resorting to 500
req 10422983006485317789 0.061000749s cache get: name=eu-central-lz.rgw.log++script.postrequest. : hit (negative entry)
req 10422983006485317789 0.061000749s s3:list_buckets op status=0
req 10422983006485317789 0.061000749s s3:list_buckets http status=500
====== req done req=0x74659e51b6f0 op status=0 http_status=500 latency=0.061000749s ======

3rd request went through again:

req 13123970335019889535 0.000000000s s3:list_buckets No stored secret string, cache miss
[250B blob data]
req 13123970335019889535 0.204002500s s3:list_buckets s3 keystone: validated token: 8144848695793469:user-9XGYcbFNUVTQ expires: 1762266602
req 13123970335019889535 0.204002500s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee$a13f0472be744104ad1f64bb2855cdee : hit (negative entry)
req 13123970335019889535 0.204002500s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee : hit (requested=0x13, cached=0x13)
req 13123970335019889535 0.204002500s s3:list_buckets normalizing buckets and tenants
req 13123970335019889535 0.204002500s s->object=<NULL> s->bucket=
req 13123970335019889535 0.204002500s s3:list_buckets init permissions
req 13123970335019889535 0.204002500s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee : hit (requested=0x13, cached=0x13)
req 13123970335019889535 0.204002500s s3:list_buckets recalculating target
req 13123970335019889535 0.204002500s s3:list_buckets reading permissions
req 13123970335019889535 0.204002500s s3:list_buckets init op
req 13123970335019889535 0.204002500s s3:list_buckets verifying op mask
req 13123970335019889535 0.204002500s s3:list_buckets verifying op permissions
req 13123970335019889535 0.204002500s s3:list_buckets verifying op params
req 13123970335019889535 0.204002500s s3:list_buckets pre-executing
req 13123970335019889535 0.204002500s s3:list_buckets check rate limiting
req 13123970335019889535 0.204002500s s3:list_buckets executing
req 13123970335019889535 0.204002500s s3:list_buckets completing
req 13123970335019889535 0.204002500s cache get: name=eu-central-lz.rgw.log++script.postrequest. : hit (negative entry)
req 13123970335019889535 0.204002500s s3:list_buckets op status=0
req 13123970335019889535 0.204002500s s3:list_buckets http status=200
====== req done req=0x74659e51b6f0 op status=0 http_status=200 latency=0.204002500s ======

--
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im großen Saal.

_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
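[Editor's note on the failed trace above: given the default rgw_s3_auth_order of "sts, external, local", one plausible reading is that when the external (Keystone) engine errors out, for example on a 429, RadosGW falls through to the local engine, which knows no such user and fails with "error reading user info", surfacing as the 500. A minimal sketch of that kind of ordered fallback, with invented names, not the actual RadosGW implementation:]

    from typing import Callable

    class AuthDenied(Exception):
        """An engine could not authenticate the request."""

    def run_auth_strategy(engines: list[Callable[[], str]]) -> str:
        # Try each engine in configured order (e.g. sts, external, local);
        # the first one that succeeds wins, a failure falls through.
        last: Exception = AuthDenied("no auth engine configured")
        for engine in engines:
            try:
                return engine()
            except AuthDenied as exc:
                last = exc  # remember why, then try the next engine
        # Every engine failed; the caller only sees the last engine's
        # error, which is how a Keystone 429 could end up reported as
        # the local engine's "error reading user info".
        raise last

Whether that is what actually happened here, or a bug such as the missing patches Tobias references, is exactly the open question in the thread above.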
