Oooooohhhhh.. *click! I wasn't aware that keystone could do the header verification. That was my missing puzzle piece. And now everything makes sense. Thanks a lot for your patience!
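For the archives, this is roughly how I picture the flow now. It is a minimal pseudo-Python sketch of my own, not the actual RadosGW code, and all the names in it are made up:

import hashlib
import hmac

secret_cache = {}  # in-memory map: S3 access key -> EC2 secret key

def compute_signature(secret_key, string_to_sign):
    # Grossly simplified stand-in for the AWS SigV4 signing chain.
    return hmac.new(secret_key.encode(), string_to_sign.encode(),
                    hashlib.sha256).hexdigest()

def keystone_s3tokens_verify(access_key, signature, string_to_sign):
    # Placeholder for POST /v3/s3tokens: Keystone recomputes the
    # signature with the stored secret key and compares.
    raise NotImplementedError

def keystone_fetch_ec2_secret(access_key):
    # Placeholder for GET /v3/users/<user>/credentials/OS-EC2/<access_key>;
    # returns the secret key, or None on 403/429/...
    raise NotImplementedError

def authenticate(access_key, signature, string_to_sign):
    secret = secret_cache.get(access_key)
    if secret is not None:
        # Cache hit: verify locally, no Keystone round trip at all.
        return hmac.compare_digest(
            signature, compute_signature(secret, string_to_sign))

    # Cache miss: let Keystone verify the signature for us.
    if not keystone_s3tokens_verify(access_key, signature, string_to_sign):
        return False

    # Authenticated, but we still don't know the secret key, so try to
    # fetch it and fill the cache. If this fails (403, 429, ...) the
    # request is still served, but every following request goes to
    # Keystone again.
    secret = keystone_fetch_ec2_secret(access_key)
    if secret is not None:
        secret_cache[access_key] = secret
    return True

That also explains why it "worked" even while the OS-EC2 lookup kept failing: authentication still succeeds via /v3/s3tokens, the cache just never gets filled.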
Then I can also close the bug report now, because after the changes made by our keystone team the number of requests to the keystone dropped off a cliff.
Wish you all a happy weekend.

On Fri, 7 Nov 2025 at 11:57, Tobias Urdin - Binero <[email protected]> wrote:

> Hello Boris,
>
> What you're missing in your flow is accounting for the AWS v4 signature
> algorithm used for S3; you can look it up to understand it better.
>
> I will try to walk through the flow but will grossly oversimplify it so
> that it's easier to follow in bullet list form.
>
> 1. The client prepares an HTTP request, computes a hash of multiple things
> contained in the request combined with the secret key (giving us: the
> signature) and adds that to the headers together with the access key.
>
> 2. RadosGW receives the request and sends the signature to Keystone's
> /v3/s3tokens endpoint, which verifies that the access key exists and that
> the signature is the same when it is computed with the secret key stored
> in Keystone.
>
> The request is now authenticated, Keystone says it's OK, but RadosGW has a
> problem: it doesn't know the secret key, so we cannot cache anything and
> must talk to Keystone on each request, so...
>
> 3. RadosGW does a request against Keystone's
> /v3/users/<user>/OS-EC2/<credential> endpoint; it knows the user because
> step #2 returns a Keystone token containing that info, and the credential
> is the access key. If this request is successful RadosGW now has the
> secret key and adds it to the secret cache.
>
> Now on the next request to RadosGW it will look up the secret key in the
> secret cache, keyed by the access key. We now have the secret key and
> RadosGW itself can perform step #2 internally by computing the signature
> and verifying the request.
>
> Now if #3 fails, you end up in the state you were in: it does requests
> against Keystone on each request and it cannot populate the secret cache.
>
> This is a good example of a chicken-and-egg problem with caching. We could
> fail the request because we could not cache it, but it's already
> authenticated; perhaps the Keystone error was temporary and we can try
> again on the next iteration. Or should we drop the request even though we
> could perform it? It's semantics on how we should handle the result of the
> attempt to populate the secret cache, and both scenarios are valid, so
> maybe a config option :)
>
> Personally I would prefer to reject the request, but that is not what's
> done today. When the request in #3 fails it should log an "s3 keystone:
> secret fetching error: %s" error in the RadosGW log if the config option
> debug_rgw >= 2.
>
> Hope that helps.
>
> /Tobias
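(Note for anyone reading this in the archives: the "computes a hash of multiple things" part above is the standard AWS SigV4 signing-key chain. Below is a rough, self-contained illustration in Python; the input values are made up, and the real string-to-sign is built from the canonical request. It is only meant to show why whoever holds the secret key, Keystone or RadosGW, can recompute and compare the signature.)

import hashlib
import hmac

def _hmac(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def sigv4_signature(secret_key, date, region, service, string_to_sign):
    # Standard AWS SigV4 signing-key derivation; the client and the
    # verifier (Keystone, or RadosGW once the secret is cached) both
    # run this and compare the resulting hex digests.
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    k_signing = _hmac(k_service, "aws4_request")
    return hmac.new(k_signing, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# Made-up example values, only to show the call shape.
print(sigv4_signature("EXAMPLE-SECRET-KEY", "20251107", "eu-central-lz",
                      "s3", "AWS4-HMAC-SHA256\n20251107T000000Z\n..."))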
> On 7 Nov 2025, at 10:07, Boris <[email protected]> wrote:
>
> What I still don't understand:
>
> If the request to get the ec2 credentials from keystone ran into a 429,
> why did it work the first time?
> How does the keystone authentication work for radosgw under the hood? Is
> there documentation I can read up on?
>
> I would have thought it was something like:
> 1. Authenticate with keystone with a special set of credentials
> (rgw_keystone_admin_user)
> 2. Fetch the EC2 credentials for the provided access key
> 3. Save those credentials for the duration that keystone gave as the
> token lifetime
> 4. Do the normal s3 authentication with the cached credentials
>
> I would have thought that these tokens live in the memory of the radosgw
> daemon and every rgw daemon keeps track on its own.
> But as I am writing this down I wonder: what happens to the cached
> credentials if the ec2 credentials are invalidated by the end user?
>
> btw: This is what our keystone team told me:
> >Tobias is top notch, he knows both keystone and ceph, he has some
> >comments on the recent keystone cve we patched: launchpad.net/bugs/2119646
> >What he says makes sense. We already have the new policy that comes with
> >the OSSA patch on /v3/s3tokens and you have it.
> >If you want you can also tell him that we fixed the issue not by granting
> >admin but by granting /v3/users/<user>/OS-EC2/<credential> via a custom
> >role policy.
>
> On Thu, 6 Nov 2025 at 15:45, Boris <[email protected]> wrote:
>
>> Hi Tobias,
>>
>> thanks a lot for the in-depth explanation. The keystone team fixed
>> something yesterday regarding the mentioned bug and now the 429s are
>> gone. We still have no clue why rgw worked at all and I am still trying
>> to understand it.
>>
>> Do you attend the Ceph Days in Berlin next week?
>>
>> On Thu, 6 Nov 2025 at 08:26, Tobias Urdin - Binero <[email protected]> wrote:
>>
>>> Hello Boris,
>>>
>>> Then that is probably your issue. Ask the team maintaining OpenStack
>>> Keystone to check the logs for requests to that API endpoint that are
>>> failing with 403 on every request from RadosGW, similar to this:
>>>
>>> "GET /v3/users/<user>/credentials/OS-EC2/<credential> HTTP/1.1" 403
>>> 140 "-" "-"
>>>
>>> What this means is that your authentication will work, but because
>>> RadosGW cannot retrieve the EC2 credential secret it will not populate
>>> the cache and you will do authentication against Keystone on each
>>> request.
>>>
>>> ---
>>>
>>> Let me try to clear things up a bit, hopefully. RadosGW needs to perform
>>> these API requests against Keystone:
>>>
>>> /v3/auth/tokens – No policy enforced on who can talk to this API
>>> (rgw_keystone_admin_user does not need any special role). Patches and
>>> backports [1] have been done to simply drop the admin token usage in
>>> this API request.
>>>
>>> /v3/s3tokens – No policy until this week due to the OSSA-2025-002 [2]
>>> security issue; this endpoint will now be enforced in future releases
>>> (including stable releases!) to require the admin or service role for
>>> the rgw_keystone_admin_user in Keystone [3].
>>>
>>> /v3/users/<user>/OS-EC2/<credential> – Policy enforcement for retrieving
>>> _other_ people's EC2 credentials says this must have the admin role (see
>>> identity:ec2_get_credential in [4]). I'm working on a proposal in
>>> Keystone [5] to make the policy allow both admin and service roles. My
>>> proposal in [5] also includes the same change for identity:get_credential
>>> due to a pending PR [6] that might change this API request.
>>>
>>> If my proposal [5] is merged this would allow us to remove the admin
>>> role from the configured rgw_keystone_admin_user and only use the
>>> `service` role; the service role also has some elevated permissions and
>>> can do some damage, but it's at least not a complete admin on the entire
>>> cloud.
>>>
>>> Hope this helps.
>>>
>>> /Tobias
>>>
>>> [1] https://github.com/ceph/ceph/pull/60515
>>> [2] https://security.openstack.org/ossa/OSSA-2025-002.html
>>> [3] https://review.opendev.org/c/openstack/keystone/+/966069
>>> [4] https://docs.openstack.org/keystone/latest/configuration/policy.html
>>> [5] https://review.opendev.org/c/openstack/keystone/+/966189
>>> [6] https://github.com/ceph/ceph/pull/63283
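(Another note for the archives: the 403 described above can be reproduced outside of RadosGW by authenticating as the rgw_keystone_admin_user and trying the same lookup yourself. Below is a rough sketch with python-requests; the endpoint paths are the ones from this thread, everything else, names and environment handling, is just illustrative.)

import os
import requests

keystone = os.environ.get("OS_AUTH_URL", "https://auth.tld/v3")

# 1. Get a token for the user configured as rgw_keystone_admin_user.
auth = {
    "auth": {
        "identity": {
            "methods": ["password"],
            "password": {
                "user": {
                    "name": os.environ["OS_USERNAME"],
                    "domain": {"name": os.environ.get("OS_USER_DOMAIN_NAME", "default")},
                    "password": os.environ["OS_PASSWORD"],
                }
            },
        },
        "scope": {
            "project": {
                "name": os.environ.get("OS_PROJECT_NAME", "services"),
                "domain": {"name": os.environ.get("OS_PROJECT_DOMAIN_NAME", "default")},
            }
        },
    }
}
resp = requests.post(keystone + "/auth/tokens", json=auth)
resp.raise_for_status()
token = resp.headers["X-Subject-Token"]

# 2. Try the same lookup RadosGW uses to fill its secret cache.
#    Replace with the Keystone user id and S3 access key from your setup.
user_id = "KEYSTONE-USER-ID"
access_key = "S3-ACCESS-KEY"
r = requests.get(
    keystone + "/users/" + user_id + "/credentials/OS-EC2/" + access_key,
    headers={"X-Auth-Token": token},
)
print(r.status_code)

A 403 here means RadosGW can never fill its secret cache and will keep hitting /v3/s3tokens on every request.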
>>> On 5 Nov 2025, at 17:31, Boris <[email protected]> wrote:
>>>
>>> Hi Tobias,
>>>
>>> I just pumped up the rgw_debug to 20 and generated new output:
>>> https://pastebin.com/PcSUSWGY
>>> I hope that I redacted all the sensitive data. :)
>>>
>>> 3 requests to list all my buckets in <10 seconds.
>>> The 1st request showed me my buckets, the 2nd request resulted in a 500
>>> error and the 3rd showed me my buckets again.
>>>
>>> For me this currently looks like I get a "429 Too Many Requests" from
>>> the keystone on all three requests that I made, and I would have
>>> expected to see this error only on the 2nd request.
>>> Also weird are lines 104-109. I have no idea how the content of the
>>> /etc/hosts file made it into the log.
>>>
>>> The keystone user that we have in "rgw_keystone_admin_user" is not a
>>> keystone admin. The people that maintain the keystone just told me "The
>>> user doesn't have admin and we would not grant it."
>>> The "rgw_s3_auth_order" is the default. We didn't touch it. "sts,
>>> external, local"
>>>
>>> On Wed, 5 Nov 2025 at 16:32, Tobias Urdin - Binero <[email protected]> wrote:
>>>
>>>> Hello Boris,
>>>>
>>>> What roles are assigned to the Keystone user configured in
>>>> rgw_keystone_admin_user? It needs the admin role in order to be allowed
>>>> the /v3/users/<user_id>/credentials/OS-EC2/<access_key> API request.
>>>>
>>>> openstack role assignment list --names --user <rgw_keystone_admin_user value>
>>>>
>>>> Apart from that I don't understand the "2nd request failed" part, as
>>>> that seems to be from the LocalEngine and is not related to Keystone.
>>>> If you have the default value for rgw_s3_auth_order the only thing I
>>>> can think of is that there is a bug or you're missing some patch like
>>>> [1] [2], but that's just a guess.
>>>>
>>>> /Tobias
>>>>
>>>> [1] https://github.com/ceph/ceph/pull/53846
>>>> [2] https://github.com/ceph/ceph/pull/53680
>>>>
>>>> On 4 Nov 2025, at 11:32, Boris <[email protected]> wrote:
>>>>
>>>> I've created an upstream ticket https://tracker.ceph.com/issues/73709
>>>>
>>>> On Mon, 3 Nov 2025 at 17:13, Boris <[email protected]> wrote:
>>>>
>>>> yes, via ceph orch.
>>>>
>>>> ---
>>>> service_type: rgw
>>>> service_id: eu-central-lz
>>>> service_name: rgw.eu-central-lz
>>>> placement:
>>>>   count_per_host: 1
>>>>   label: rgw
>>>> spec:
>>>>   config:
>>>>     debug_rgw: 0
>>>>     rgw_dns_name: s3.eu-central-lz.tld
>>>>     rgw_dns_s3website_name: s3-website.eu-central-lz.tld
>>>>     rgw_keystone_token_cache_size: 100000
>>>>     rgw_thread_pool_size: 512
>>>>   rgw_frontend_port: 7480
>>>>   rgw_frontend_type: beast
>>>>   rgw_realm: ovh
>>>>   rgw_zone: eu-central-lz
>>>>   rgw_zonegroup: eu-central-lz
>>>>
>>>> On Mon, 3 Nov 2025 at 17:09, Anthony D'Atri <[email protected]> wrote:
>>>>
>>>> How is your RGW service deployed? ceph orch? Something else?
>>>>
>>>> On Nov 3, 2025, at 10:56 AM, Boris <[email protected]> wrote:
>>>>
>>>> Hi Anthony,
>>>> here are the config values we've set, along with their defaults. There
>>>> is no rgw_keystone_token_cache_ttl (it's neither in the documentation,
>>>> nor can I set it via ceph config set client.rgw
>>>> rgw_keystone_token_cache_ttl 3600):
>>>>
>>>> ~# ceph config show-with-defaults rgw.rgw1 | grep rgw_keystone | column -t
>>>> rgw_keystone_accepted_admin_roles                                   default
>>>> rgw_keystone_accepted_roles                   objectstore_operator  mon
>>>> rgw_keystone_admin_domain                     default               mon
>>>> rgw_keystone_admin_password                   yyyyyyyy              mon
>>>> rgw_keystone_admin_password_path                                    default
>>>> rgw_keystone_admin_project                    services              mon
>>>> rgw_keystone_admin_tenant                                           default
>>>> rgw_keystone_admin_token                                            default
>>>> rgw_keystone_admin_token_path                                       default
>>>> rgw_keystone_admin_user                       xxxxxxx               mon
>>>> rgw_keystone_api_version                      3                     mon
>>>> rgw_keystone_barbican_domain                                        default
>>>> rgw_keystone_barbican_password                                      default
>>>> rgw_keystone_barbican_project                                       default
>>>> rgw_keystone_barbican_tenant                                        default
>>>> rgw_keystone_barbican_user                                          default
>>>> rgw_keystone_expired_token_cache_expiration   3600                  default
>>>> rgw_keystone_implicit_tenants                 false                 default
>>>> rgw_keystone_service_token_accepted_roles     admin                 default
>>>> rgw_keystone_service_token_enabled            false                 default
>>>> rgw_keystone_token_cache_size                 100000                mon   <-- I've set this to test if it solves the problem, but this is the default value
>>>> rgw_keystone_url                              https://auth.tld/     mon
>>>> rgw_keystone_verify_ssl                       true                  default
>>>>
>>>> On Mon, 3 Nov 2025 at 16:40, Anthony D'Atri <[email protected]> wrote:
>>>>
>>>> Check the values of rgw_keystone_token_cache_size and
>>>> rgw_keystone_token_cache_ttl and other rgw_keystone options.
>>>>
>>>> I've seen at least one deployment tool that disabled Keystone caching
>>>> for dev purposes but leaked that into the release code, which deployed
>>>> RGW with Rook with a configmap override.
>>>>
>>>> On Nov 3, 2025, at 9:52 AM, Boris <[email protected]> wrote:
>>>>
>>>> Hi,
>>>> I am currently debugging a problem that the radosgw keystone token
>>>> cache seems not to work properly. Or at all. I tried to debug it and
>>>> attached the rgw_debug log set to 10.
>>>> I've truncated it to only show the part from "No stored secret string,
>>>> cache miss" until the request is done.
>>>>
>>>> The failed request hits a rate limit on the keystone, which currently
>>>> handles around 2k answered requests per minute.
>>>> Any ideas what I did wrong?
>>>>
>>>> * All requests were done within 10 seconds and were only an ls to show
>>>>   buckets.
>>>> * This particular RGW only took my requests during testing.
>>>> * We didn't set any timeouts or special cache configs in ceph.
>>>> * System time is correct.
>>>>
>>>> First request worked instantly:
>>>>
>>>> req 8122732607072897744 0.106001295s s3:list_buckets No stored secret string, cache miss
>>>> [4.0K blob data]
>>>> req 8122732607072897744 0.315003842s s3:list_buckets s3 keystone: validated token: 8144848695793469:user-9XGYcbFNUVTQ expires: 1762266594
>>>> req 8122732607072897744 0.315003842s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee$a13f0472be744104ad1f64bb2855cdee : hit (negative entry)
>>>> req 8122732607072897744 0.315003842s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee : hit (requested=0x13, cached=0x13)
>>>> req 8122732607072897744 0.315003842s s3:list_buckets normalizing buckets and tenants
>>>> req 8122732607072897744 0.315003842s s->object=<NULL> s->bucket=
>>>> req 8122732607072897744 0.315003842s s3:list_buckets init permissions
>>>> req 8122732607072897744 0.315003842s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee : hit (requested=0x13, cached=0x13)
>>>> req 8122732607072897744 0.315003842s s3:list_buckets recalculating target
>>>> req 8122732607072897744 0.315003842s s3:list_buckets reading permissions
>>>> req 8122732607072897744 0.315003842s s3:list_buckets init op
>>>> req 8122732607072897744 0.315003842s s3:list_buckets verifying op mask
>>>> req 8122732607072897744 0.315003842s s3:list_buckets verifying op permissions
>>>> req 8122732607072897744 0.315003842s s3:list_buckets verifying op params
>>>> req 8122732607072897744 0.315003842s s3:list_buckets pre-executing
>>>> req 8122732607072897744 0.315003842s s3:list_buckets check rate limiting
>>>> req 8122732607072897744 0.315003842s s3:list_buckets executing
>>>> req 8122732607072897744 0.315003842s s3:list_buckets completing
>>>> req 8122732607072897744 0.315003842s cache get: name=eu-central-lz.rgw.log++script.postrequest. : hit (negative entry)
>>>> req 8122732607072897744 0.315003842s s3:list_buckets op status=0
>>>> req 8122732607072897744 0.315003842s s3:list_buckets http status=200
>>>> ====== req done req=0x74659e51b6f0 op status=0 http_status=200 latency=0.315003842s ======
>>>>
>>>> 2nd request failed:
>>>>
>>>> req 10422983006485317789 0.061000749s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.keys+05917cf2ee9d4fdea8baf6a3348ca33a : hit (negative entry)
>>>> req 10422983006485317789 0.061000749s s3:list_buckets error reading user info, uid=05917cf2ee9d4fdea8baf6a3348ca33a can't authenticate
>>>> req 10422983006485317789 0.061000749s s3:list_buckets Failed the auth strategy, reason=-5
>>>> failed to authorize request
>>>> WARNING: set_req_state_err err_no=5 resorting to 500
>>>> req 10422983006485317789 0.061000749s cache get: name=eu-central-lz.rgw.log++script.postrequest. : hit (negative entry)
>>>> req 10422983006485317789 0.061000749s s3:list_buckets op status=0
>>>> req 10422983006485317789 0.061000749s s3:list_buckets http status=500
>>>> ====== req done req=0x74659e51b6f0 op status=0 http_status=500 latency=0.061000749s ======
>>>>
>>>> 3rd request went through again:
>>>>
>>>> req 13123970335019889535 0.000000000s s3:list_buckets No stored secret string, cache miss
>>>> [250B blob data]
>>>> req 13123970335019889535 0.204002500s s3:list_buckets s3 keystone: validated token: 8144848695793469:user-9XGYcbFNUVTQ expires: 1762266602
>>>> req 13123970335019889535 0.204002500s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee$a13f0472be744104ad1f64bb2855cdee : hit (negative entry)
>>>> req 13123970335019889535 0.204002500s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee : hit (requested=0x13, cached=0x13)
>>>> req 13123970335019889535 0.204002500s s3:list_buckets normalizing buckets and tenants
>>>> req 13123970335019889535 0.204002500s s->object=<NULL> s->bucket=
>>>> req 13123970335019889535 0.204002500s s3:list_buckets init permissions
>>>> req 13123970335019889535 0.204002500s s3:list_buckets cache get: name=eu-central-lz.rgw.meta+users.uid+a13f0472be744104ad1f64bb2855cdee : hit (requested=0x13, cached=0x13)
>>>> req 13123970335019889535 0.204002500s s3:list_buckets recalculating target
>>>> req 13123970335019889535 0.204002500s s3:list_buckets reading permissions
>>>> req 13123970335019889535 0.204002500s s3:list_buckets init op
>>>> req 13123970335019889535 0.204002500s s3:list_buckets verifying op mask
>>>> req 13123970335019889535 0.204002500s s3:list_buckets verifying op permissions
>>>> req 13123970335019889535 0.204002500s s3:list_buckets verifying op params
>>>> req 13123970335019889535 0.204002500s s3:list_buckets pre-executing
>>>> req 13123970335019889535 0.204002500s s3:list_buckets check rate limiting
>>>> req 13123970335019889535 0.204002500s s3:list_buckets executing
>>>> req 13123970335019889535 0.204002500s s3:list_buckets completing
>>>> req 13123970335019889535 0.204002500s cache get: name=eu-central-lz.rgw.log++script.postrequest. : hit (negative entry)
>>>> req 13123970335019889535 0.204002500s s3:list_buckets op status=0
>>>> req 13123970335019889535 0.204002500s s3:list_buckets http status=200
>>>> ====== req done req=0x74659e51b6f0 op status=0 http_status=200 latency=0.204002500s ======

--
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal.
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
