Public bug reported:

Following up on two mailing list reports [1][2] which seem to have the same
root cause.
In an HA setup with 3 control nodes (Victoria, baremetal), Terraform is used
to deploy lots of different k8s clusters (and other things). We noticed
keystone errors when a project is purged cleanly with Terraform and a
redeployment of the same project (with the same name) is started immediately
afterwards. We did some tests to narrow down which keystone cache is
responsible, and it appears to be the role cache (default 600 seconds).
Terraform then fails, reporting that the project was not found and referring
to the previous ID of the project, which has already been deleted from the
database during the project cleanup.
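
To make the sequence concrete, here is a minimal reproduction sketch using
the openstack CLI instead of Terraform (project and user names below are
made up for illustration):

openstack project create demo-k8s
openstack role add --project demo-k8s --user deployer member
# ... deploy the cluster, later tear everything down ...
openstack project purge --project demo-k8s   # deletes the resources and the project itself
openstack project create demo-k8s            # same name, new project ID
openstack role add --project demo-k8s --user deployer member
# when run immediately afterwards, requests that hit another control node can
# still resolve the old, already deleted project ID until the role cache expires
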
The same deployment works in a cloud running the identical version but with
only a single control node; there it just works, although the cache is
enabled as well.
I already tried reducing cache_time to 30 seconds, but that doesn't help
(even though it takes more than 30 seconds until Terraform is ready after
its prechecks). I also disabled the role cache entirely, which allows the
fast redeployment, but the downside is significantly longer response times
when using the dashboard or querying the APIs.
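
For reference, the knobs involved live in keystone.conf; a minimal sketch,
assuming the usual memcached-backed dogpile cache (the backend choice and
the memcache_servers list are assumptions about this deployment, the [role]
option names come from the keystone configuration reference):

[cache]
enabled = true
backend = dogpile.cache.memcached
# assumed memcached endpoints, one per control node
memcache_servers = control01:11211,control02:11211,control03:11211

[role]
# caching defaults to true; cache_time falls back to [cache]/expiration_time (600s)
caching = true
cache_time = 30
# tried as well: caching = false fixes the redeployment but slows down the APIs
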
Is there any way to tune the role cache so that we get both reasonable
performance and the ability to redeploy projects without a "sleep 600"?

Storage back end is Ceph (Pacific), keystone versions are:

control01:~ # rpm -qa | grep keystone
python3-keystonemiddleware-9.1.0-lp152.3.20.noarch
python3-keystone-18.0.1~dev11-lp152.1.21.noarch
python3-keystoneauth1-4.2.1-lp152.3.19.noarch
python3-keystoneclient-4.1.0-lp152.5.2.noarch
openstack-keystone-18.0.1~dev11-lp152.1.21.noarch

[1] 
https://lists.openstack.org/pipermail/openstack-discuss/2022-November/031122.html
[2] 
https://lists.openstack.org/pipermail/openstack-discuss/2023-February/032258.html

** Affects: keystone
     Importance: Undecided
         Status: New

** Project changed: nova => keystone

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2007982

Title:
  keystone role cache misbehaving in HA setup

Status in OpenStack Identity (keystone):
  New

Bug description:
  Following up on two mailing list reports [1][2] which seem to have the
  same root cause.
  In an HA setup with 3 control nodes (Victoria, baremetal), Terraform is
  used to deploy lots of different k8s clusters (and other things). We
  noticed keystone errors when a project is purged cleanly with Terraform
  and a redeployment of the same project (with the same name) is started
  immediately afterwards. We did some tests to narrow down which keystone
  cache is responsible, and it appears to be the role cache (default 600
  seconds). Terraform then fails, reporting that the project was not found
  and referring to the previous ID of the project, which has already been
  deleted from the database during the project cleanup.
  The same deployment works in a cloud running the identical version but
  with only a single control node; there it just works, although the cache
  is enabled as well.
  I already tried reducing cache_time to 30 seconds, but that doesn't help
  (even though it takes more than 30 seconds until Terraform is ready after
  its prechecks). I also disabled the role cache entirely, which allows the
  fast redeployment, but the downside is significantly longer response
  times when using the dashboard or querying the APIs.
  Is there any way to tune the role cache so that we get both reasonable
  performance and the ability to redeploy projects without a "sleep 600"?

  Storage back end is Ceph (Pacific), keystone versions are:

  control01:~ # rpm -qa | grep keystone
  python3-keystonemiddleware-9.1.0-lp152.3.20.noarch
  python3-keystone-18.0.1~dev11-lp152.1.21.noarch
  python3-keystoneauth1-4.2.1-lp152.3.19.noarch
  python3-keystoneclient-4.1.0-lp152.5.2.noarch
  openstack-keystone-18.0.1~dev11-lp152.1.21.noarch

  [1] 
https://lists.openstack.org/pipermail/openstack-discuss/2022-November/031122.html
  [2] 
https://lists.openstack.org/pipermail/openstack-discuss/2023-February/032258.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/2007982/+subscriptions

