** Also affects: ceph
   Importance: Undecided
       Status: New

** Also affects: openstack-ansible
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2051244

Title:
  Documentation of Ceph auth caps for RBD clients used by Cinder /
  Glance / Nova is missing or inconsistent

Status in Ceph:
  New
Status in Cinder:
  New
Status in Glance:
  New
Status in glance_store:
  New
Status in OpenStack Compute (nova):
  New
Status in openstack-ansible:
  New

Bug description:
  This bug originates from my post to the openstack-discuss ML
  (https://lists.openstack.org/archives/list/openstack-disc...@lists.openstack.org/thread/E3VYY24HUGBNH7626ALOGZMJRVX5VOSZ/),
  which was discussed at a Cinder weekly meeting
  (https://meetings.opendev.org/meetings/cinder/2024/cinder.2024-01-24-14.01.log.html#l-43).

  In short: there seem to be inconsistencies regarding the correct and
  required Ceph (cephx) auth caps for the RBD clients in Cinder, Glance and
  also Nova.
  While it's nice to have deployment tools like openstack-ansible ([4]) or the
  charms ([5]) do it somewhat "properly", first and foremost this needs to be
  properly documented in the source documentation of Glance, and of Cinder
  and Nova for that matter.

  And achieving this is what this bug report is intended to do.
  The proposed steps are ...

   * determine and discuss the correct caps (least privilege, caps via
     profiles where possible, ...)
   * update the documentation / install guides and the devstack code. Those
     should all serve as references for the correct way of doing things.
   * add an upgrade note to the release notes for Caracal, so that operators
     check and align their caps
   * spread the word / open bugs against the deployment tools so that they
     update their config / code accordingly
   * send a PR to have Ceph update their docs


  The long story about the various Ceph (RBD) clients and uses within
  Glance, Cinder and Nova:

  
  1) Glance

  First there was a simple issue reported for Glance [3].

  When Glance is asked to delete an image, it checks whether this image has
  dependent children, see
  https://opendev.org/openstack/glance_store/src/commit/6f5011d1f05c99894fb8b909d33ad23a20bf83a9/glance_store/_drivers/rbd.py#L459.
  The children of Glance images are usually (Cinder) volumes, which therefore
  live in a different RBD pool, "volumes". If such children exist, the Glance
  API returns a 500 error.

  Manually using the RBD client shows the same error:

  > # rbd -n client.glance -k /etc/ceph/ceph.client.glance.keyring -p images children $IMAGE_ID
  >
  > 2023-12-13T16:51:48.131+0000 7f198cf4e640 -1 librbd::image::OpenRequest: failed to retrieve name: (1) Operation not permitted
  > 2023-12-13T16:51:48.131+0000 7f198d74f640 -1 librbd::ImageState: 0x5639fdd5af60 failed to open image: (1) Operation not permitted
  > rbd: listing children failed: (1) Operation not permitted
  > 2023-12-13T16:51:48.131+0000 7f1990c474c0 -1 librbd::api::Image: list_descendants: failed to open descendant b7078ed7ace50d from pool instances:(1) Operation not permitted

  So it's a permission error. Neither the Glance documentation [1] nor the
  Ceph documentation [2] on configuring the Ceph auth caps mentions granting
  Glance anything on the volumes pool.
  So this is what I currently have configured:

  > client.cinder
  >         key: REDACTED
  >         caps: [mgr] profile rbd pool=volumes, profile rbd-read-only pool=images
  >         caps: [mon] profile rbd
  >         caps: [osd] profile rbd pool=volumes, profile rbd-read-only pool=images
  >
  > client.glance
  >         key: REDACTED
  >         caps: [mgr] profile rbd pool=images
  >         caps: [mon] profile rbd
  >         caps: [osd] profile rbd pool=images
  >
  > client.nova
  >         key: REDACTED
  >         caps: [mgr] profile rbd pool=instances, profile rbd pool=images
  >         caps: [mon] profile rbd
  >         caps: [osd] profile rbd pool=instances, profile rbd pool=images
  >
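
  To compare such a listing against whatever caps end up being documented,
  it can help to turn it into a data structure first. The following is only a
  minimal sketch, assuming the listing has the shape quoted above (an entity
  line starting with "client.", followed by "caps: [daemon] ..." lines). The
  function name parse_auth_listing and the sample text are made up for
  illustration and are not part of any Ceph or OpenStack tooling.

```python
import re
from collections import defaultdict

# Matches lines such as "caps: [osd] profile rbd pool=images".
CAPS_LINE = re.compile(r"^\s*caps:\s*\[(?P<daemon>\w+)\]\s*(?P<grants>.+)$")


def parse_auth_listing(text):
    """Parse a caps listing into {entity: {daemon: [grant, ...]}}.

    Hypothetical helper: it assumes the layout shown in this report and
    ignores every line that is neither an entity nor a caps line.
    """
    caps = defaultdict(dict)
    entity = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("client."):
            entity = stripped
            continue
        match = CAPS_LINE.match(line)
        if match and entity is not None:
            # Individual grants within one daemon's caps are comma-separated.
            grants = [g.strip() for g in match.group("grants").split(",")]
            caps[entity][match.group("daemon")] = grants
    return dict(caps)


# Sample input copied from the configuration quoted in this report.
LISTING = """\
client.cinder
        key: REDACTED
        caps: [mgr] profile rbd pool=volumes, profile rbd-read-only pool=images
        caps: [mon] profile rbd
        caps: [osd] profile rbd pool=volumes, profile rbd-read-only pool=images
client.glance
        key: REDACTED
        caps: [mgr] profile rbd pool=images
        caps: [mon] profile rbd
        caps: [osd] profile rbd pool=images
"""

parsed = parse_auth_listing(LISTING)
for name in sorted(parsed):
    print(name, parsed[name]["osd"])
```

  The resulting dictionary can then be compared per client and per daemon
  type (mon, osd, mgr) with a reference set of caps.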

  Granting the glance client e.g. "rbd-read-only" on the volumes pool via
  >
  > # ceph auth caps client.glance mon 'profile rbd' osd 'profile rbd pool=images, profile rbd-read-only pool=volumes' mgr 'profile rbd pool=images, profile rbd-read-only pool=volumes'
  >
  makes the error go away.
  This is the wrong approach though, as was established during the
  discussion on the ML:

  
  a) Commit [10] introduced the method "_snapshot_has_external_reference" in
  the Yoga release to fix [11]. The commit message also briefly states:
  ...

      NOTE: To check this dependency glance osd needs 'read' access to
      cinder and nova side RBD pool.

  but there is zero mention of this requirement in the release notes for
  Yoga, only in those for glance_store [13]. Also, this (temporary, Yoga-only)
  requirement to grant Glance read-only rights to the volumes pool was never
  revoked. So operators likely missed it.

  b) The mentioned method to check for snapshot references was removed again
  with [12]; this change was also backported to the 2023.1 release.
  Again, the release notes did not mention the change, so operators were
  never told that they could now remove the read access to the volumes pool
  from the Glance user again.

  c) Neither change a) nor change b) came with an update to the actual
  documentation on how to configure the Ceph caps of the glance user.

  d) The "_snapshot_has_external_reference" method is currently just
  dangling and unused [14].

  e) I am still wondering what the cap that allows reading "rbd_children"
  prefixed RADOS objects is or was used for. Especially with the managed
  profiles such as "rbd" or "rbd-read-only", things should be pretty well
  covered.
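
  Operators who want to check for left-over grants such as the read access
  described in a) and b) could look at which pools a client's caps name at
  all. The following is only a rough sketch: the helper names are invented
  for illustration, the cap string is the one from the "ceph auth caps"
  command quoted above, and the expected pool set is an assumption that each
  operator has to decide on.

```python
import re

# Pool restriction inside a single grant, e.g. "profile rbd pool=images".
POOL = re.compile(r"\bpool=(\S+)")


def pools_granted(cap_string):
    """Return the set of pools named in a comma-separated cap string.

    Note: a grant without "pool=" is not restricted to a pool at all and is
    therefore not reported here; such grants need a separate look.
    """
    pools = set()
    for grant in cap_string.split(","):
        match = POOL.search(grant)
        if match:
            pools.add(match.group(1))
    return pools


def unexpected_pools(cap_string, expected):
    """Return the pools named in cap_string that are not in expected."""
    return sorted(pools_granted(cap_string) - set(expected))


# OSD caps of client.glance after the workaround shown in this report.
glance_osd_caps = "profile rbd pool=images, profile rbd-read-only pool=volumes"

# Assumption for this example: Glance should only touch its own pool.
print(unexpected_pools(glance_osd_caps, {"images"}))
```

  An empty result would mean that every pool named in the caps is one the
  client is expected to use.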


  And finally: The Glance documentation at [18] is outdated.


  
  2) DevStack

  I also wondered why no unit tests fail in CI because of this [3].
  Looking at what devstack does at [6] it appears that

  a) it actually applies "allow class-read object_prefix rbd_children",
  which is not what is currently documented in the setup guide(s) (see [7]
  and [2])

  b) it unnecessarily grants read permissions to NOVA_CEPH_POOL ("vms")
  and CINDER_CEPH_POOL ("volumes") also for the Glance user

  c) it does NOT use the managed capabilities called "profiles", such as
  "rbd" or "rbd-read-only", but raw ACLs such as "rwx" instead, see [9].

  The Cinder / Glance documentation also differs on this point, and it makes
  a great difference, as "such privileges include the ability to blocklist
  other client users.", which is required to remove the locks of stale RBD
  clients from images, see
  https://docs.ceph.com/en/latest/rbd/rbd-exclusive-locks/#rbd-exclusive-locks.
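
  To spot caps that still use raw ACLs instead of profiles, as described in
  c), each grant of a cap string can be classified by its leading keyword.
  This is a sketch under assumptions: the helper names are made up, and the
  raw-ACL sample string is only an illustration assembled from the fragments
  quoted in this report, not a copy of what devstack applies.

```python
def classify_grant(grant):
    """Return 'profile' for a managed profile grant, 'raw' for a raw ACL."""
    words = grant.strip().split()
    if not words:
        raise ValueError("empty grant")
    if words[0] == "profile":
        return "profile"
    if words[0] == "allow":
        # "allow profile rbd" is also a profile grant; anything else that
        # follows "allow" (r, w, x, class-read, ...) is treated as raw.
        return "profile" if len(words) > 1 and words[1] == "profile" else "raw"
    raise ValueError(f"unrecognised grant: {grant!r}")


def raw_grants(cap_string):
    """Return the grants of a comma-separated cap string that are raw ACLs."""
    grants = [g.strip() for g in cap_string.split(",") if g.strip()]
    return [g for g in grants if classify_grant(g) == "raw"]


# Profile-based caps as used for client.glance in this report.
profile_caps = "profile rbd pool=images"

# Illustrative raw-ACL caps (assumption, see the note above).
raw_caps = "allow class-read object_prefix rbd_children, allow rwx pool=images"

print(raw_grants(profile_caps))
print(raw_grants(raw_caps))
```

  Clients for which raw_grants returns anything are candidates for a closer
  look once the recommended profile-based caps are agreed on.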

  
  This might not matter for CI / DevStack environments in themselves. But
  since those are used for validation, they should ideally use the default /
  documented settings where possible, to also validate that these work.


  3) Cinder

  There seem to be no documented caps for using the ceph-rbd volume
  driver [19].


  
  4) Cinder-Backup

  If cinder-backup is used with the Ceph driver [17], a keyring is required
  that allows creating snapshots of volumes (RBD images), which then serve as
  the source for backups.
  Deleting those snapshots also has to be allowed, as cinder-backup removes
  them once they are no longer needed. While full "profile rbd" access to the
  volumes pool works, cinder-backup likely does not need to be able to modify
  or even delete volumes. There could also be user snapshots, which
  cinder-backup does not need to be able to delete either.
  Then there are the caps to store and retrieve backups via rbd import / rbd
  import-diff from another pool (potentially on a different cluster).

  Currently no required caps for cinder-backup seem to be documented,
  e.g. in [17].


  5) Nova

  There are lots of RBD-related options, e.g. for libvirt [8] and more,
  covering

   * instance storage (if `images_type=rbd`)
   * volumes
   * interaction with Glance images ([glance] -> enable_rbd_download)

  but there seems to be no list of the capabilities actually required, nor
  any recommendations for the various interactions with RBD.


  6) OpenStack-Ansible

  OpenStack-Ansible uses ceph-ansible, but actively overrides the keyrings
  and their caps.
  Overriding managed code should really just be a temporary fix (it was done
  for Stein, if I read this correctly).
  Once the proper caps are defined, those openstack_keys in [15] should be
  turned into a PR against ceph-ansible [16] to fix things globally there as
  well.

  Other deployment tools likely apply their own home-grown sets of caps and
  Ceph users/keyrings, as there is no reference to rely on.



  
  [1] https://docs.openstack.org/glance/latest/configuration/configuring.html#configuring-the-rbd-storage-backend
  [2] https://docs.ceph.com/en/latest/rbd/rbd-openstack/#setup-ceph-client-authentication
  [3] https://bugs.launchpad.net/glance/+bug/2045158
  [4] OpenStack-Ansible: https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/all/ceph.yml#L53-L60
  [5] Charm: https://review.opendev.org/q/topic:%22bug/1696073%22 // https://bugs.launchpad.net/charm-glance/+bug/1696073
  [6] https://opendev.org/openstack/devstack-plugin-ceph/src/commit/4c22c3d0905589d676bf4865ca5cf57994eb426d/devstack/lib/ceph#L712
  [7] https://docs.openstack.org/glance/latest/configuration/configuring.html#configuring-the-rbd-storage-backend
  [8] https://docs.openstack.org/nova/queens/configuration/config.html#libvirt.rbd_user
  [9] https://docs.ceph.com/en/latest/rados/operations/user-management/#authorization-capabilities
  [10] https://github.com/openstack/glance_store/commit/3d221ec529862d43ab303644e74ee9ad6ce8cd40
  [11] https://bugs.launchpad.net/glance-store/+bug/1954883
  [12] https://review.opendev.org/q/I34dcd90a09d43127ff2e8b477750c70f3cc01113
  [13] https://docs.openstack.org/releasenotes/glance_store/yoga.html#relnotes-3-0-0-stable-yoga
  [14] https://opendev.org/openstack/glance_store/src/commit/054bd5ddf5d4d255076bd5f44296f2521e899394/glance_store/_drivers/rbd.py#L455
  [15] https://opendev.org/openstack/openstack-ansible/commit/0f92985608c0f6ff941ea0445ae25eab20e94fb4
  [16] https://github.com/ceph/ceph-ansible/blob/b6102975549d8f870b0c20a01edda59d6ceac422/group_vars/all.yml.sample#L642
  [17] https://docs.openstack.org/cinder/latest/configuration/block-storage/backup/ceph-backup-driver.html
  [18] https://docs.openstack.org/glance/latest/configuration/configuring.html#configuring-the-rbd-storage-backend
  [19] https://docs.openstack.org/cinder/latest/configuration/block-storage/drivers/ceph-rbd-volume-driver.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/ceph/+bug/2051244/+subscriptions


