For now I set the service to "unmanaged" to prevent further log flooding. But I would still like to know why the cache is not updated properly.

Quoting Eugen Block <ebl...@nde.ag>:

Good morning,

I noticed something strange on an 18.2.7 cluster running on Ubuntu 22.04, deployed by cephadm. There are 10 hosts in total; 5 of them are all-flash, and those aren't affected. The other 5 hosts are hdd-only, and only 4 of those are affected:

The /var/log/ceph/{FSID}/ceph-volume.log is flooded with attempts to apply the osd spec:


[2025-07-17 05:40:01,994][ceph_volume.main][INFO ] Running command: ceph-volume lvm batch --no-auto /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw --yes --no-systemd
[2025-07-17 05:42:00,216][ceph_volume.main][INFO ] Running command: ceph-volume lvm batch --no-auto /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw --yes --no-systemd
[2025-07-17 05:43:50,521][ceph_volume.main][INFO ] Running command: ceph-volume lvm batch --no-auto /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw --yes --no-systemd


So that log file alone grows to more than 1 GB per day; as a consequence, other logs like syslog grow as well.
But for some reason, storage08 is skipped, the mgr reports:

[cephadm DEBUG root] skipping apply of storage08 on DriveGroupSpec.from_json(yaml.safe_load('''service_type: osd

This is the current hdd-only spec:

# ceph orch ls osd --export
service_type: osd
service_id: hdd-only
service_name: osd.hdd-only
placement:
  hosts:
  - storage06
  - storage07
  - storage08
  - storage09
  - storage10
spec:
  data_devices:
    rotational: 1
    size: '1T:'
  filter_logic: AND
  objectstore: bluestore

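In other words, a device has to match all of the filters at once before it is picked as a data device. A minimal Python sketch of that AND semantics as I understand it (the function and field names here are illustrative, not cephadm's actual internals):

```python
# Hypothetical sketch of a drive-group filter with filter_logic: AND.
# A device qualifies only if ALL filters match: rotational == 1 (HDD)
# and size >= 1T (the open-ended '1T:' range from the spec above).
def matches_spec(device: dict, min_size_tb: float = 1.0) -> bool:
    return device["rotational"] == 1 and device["size_tb"] >= min_size_tb

devices = [
    {"path": "/dev/sdb", "rotational": 1, "size_tb": 16.4},  # HDD, matches
    {"path": "/dev/sda", "rotational": 0, "size_tb": 1.9},   # SSD, rejected
    {"path": "/dev/sdz", "rotational": 1, "size_tb": 0.5},   # HDD but too small
]

selected = [d["path"] for d in devices if matches_spec(d)]
print(selected)  # ['/dev/sdb']
```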

I verified that all disks are deployed as OSDs, so there are no orphan devices lying around or anything. I failed the mgr (of course) and rebooted one host, since storage08 had recently been rebooted as well. Unfortunately, I don't know how long this has been going on.

So I started to look at the code [0], [1]; it seems like the mgr cache is not being updated properly:

if not self.mgr.cache.osdspec_needs_apply(host, drive_group):
    self.mgr.log.debug("skipping apply of %s on %s (no change)" % (
        host, drive_group))


So I looked at all these values:

    def osdspec_needs_apply(self, host: str, spec: ServiceSpec) -> bool:
        if (
            host not in self.devices
            or host not in self.last_device_change
            or host not in self.last_device_update
            or host not in self.osdspec_last_applied
            or spec.service_name() not in self.osdspec_last_applied[host]
        ):

but all keys are populated with values similar to those on storage08, for example:


root@storage01:~# ceph config-key get mgr/cephadm/host.storage10 | jq -r '.last_device_change,.last_device_update,.osdspec_last_applied'
2025-02-12T16:27:21.979015Z
2025-07-17T05:35:18.852618Z
{
  "osd.hdd-only": "2025-07-17T06:03:38.860971Z"
}

root@storage01:~# ceph config-key get mgr/cephadm/host.storage08 | jq -r '.last_device_change,.last_device_update,.osdspec_last_applied'
2025-03-11T08:23:02.851969Z
2025-07-17T05:43:47.521682Z
{
  "osd.hdd-only": "2025-03-11T08:23:21.494004Z"
}
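To compare the cached timestamps side by side, here is a small sketch that parses the values above (the dict keys mirror the cache fields shown by `ceph config-key get`; the script itself is just my own comparison, not ceph code):

```python
from datetime import datetime

# Timestamps copied verbatim from the two config-key dumps above.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

hosts = {
    "storage10": {
        "last_device_change": "2025-02-12T16:27:21.979015Z",
        "osdspec_last_applied": "2025-07-17T06:03:38.860971Z",
    },
    "storage08": {
        "last_device_change": "2025-03-11T08:23:02.851969Z",
        "osdspec_last_applied": "2025-03-11T08:23:21.494004Z",
    },
}

for host, ts in hosts.items():
    change = datetime.strptime(ts["last_device_change"], FMT)
    applied = datetime.strptime(ts["osdspec_last_applied"], FMT)
    print(host, "applied after last device change:", applied > change)
```

On both hosts the spec was last applied *after* the last device change, so from these values alone I can't see what distinguishes storage08 (skipped) from storage10 (re-applied constantly).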


Can anyone make sense of it? I'd appreciate any pointers!

Thanks!
Eugen

[0] https://github.com/ceph/ceph/blob/v18.2.7/src/pybind/mgr/cephadm/services/osd.py#L42
[1] https://github.com/ceph/ceph/blob/v18.2.7/src/pybind/mgr/cephadm/inventory.py#L1316


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io