Hi Eugen,

Thanks again for taking the time to help us with this.

Here are answers to your questions:

Nothing stands out from the mgr logs. Even when `ceph orch device ls` stops 
reporting, the mgr still logs a claim on the osd whenever I run the command:

Sep 27 09:39:24 ceph-mon3 bash[476409]: debug 2024-09-27T13:39:24.731+0000 
7fd4dc6fa700  0 [cephadm INFO root] Found osd claims -> {'ceph-osd31': ['88']}
Sep 27 09:39:24 ceph-mon3 bash[476409]: debug 2024-09-27T13:39:24.731+0000 
7fd4dc6fa700  0 log_channel(cephadm) log [INF] : Found osd claims -> 
{'ceph-osd31': ['88']}
Sep 27 09:39:24 ceph-mon3 bash[476409]: debug 2024-09-27T13:39:24.731+0000 
7fd4dc6fa700  0 [cephadm INFO cephadm.services.osd] Found osd claims for 
drivegroup ceph-osd31 -> {'ceph-osd31': ['88']}
Sep 27 09:39:24 ceph-mon3 bash[476409]: debug 2024-09-27T13:39:24.731+0000 
7fd4dc6fa700  0 log_channel(cephadm) log [INF] : Found osd claims for 
drivegroup ceph-osd31 -> {'ceph-osd31': ['88']}


Here’s a sample of mgr logs right after a mgr failover (I’ve filtered out some 
noise from pgmap, prometheus, pg_autoscaler, balancer, and progress):

Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.006+0000 
7f8d2f15c700  1 mgr handle_mgr_map Activating!
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.006+0000 
7f8d2f15c700  1 mgr handle_mgr_map I am now activating
Sep 27 09:55:18 ceph-mon3 bash[476409]: [27/Sep/2024:13:55:18] ENGINE HTTP 
Server cherrypy._cpwsgi_server.CPWSGIServer(('::', 9283)) shut down
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.102+0000 
7f8c4baa7700  0 [cephadm DEBUG root] setting log level based on debug_mgr: INFO 
(2/5)
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.202+0000 
7f8c4baa7700  1 mgr load Constructed class from module: cephadm
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.206+0000 
7f8c4baa7700  0 [crash DEBUG root] setting log level based on debug_mgr: INFO 
(2/5)
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.206+0000 
7f8c4baa7700  1 mgr load Constructed class from module: crash
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.222+0000 
7f8c4baa7700  0 [devicehealth DEBUG root] setting log level based on debug_mgr: 
INFO (2/5)
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.222+0000 
7f8c4baa7700  1 mgr load Constructed class from module: devicehealth
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.222+0000 
7f8c3e28c700  0 [devicehealth INFO root] Starting
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.238+0000 
7f8c4baa7700  0 [orchestrator DEBUG root] setting log level based on debug_mgr: 
INFO (2/5)
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.242+0000 
7f8c4baa7700  1 mgr load Constructed class from module: orchestrator
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.318+0000 
7f8c4baa7700  0 [rbd_support DEBUG root] setting log level based on debug_mgr: 
INFO (2/5)
Sep 27 09:55:18 ceph-mon3 bash[476409]: [27/Sep/2024:13:55:18] ENGINE Bus 
STARTING
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.346+0000 
7f8c31272700  0 [rbd_support INFO root] recovery thread starting
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.346+0000 
7f8c31272700  0 [rbd_support INFO root] starting setup
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.354+0000 
7f8c4baa7700  1 mgr load Constructed class from module: rbd_support
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.358+0000 
7f8c31272700  0 [rbd_support INFO root] MirrorSnapshotScheduleHandler: 
load_schedules
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.370+0000 
7f8c31272700  0 [rbd_support INFO root] load_schedules: rbd, start_after=
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.374+0000 
7f8c4baa7700  0 [status DEBUG root] setting log level based on debug_mgr: INFO 
(2/5)
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.374+0000 
7f8c4baa7700  1 mgr load Constructed class from module: status
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.378+0000 
7f8c4baa7700  0 [telemetry DEBUG root] setting log level based on debug_mgr: 
INFO (2/5)
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.378+0000 
7f8c4baa7700  1 mgr load Constructed class from module: telemetry
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.382+0000 
7f8c4baa7700  0 [volumes DEBUG root] setting log level based on debug_mgr: INFO 
(2/5)
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.386+0000 
7f8c31272700  0 [rbd_support INFO root] load_schedules: images, start_after=
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.390+0000 
7f8c31272700  0 [rbd_support INFO root] load_schedules: volumes, start_after=
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.394+0000 
7f8c31272700  0 [rbd_support INFO root] load_schedules: vms, start_after=
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.398+0000 
7f8c31272700  0 [rbd_support INFO root] load_schedules: backups, start_after=
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.402+0000 
7f8c21252700  0 [rbd_support INFO root] MirrorSnapshotScheduleHandler: starting
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.402+0000 
7f8c1fa4f700  0 [rbd_support INFO root] PerfHandler: starting
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.422+0000 
7f8c31272700  0 [rbd_support INFO root] load_task_task: rbd, start_after=
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.430+0000 
7f8c4baa7700  1 mgr load Constructed class from module: volumes
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.434+0000 
7f8c31272700  0 [rbd_support INFO root] load_task_task: images, start_after=
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.446+0000 
7f8c31272700  0 [rbd_support INFO root] load_task_task: volumes, start_after=
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.458+0000 
7f8c31272700  0 [rbd_support INFO root] load_task_task: vms, start_after=
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.870+0000 
7f8c3e28c700  0 [devicehealth INFO root] Check health
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.874+0000 
7f8c31272700  0 [rbd_support INFO root] load_task_task: backups, start_after=
Sep 27 09:55:18 ceph-mon3 bash[476409]: [27/Sep/2024:13:55:18] ENGINE Serving 
on http://:::9283
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.914+0000 
7f8c179bf700  0 [rbd_support INFO root] TaskHandler: starting
Sep 27 09:55:19 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:19.106+0000 
7f8c31272700  0 [rbd_support INFO root] TrashPurgeScheduleHandler: 
load_schedules
Sep 27 09:55:19 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:19.122+0000 
7f8c31272700  0 [rbd_support INFO root] load_schedules: rbd, start_after=
Sep 27 09:55:19 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:19.126+0000 
7f8c31272700  0 [rbd_support INFO root] load_schedules: images, start_after=
Sep 27 09:55:19 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:19.134+0000 
7f8c31272700  0 [rbd_support INFO root] load_schedules: volumes, start_after=
Sep 27 09:55:19 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:19.138+0000 
7f8c31272700  0 [rbd_support INFO root] load_schedules: vms, start_after=
Sep 27 09:55:19 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:19.146+0000 
7f8c31272700  0 [rbd_support INFO root] load_schedules: backups, start_after=
Sep 27 09:55:19 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:19.150+0000 
7f8c171be700  0 [rbd_support INFO root] TrashPurgeScheduleHandler: starting
Sep 27 09:55:19 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:19.150+0000 
7f8c31272700  0 [rbd_support INFO root] setup complete
Sep 27 09:55:19 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:19.638+0000 
7f8c43a97700  0 [cephadm INFO cherrypy.error] [27/Sep/2024:13:55:19] ENGINE Bus 
STARTING
Sep 27 09:55:19 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:19.638+0000 
7f8c43a97700  0 log_channel(cephadm) log [INF] : [27/Sep/2024:13:55:19] ENGINE 
Bus STARTING
Sep 27 09:55:19 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:19.770+0000 
7f8c43a97700  0 [cephadm INFO cherrypy.error] [27/Sep/2024:13:55:19] ENGINE 
Serving on https://10.5.74.23:7150
Sep 27 09:55:19 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:19.770+0000 
7f8c43a97700  0 log_channel(cephadm) log [INF] : [27/Sep/2024:13:55:19] ENGINE 
Serving on https://10.5.74.23:7150
Sep 27 09:55:19 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:19.770+0000 
7f8c43a97700  0 [cephadm INFO cherrypy.error] [27/Sep/2024:13:55:19] ENGINE Bus 
STARTED
Sep 27 09:55:19 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:19.770+0000 
7f8c43a97700  0 log_channel(cephadm) log [INF] : [27/Sep/2024:13:55:19] ENGINE 
Bus STARTED
Sep 27 09:55:28 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:28.030+0000 
7f8c4b2a6700  2 mgr.server handle_open ignoring open from mgr.ceph-mon1 
10.5.74.21:0/3328700921; not ready for session (expect reconnect)
Sep 27 09:55:29 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:29.030+0000 
7f8c4b2a6700  2 mgr.server handle_open ignoring open from mgr.ceph-mon1 
10.5.74.21:0/3328700921; not ready for session (expect reconnect)
Sep 27 09:55:36 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:36.410+0000 
7f8c42294700  0 [cephadm INFO root] Found osd claims -> {'ceph-osd31': ['88']}
Sep 27 09:55:36 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:36.410+0000 
7f8c42294700  0 log_channel(cephadm) log [INF] : Found osd claims -> 
{'ceph-osd31': ['88']}
Sep 27 09:55:36 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:36.410+0000 
7f8c42294700  0 [cephadm INFO cephadm.services.osd] Found osd claims for 
drivegroup ceph-osd31 -> {'ceph-osd31': ['88']}
Sep 27 09:55:36 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:36.414+0000 
7f8c42294700  0 log_channel(cephadm) log [INF] : Found osd claims for 
drivegroup ceph-osd31 -> {'ceph-osd31': ['88']}
Sep 27 09:55:36 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:36.426+0000 
7f8c42294700  0 [cephadm INFO root] Found osd claims -> {'ceph-osd31': ['88']}
Sep 27 09:55:36 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:36.426+0000 
7f8c42294700  0 log_channel(cephadm) log [INF] : Found osd claims -> 
{'ceph-osd31': ['88']}
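
In case you want the same view, the noise can be filtered out with something 
like this (illustrative command; the systemd unit name follows the usual 
ceph-<fsid>@mgr.<host>.<id> pattern and will change after a failover):

journalctl -u ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@mgr.ceph-mon3.qzjgws \
  | grep -vE 'pgmap|prometheus|pg_autoscaler|balancer|progress'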


`ceph orch osd rm status` reports "No OSD remove/replace operations reported".

On the osd node:

`ceph-volume inventory --list-all --with-lsm /dev/sdg` reports:

====== Device report /dev/sdg ======

     path                      /dev/sdg
     ceph device               False
     lsm data                  {}
     available                 False
     rejected reasons          Insufficient space (<5GB)
     device id                 INTEL_SSDSC2KG038T8_PHYG039600UB3P8EGN

vs. a drive which has been successfully converted:

====== Device report /dev/sdf ======

     path                      /dev/sdf
     ceph device               True
     lsm data                  {}
     available                 False
     rejected reasons          Has a FileSystem, LVM detected, Insufficient 
space (<10 extents) on vgs, Insufficient space (<5GB)
     device id                 INTEL_SSDSC2KG038T8_PHYG039600CM3P8EGN
    --- Logical Volume ---
     name                      osd-block-88426db7-2322-4807-ac2e-b49929e170d6
     osd id                    87
     cluster name              ceph
     type                      block
     osd fsid                  88426db7-2322-4807-ac2e-b49929e170d6
     cluster fsid              9b3b3539-59a9-4338-8bab-3badfab6e855
     osdspec affinity          ceph-osd31
     block uuid                LNG2gB-pa0w-gl2v-hVQ3-6qTd-aXsR-Lenri3
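
If it helps, we can also run the same check through the cephadm wrapper on the 
node, so it goes through the containerized ceph-volume that the orchestrator 
itself invokes; something along the lines of:

cephadm ceph-volume inventory /dev/sdg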

We’ve zapped /dev/sdg a few times: initially when we ran the command to fail it 
out (`ceph orch osd rm 88 --replace --zap`), and again from the osd node itself 
with `ceph-volume lvm zap /dev/sdg --destroy`. We’ve also zapped the drive 
manually with:


sgdisk --zap-all /dev/sdg
wipefs --all --force /dev/sdg
dd if=/dev/zero bs=1M count=100 oflag=direct of=/dev/sdg
dd bs=512 if=/dev/zero of=/dev/sdg oflag=direct count=204800 seek=$(($(blockdev --getsz /dev/sdg) - 204800))
partprobe /dev/sdg

… based on the suggestion here: https://github.com/rook/rook/issues/11474
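
After each round of zapping, the drive looks completely empty from the OS side; 
these are the kinds of checks we can use to confirm that (illustrative commands, 
output not captured here):

lsblk /dev/sdg
wipefs -n /dev/sdg
sgdisk -p /dev/sdg
pvs | grep sdg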

We’ve rebooted the osd node, with and without the drive inserted.

We’re unable to zap the drive from the orchestrator, as expected:

ceph orch device zap ceph-osd31 /dev/sdg --force
Error EINVAL: Device path '/dev/sdg' not found on host 'ceph-osd31'

Cheers,
/rjg

On Sep 27, 2024, at 2:07 AM, Eugen Block <ebl...@nde.ag> wrote:

Right, if you need encryption, a rebuild is required. Your procedure
has already worked 4 times, so I'd say nothing seems wrong with that
per se.
Regarding the stuck device list, do you see the mgr logging anything
suspicious? Especially when you say that it only returns output after
a failover. Those two osd specs are not conflicting since the first is
"unmanaged" after adoption.
Is there something in 'ceph orch osd rm status'? Can you run 'cephadm
ceph-volume inventory' locally on that node? Do you see any hints in
the node's syslog? Maybe try a reboot or something?


Quoting Bob Gibson <r...@oicr.on.ca>:

Thanks for your reply Eugen. I’m fairly new to cephadm so I wasn’t
aware that we could manage the drives without rebuilding them.
However, we thought we’d take advantage of this opportunity to also
encrypt the drives, and that does require a rebuild.

I have a theory on why the orchestrator is confused. I want to
create an osd service for each osd node so I can manage drives on a
per-node basis.

I started by creating a spec for the first node:

service_type: osd
service_id: ceph-osd31
placement:
 hosts:
 - ceph-osd31
spec:
 data_devices:
   rotational: 0
   size: '3TB:'
 encrypted: true
 filter_logic: AND
 objectstore: bluestore

But I also see a default spec, “osd”, which has placement set to “unmanaged”.

`ceph orch ls osd --export` shows the following:

service_type: osd
service_name: osd
unmanaged: true
spec:
 filter_logic: AND
 objectstore: bluestore
---
service_type: osd
service_id: ceph-osd31
service_name: osd.ceph-osd31
placement:
 hosts:
 - ceph-osd31
spec:
 data_devices:
   rotational: 0
   size: '3TB:'
 encrypted: true
 filter_logic: AND
 objectstore: bluestore

`ceph orch ls osd` shows that I was able to convert 4 drives using my spec:

NAME            PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
osd                         95  10m ago    -    <unmanaged>
osd.ceph-osd31               4  10m ago    43m  ceph-osd31

Despite being able to convert 4 drives, I’m wondering if these specs
are conflicting with one another, and whether that has confused the
orchestrator. If so, how do I safely get from where I am now to
where I want to be? :-)

Cheers,
/rjg

On Sep 26, 2024, at 3:31 PM, Eugen Block <ebl...@nde.ag> wrote:

Hi,

It seems a bit unnecessary to rebuild OSDs just to get them managed.
If you apply a spec file that targets your hosts/OSDs, they will
appear as managed. So when you need to replace a drive, you can
already use the orchestrator to remove and zap it.
That works just fine.
How to get out of your current situation is not entirely clear to me
yet. I’ll reread your post tomorrow.

Regards,
Eugen

Quoting Bob Gibson <r...@oicr.on.ca>:

Hi,

We recently converted a legacy cluster running Quincy v17.2.7 to
cephadm. The conversion went smoothly and left all osds unmanaged by
the orchestrator as expected. We’re now in the process of converting
the osds to be managed by the orchestrator. We successfully
converted a few of them, but then the orchestrator somehow got
confused. `ceph health detail` reports a “stray daemon” for the osd
we’re trying to convert, and the orchestrator is unable to refresh
its device list so it doesn’t see any available devices.

From the perspective of the osd node, the osd has been wiped and is
ready to be reinstalled. We’ve also rebooted the node for good
measure. `ceph osd tree` shows that the osd has been destroyed, but
the orchestrator won’t reinstall it because it thinks the device is
still active. The orchestrator device information is stale, but
we’re unable to refresh it. The usual recommended workaround of
failing over the mgr hasn’t helped. We’ve also tried `ceph orch
device ls --refresh` to no avail. In fact, after running that command,
subsequent runs of `ceph orch device ls` produce no output until the
mgr is failed over again.

Is there a way to force the orchestrator to refresh its list of
devices when in this state? If not, can anyone offer any suggestions
on how to fix this problem?

Cheers,
/rjg

P.S. Some additional information in case it’s helpful...

We’re using the following command to replace existing devices so
that they’re managed by the orchestrator:

```
ceph orch osd rm <osd> --replace --zap
```

and we’re currently stuck on osd 88.

```
ceph health detail
HEALTH_WARN 1 stray daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
  stray daemon osd.88 on host ceph-osd31 not managed by cephadm
```

`ceph osd tree` shows that the osd has been destroyed and is ready
to be replaced:

```
ceph osd tree-from ceph-osd31
ID   CLASS  WEIGHT    TYPE NAME        STATUS     REWEIGHT  PRI-AFF
-46         34.93088  host ceph-osd31
84    ssd   3.49309      osd.84              up   1.00000  1.00000
85    ssd   3.49309      osd.85              up   1.00000  1.00000
86    ssd   3.49309      osd.86              up   1.00000  1.00000
87    ssd   3.49309      osd.87              up   1.00000  1.00000
88    ssd   3.49309      osd.88       destroyed         0  1.00000
89    ssd   3.49309      osd.89              up   1.00000  1.00000
90    ssd   3.49309      osd.90              up   1.00000  1.00000
91    ssd   3.49309      osd.91              up   1.00000  1.00000
92    ssd   3.49309      osd.92              up   1.00000  1.00000
93    ssd   3.49309      osd.93              up   1.00000  1.00000
```

The cephadm log shows a claim on node `ceph-osd31` for that osd:

```
2024-09-25T14:15:45.699348-0400 mgr.ceph-mon3.qzjgws [INF] Found osd
claims -> {'ceph-osd31': ['88']}
2024-09-25T14:15:45.699534-0400 mgr.ceph-mon3.qzjgws [INF] Found osd
claims for drivegroup ceph-osd31 -> {'ceph-osd31': ['88']}
```

`ceph orch device ls` shows that the device list isn’t refreshing:

```
ceph orch device ls ceph-osd31
HOST        PATH      TYPE  DEVICE ID                               SIZE   AVAILABLE  REFRESHED  REJECT REASONS
ceph-osd31  /dev/sdc  ssd   INTEL_SSDSC2KG038T8_PHYG039603PE3P8EGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
ceph-osd31  /dev/sdd  ssd   INTEL_SSDSC2KG038T8_PHYG039600AY3P8EGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
ceph-osd31  /dev/sde  ssd   INTEL_SSDSC2KG038T8_PHYG039600CW3P8EGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
ceph-osd31  /dev/sdf  ssd   INTEL_SSDSC2KG038T8_PHYG039600CM3P8EGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
ceph-osd31  /dev/sdg  ssd   INTEL_SSDSC2KG038T8_PHYG039600UB3P8EGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
ceph-osd31  /dev/sdh  ssd   INTEL_SSDSC2KG038T8_PHYG039603753P8EGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
ceph-osd31  /dev/sdi  ssd   INTEL_SSDSC2KG038T8_PHYG039603R63P8EGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
ceph-osd31  /dev/sdj  ssd   INTEL_SSDSC2KG038TZ_PHYJ4011032M3P8DGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
ceph-osd31  /dev/sdk  ssd   INTEL_SSDSC2KG038TZ_PHYJ3234010J3P8DGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
ceph-osd31  /dev/sdl  ssd   INTEL_SSDSC2KG038T8_PHYG039603NS3P8EGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
```

`ceph node ls` thinks the osd still exists

```
ceph node ls osd | jq -r '."ceph-osd31"'
[
84,
85,
86,
87,
88, <-- this shouldn’t exist
89,
90,
91,
92,
93
]
```

Each osd node has 10x 3.8 TB ssd drives for osds. On `ceph-osd31`,
cephadm doesn’t see osd.88 as expected:

```
cephadm ls --no-detail
[
  {
      "style": "cephadm:v1",
      "name": "osd.93",
      "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
      "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.93"
  },
  {
      "style": "cephadm:v1",
      "name": "osd.85",
      "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
      "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.85"
  },
  {
      "style": "cephadm:v1",
      "name": "osd.90",
      "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
      "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.90"
  },
  {
      "style": "cephadm:v1",
      "name": "osd.92",
      "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
      "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.92"
  },
  {
      "style": "cephadm:v1",
      "name": "osd.89",
      "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
      "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.89"
  },
  {
      "style": "cephadm:v1",
      "name": "osd.87",
      "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
      "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.87"
  },
  {
      "style": "cephadm:v1",
      "name": "osd.86",
      "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
      "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.86"
  },
  {
      "style": "cephadm:v1",
      "name": "osd.84",
      "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
      "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.84"
  },
  {
      "style": "cephadm:v1",
      "name": "osd.91",
      "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
      "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.91"
  }
]
```

`lsblk` shows that `/dev/sdg` has been wiped.

```
NAME                                       MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                                          8:0    0 223.6G  0 disk
|-sda1                                       8:1    0    94M  0 part
`-sda2                                       8:2    0 223.5G  0 part
  `-md0                                      9:0    0 223.4G  0 raid1 /
sdb                                          8:16   0 223.6G  0 disk
|-sdb1                                       8:17   0    94M  0 part
`-sdb2                                       8:18   0 223.5G  0 part
  `-md0                                      9:0    0 223.4G  0 raid1 /
sdc                                          8:32   1   3.5T  0 disk
`-ceph--03782b4c--9faa--49f5--b554--98e7b8515834-osd--block--ba272724--daa6--45f5--9f69--789cc0bda077
                                           253:3    0   3.5T  0 lvm
  `-keCkP2-o6h8-jKkw-RKiD-UBFf-A8EL-JDJGPR
                                           253:9    0   3.5T  0 crypt
sdd                                          8:48   1   3.5T  0 disk
`-ceph--c07907d8--4a75--4ba3--b5e1--2ebf49ecbdf6-osd--block--58d1d50d--6228--4e6f--9a52--2a305ba00700
                                           253:7    0   3.5T  0 lvm
  `-WB8Mxn-qCHI-4T01-imiG-hNBR-by60-YuxgfD
                                           253:11   0   3.5T  0 crypt
sde                                          8:64   1   3.5T  0 disk
`-ceph--6f9d4df4--7ce6--44a4--a7b1--62c85af8cfe0-osd--block--aabcb30d--0084--490a--969b--78f7af6e94da
                                           253:8    0   3.5T  0 lvm
  `-g9qErH-vTXY-JQbs-eh61-W0Mn-TAV8-gof4zy
                                           253:12   0   3.5T  0 crypt
sdf                                          8:80   1   3.5T  0 disk
`-ceph--d6b728f8--e365--46db--b30f--6c00805c752b-osd--block--88426db7--2322--4807--ac2e--b49929e170d6
                                           253:6    0   3.5T  0 lvm
  `-LNG2gB-pa0w-gl2v-hVQ3-6qTd-aXsR-Lenri3
                                           253:10   0   3.5T  0 crypt
sdg                                          8:96   1   3.5T  0 disk
sdh                                          8:112  1   3.5T  0 disk
`-ceph--de2cfee6--8e0a--4aa0--9e6b--90c09025768c-osd--block--a3b86251--2799--4243--a857--f218fa90f29a
                                           253:2    0   3.5T  0 lvm
sdi                                          8:128  1   3.5T  0 disk
`-ceph--30dee450--0fdd--46ea--9eec--6a4c7706df9c-osd--block--bfc090db--dde4--47dd--a1c9--1cd838ea43b3
                                           253:4    0   3.5T  0 lvm
sdj                                          8:144  1   3.5T  0 disk
`-ceph--78febcf5--43f4--4820--8dc7--0f6c22816c9f-osd--block--da1e69c7--6427--4562--8290--90bcb9526747
                                           253:0    0   3.5T  0 lvm
sdk                                          8:160  1   3.5T  0 disk
`-ceph--fe210281--b1f5--4d5e--9ab0--2f226612af00-osd--block--6bb9f308--e853--4303--83ea--553c3a3513e1
                                           253:1    0   3.5T  0 lvm
sdl                                          8:176  1   3.5T  0 disk
`-ceph--9f21c916--f211--4d1b--8214--6ad1cecac810-osd--block--572d850c--c201--4af4--ac42--0ed2a6ed73ed
                                           253:5    0   3.5T  0 lvm
```

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
