On 2025-08-21 12:59, Michel Jouvin wrote:
Hi Gilles,

For my part, the timeouts you have experienced repeatedly since your first tests are not normal at all and should be fixed before anything else. Because of its very distributed architecture, Ceph is very sensitive to network problems, and I'm afraid you cannot expect it to work properly until you have a reliable network connection.

I'm not so sure it's related. Here, it clearly looks like either a bug, a misconfiguration, or DB LVs not being zapped...

I seem to remember you said your cluster was deployed on a virtualized infrastructure. Are you sure there is nothing in the virtualization layer that prevents network connections from working properly?

No, this is a bare-metal cluster: 7 Dell R730XD servers, each with 4 x 10 Gb fibre ports connected to two Cisco Nexus 3K switches as 2 LACP bonds (layer 3+4 hashing). I use a separate cluster network, even though it's not really needed nowadays, because I want to stay close to our current production clusters.

Network seems fine.
I don't see OSD flapping.

The HDDs are quite fast; the automatic benchmarks even report ~600 IOPS, which seems too high for rotational media (the expected range being 50-500 IOPS). They are SAS drives behind a PERC controller, but configured as passthrough: no RAID, no cache.
The SSDs are Toshiba enterprise SAS drives.
These servers were previously used in a VMware vSAN environment (with more SSDs).
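
If those numbers come from the automatic mClock capacity benchmark (that's my assumption), I can cross-check them with something like:

# Assuming the ~600 IOPS figure is the automatic mClock OSD bench result;
# the measured capacity is recorded per OSD:
ceph config dump | grep osd_mclock_max_capacity_iops
# or for a single OSD (osd.82 is just an example ID):
ceph config show osd.82 osd_mclock_max_capacity_iops_hdd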

Thank you,

Best regards,

Michel
Sent from my mobile
On 21 August 2025 at 12:02:02, Gilles Mocellin <gilles.mocel...@nuagelibre.org> wrote:

Hi,

Having timeout problems with my cluster, I am trying to finish up and
recreate the OSDs that failed.

My configuration is hybrid, with 17 HDDs and 1 SSD per server.
My OSD spec is the standard one:

service_type: osd
service_id: throughput_optimized
service_name: osd.throughput_optimized
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
  objectstore: bluestore
  encrypted: true
  filter_logic: AND
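
For reference, a dry run of this spec (saved here as osd_spec.yaml, a file name I'm making up) shows how cephadm plans to pair data and DB devices without applying anything:

# osd_spec.yaml is a hypothetical file name for the spec above
ceph orch apply -i osd_spec.yaml --dry-run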

When I remove an OSD with:
ceph orch osd rm ID --zap

It is eventually re-created, but without a DB device.
I can see the ceph-volume commands:

Zapping:
cephadm ['--image',
'quay.io/ceph/ceph@sha256:7c69e59beaeea61ca714e71cb84ff6d5e533db7f1fd84143dd9ba6649a5fd2ec',
'--timeout', '2395', 'ceph-volume', '--fsid',
'8ec7575a-7de5-11f0-a78a-246e96bd90a4', '--', 'lvm', 'zap', '--osd-id',
'82', '--destroy']
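
To cross-check on the affected host (a rough sketch; osd.82 is the ID from the log above), the following should show whether the DB LV of that OSD was really removed and how much free space is left in the SSD's volume group:

# List what ceph-volume still knows about on this host:
cephadm ceph-volume lvm list
# Or look directly at the LVs and their Ceph tags:
lvs -o lv_name,vg_name,lv_size,lv_tags | grep 'osd_id=82'
# And check the remaining free space in the SSD's VG:
vgs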

Recreate:
cephadm ['--env', 'CEPH_VOLUME_OSDSPEC_AFFINITY=throughput_optimized',
'--image',
'quay.io/ceph/ceph@sha256:7c69e59beaeea61ca714e71cb84ff6d5e533db7f1fd84143dd9ba6649a5fd2ec',
'--timeout', '2395', 'ceph-volume', '--fsid',
'8ec7575a-7de5-11f0-a78a-246e96bd90a4', '--config-json', '-', '--',
'lvm', 'batch', '--no-auto', '/dev/sde', '--dmcrypt', '--osd-ids', '82',
'--yes', '--no-systemd']
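
One thing I will also check (just a guess on my part) is whether cephadm still considers the SSD available at that point, since a device reported as unavailable or full could explain why the batch call is built without --db-devices:

# Shows availability and reject reasons for each device;
# a refresh may be needed right after the zap:
ceph orch device ls --wide --refresh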

Shouldn't it have the --db-devices argument?
I can see that option when the OSD spec was initially applied; it launched
ceph-volume for all the drives on the host:

cephadm ['--env', 'CEPH_VOLUME_OSDSPEC_AFFINITY=throughput_optimized',
'--image',
'quay.io/ceph/ceph@sha256:7c69e59beaeea61ca714e71cb84ff6d5e533db7f1fd84143dd9ba6649a5fd2ec',
'--timeout', '2395', 'ceph-volume', '--fsid',
'8ec7575a-7de5-11f0-a78a-246e96bd90a4', '--config-json', '-', '--',
'lvm', 'batch', '--no-auto', '/dev/sda', '/dev/sdb', '/dev/sdc',
'/dev/sdd', '/dev/sde', '/dev/sdf', '/dev/sdg', '/dev/sdh', '/dev/sdi', '/dev/sdj', '/dev/sdk', '/dev/sdl', '/dev/sdm', '/dev/sdn', '/dev/sdo',
'/dev/sdp', '/dev/sdq', '--db-devices', '/dev/sdr', '--dmcrypt',
'--yes', '--no-systemd']

Here, the --db-devices argument is present.

Is it a bug, or am I doing something wrong?

I have several OSDs that are now without a DB device, and I want to
recreate them with one.
Also, every time a new OSD is created, I want it to get a DB device, or to
fail if there is no room left, but never to be created without a DB device
when my OSD spec says it should have one.
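
In the meantime (a sketch under my assumptions: "cephnode1" is a made-up host name, /dev/sde and /dev/sdr are the devices from my example, and I still need to verify that encryption is applied with this form), I suppose I could spot which OSDs lack a dedicated DB and recreate them one by one with an explicit DB device:

# Check whether an OSD has a dedicated DB (osd.82 as an example):
ceph osd metadata 82 | grep bluefs_dedicated_db
# Recreate one OSD with an explicit DB device on a given host:
ceph orch daemon add osd cephnode1:data_devices=/dev/sde,db_devices=/dev/sdr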
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io