[ceph-users] Re: Restore a pool from snapshot

2024-09-27 Thread Eugen Block

Hi,

it's been a while since I last looked into this, but as far as I know,  
you'd have to iterate over each object in the pool to restore it from  
your snapshot. There's no option to restore all of them with one  
command.
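
Untested from my side, but a simple loop along these lines should do it
(pool and snapshot names are placeholders, and the loop will break on
object names containing whitespace, so better try it on a test pool first):

  rados lssnap -p <pool>                  # confirm the snapshot name
  for obj in $(rados -p <pool> ls); do
      rados -p <pool> rollback "$obj" <snapname>
  done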


Regards,
Eugen

Zitat von Pavel Kaygorodov :


Hi!

Maybe a dumb question, sorry, but how can I restore a whole pool
from a snapshot?
I have made a snapshot with 'rados mksnap', but there is no command
to restore a whole snapshot; only one object may be specified for
rollback. Is it possible to restore all objects?


Thanks in advance,
  Pavel.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mds daemon damaged - assert failed

2024-09-27 Thread Frédéric Nass
Hi George,

Looks like you hit this one [1]. I can't find the fix [2] in the Reef release
notes [3], so you'll have to cherry-pick it and build from source, or wait for
it to land in the next build.

Regards,
Frédéric.

[1] https://tracker.ceph.com/issues/58878
[2] https://github.com/ceph/ceph/pull/55265
[3] https://docs.ceph.com/en/latest/releases/reef/#v18-2-4-reef

- Le 24 Sep 24, à 0:32, Kyriazis, George george.kyria...@intel.com a écrit :

> Hello ceph users,
> 
> I am in the unfortunate situation of having a status of “1 mds daemon 
> damaged”.
> Looking at the logs, I see that the daemon died with an assert as follows:
> 
> ./src/osdc/Journaler.cc: 1368: FAILED ceph_assert(trim_to > trimming_pos)
> 
> ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x12a)
> [0x73a83189d7d9]
> 2: /usr/lib/ceph/libceph-common.so.2(+0x29d974) [0x73a83189d974]
> 3: (Journaler::_trim()+0x671) [0x57235caa70b1]
> 4: (Journaler::_finish_write_head(int, Journaler::Header&, 
> C_OnFinisher*)+0x171)
> [0x57235caaa8f1]
> 5: (Context::complete(int)+0x9) [0x57235c716849]
> 6: (Finisher::finisher_thread_entry()+0x16d) [0x73a83194659d]
> 7: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x73a8310a8134]
> 8: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x73a8311287dc]
> 
> 0> 2024-09-23T14:10:26.490-0500 73a822c006c0 -1 *** Caught signal 
> (Aborted) **
> in thread 73a822c006c0 thread_name:MR_Finisher
> 
> ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
> 1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x73a83105b050]
> 2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x73a8310a9e2c]
> 3: gsignal()
> 4: abort()
> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x185)
> [0x73a83189d834]
> 6: /usr/lib/ceph/libceph-common.so.2(+0x29d974) [0x73a83189d974]
> 7: (Journaler::_trim()+0x671) [0x57235caa70b1]
> 8: (Journaler::_finish_write_head(int, Journaler::Header&, 
> C_OnFinisher*)+0x171)
> [0x57235caaa8f1]
> 9: (Context::complete(int)+0x9) [0x57235c716849]
> 10: (Finisher::finisher_thread_entry()+0x16d) [0x73a83194659d]
> 11: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x73a8310a8134]
> 12: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x73a8311287dc]
> NOTE: a copy of the executable, or `objdump -rdS ` is needed to
> interpret this.
> 
> 
> As listed above, I am running 18.2.2 on a Proxmox cluster with a hybrid
> hdd/ssd setup, with 2 cephfs filesystems.  The mds responsible for the hdd
> filesystem is the one that died.
> 
> Output of ceph -s follows:
> 
> root@vis-mgmt:~/bin# ceph -s
>  cluster:
>id: ec2c9542-dc1b-4af6-9f21-0adbcabb9452
>health: HEALTH_ERR
>1 filesystem is degraded
>1 filesystem is offline
>1 mds daemon damaged
>5 pgs not scrubbed in time
>1 daemons have recently crashed
>services:
>mon: 5 daemons, quorum 
> vis-hsw-01,vis-skx-01,vis-clx-15,vis-clx-04,vis-icx-00
>(age 6m)
>mgr: vis-hsw-02(active, since 13d), standbys: vis-skx-02, vis-hsw-04,
>vis-clx-08, vis-clx-02
>mds: 1/2 daemons up, 5 standby
>osd: 97 osds: 97 up (since 3h), 97 in (since 4d)
>data:
>volumes: 1/2 healthy, 1 recovering; 1 damaged
>pools:   14 pools, 1961 pgs
>objects: 223.70M objects, 304 TiB
>usage:   805 TiB used, 383 TiB / 1.2 PiB avail
>pgs: 1948 active+clean
> 9active+clean+scrubbing+deep
> 4active+clean+scrubbing
>io:
>client:   86 KiB/s rd, 5.5 MiB/s wr, 64 op/s rd, 26 op/s wr
>  
> 
> 
> I tried restarting all the mds daemons but they are all marked as “standby”. I
> also tried restarting all the mons and then the mds daemons again, but that
> didn’t help.
> 
> Much help is appreciated!
> 
> Thank you!
> 
> George
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mds daemon damaged - assert failed

2024-09-27 Thread Konstantin Shalygin
Hi,

Is [2] the fix for [1], and should it be backported? Currently the backport
fields in the tracker are not filled in, so no one knows that backports are needed.


k

> On 27 Sep 2024, at 11:01, Frédéric Nass  
> wrote:
> 
> Hi George,
> 
> Looks like you hit this one [1]. Can't find the fix [2] in Reef release notes 
> [3]. You'll have to cherry pick it and build sources or wait for it to come 
> to next build.
> 
> Regards,
> Frédéric.
> 
> [1] https://tracker.ceph.com/issues/58878
> [2] https://github.com/ceph/ceph/pull/55265
> [3] https://docs.ceph.com/en/latest/releases/reef/#v18-2-4-reef
> 
> - Le 24 Sep 24, à 0:32, Kyriazis, George george.kyria...@intel.com a 
> écrit :
> 
>> Hello ceph users,
>> 
>> I am in the unfortunate situation of having a status of “1 mds daemon 
>> damaged”.
>> Looking at the logs, I see that the daemon died with an assert as follows:
>> 
>> ./src/osdc/Journaler.cc: 1368: FAILED ceph_assert(trim_to > trimming_pos)
>> 
>> ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
>> const*)+0x12a)
>> [0x73a83189d7d9]
>> 2: /usr/lib/ceph/libceph-common.so.2(+0x29d974) [0x73a83189d974]
>> 3: (Journaler::_trim()+0x671) [0x57235caa70b1]
>> 4: (Journaler::_finish_write_head(int, Journaler::Header&, 
>> C_OnFinisher*)+0x171)
>> [0x57235caaa8f1]
>> 5: (Context::complete(int)+0x9) [0x57235c716849]
>> 6: (Finisher::finisher_thread_entry()+0x16d) [0x73a83194659d]
>> 7: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x73a8310a8134]
>> 8: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x73a8311287dc]
>> 
>>0> 2024-09-23T14:10:26.490-0500 73a822c006c0 -1 *** Caught signal 
>> (Aborted) **
>> in thread 73a822c006c0 thread_name:MR_Finisher
>> 
>> ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
>> 1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x73a83105b050]
>> 2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x73a8310a9e2c]
>> 3: gsignal()
>> 4: abort()
>> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
>> const*)+0x185)
>> [0x73a83189d834]
>> 6: /usr/lib/ceph/libceph-common.so.2(+0x29d974) [0x73a83189d974]
>> 7: (Journaler::_trim()+0x671) [0x57235caa70b1]
>> 8: (Journaler::_finish_write_head(int, Journaler::Header&, 
>> C_OnFinisher*)+0x171)
>> [0x57235caaa8f1]
>> 9: (Context::complete(int)+0x9) [0x57235c716849]
>> 10: (Finisher::finisher_thread_entry()+0x16d) [0x73a83194659d]
>> 11: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x73a8310a8134]
>> 12: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x73a8311287dc]
>> NOTE: a copy of the executable, or `objdump -rdS ` is needed to
>> interpret this.
>> 
>> 
>> As listed above, I am running 18.2.2 on a proxmox cluster with a hybrid 
>> hdd/sdd
>> setup.  2 cephfs filesystems.  The mds responsible for the hdd filesystem is
>> the one that died.
>> 
>> Output of ceph -s follows:
>> 
>> root@vis-mgmt:~/bin# ceph -s
>> cluster:
>>   id: ec2c9542-dc1b-4af6-9f21-0adbcabb9452
>>   health: HEALTH_ERR
>>   1 filesystem is degraded
>>   1 filesystem is offline
>>   1 mds daemon damaged
>>   5 pgs not scrubbed in time
>>   1 daemons have recently crashed
>>   services:
>>   mon: 5 daemons, quorum 
>> vis-hsw-01,vis-skx-01,vis-clx-15,vis-clx-04,vis-icx-00
>>   (age 6m)
>>   mgr: vis-hsw-02(active, since 13d), standbys: vis-skx-02, vis-hsw-04,
>>   vis-clx-08, vis-clx-02
>>   mds: 1/2 daemons up, 5 standby
>>   osd: 97 osds: 97 up (since 3h), 97 in (since 4d)
>>   data:
>>   volumes: 1/2 healthy, 1 recovering; 1 damaged
>>   pools:   14 pools, 1961 pgs
>>   objects: 223.70M objects, 304 TiB
>>   usage:   805 TiB used, 383 TiB / 1.2 PiB avail
>>   pgs: 1948 active+clean
>>9active+clean+scrubbing+deep
>>4active+clean+scrubbing
>>   io:
>>   client:   86 KiB/s rd, 5.5 MiB/s wr, 64 op/s rd, 26 op/s wr
>> 
>> 
>> 
>> I tried restarting all the mds deamons but they are all marked as “standby”. 
>>  I
>> also tried restarting all the mons and then the mds daemons again, but that
>> didn’t help.
>> 
>> Much help is appreciated!
>> 
>> Thank you!
>> 
>> George
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph orchestrator not refreshing device list

2024-09-27 Thread Bob Gibson
Hi Eugen,

Thanks again for taking the time to help us with this.

Here are answers to your questions:

Nothing stands out from the mgr logs. Even when `ceph orch device ls` stops 
reporting, it still shows a claim on the osd in the logs when I run it:

Sep 27 09:39:24 ceph-mon3 bash[476409]: debug 2024-09-27T13:39:24.731+ 
7fd4dc6fa700  0 [cephadm INFO root] Found osd claims -> {'ceph-osd31': ['88']}
Sep 27 09:39:24 ceph-mon3 bash[476409]: debug 2024-09-27T13:39:24.731+ 
7fd4dc6fa700  0 log_channel(cephadm) log [INF] : Found osd claims -> 
{'ceph-osd31': ['88']}
Sep 27 09:39:24 ceph-mon3 bash[476409]: debug 2024-09-27T13:39:24.731+ 
7fd4dc6fa700  0 [cephadm INFO cephadm.services.osd] Found osd claims for 
drivegroup ceph-osd31 -> {'ceph-osd31': ['88']}
Sep 27 09:39:24 ceph-mon3 bash[476409]: debug 2024-09-27T13:39:24.731+ 
7fd4dc6fa700  0 log_channel(cephadm) log [INF] : Found osd claims for 
drivegroup ceph-osd31 -> {'ceph-osd31': ['88']}


Here’s a sample of mgr logs right after a mgr failover (I’ve filtered out some 
noise from pgmap, prometheus, pg_autoscaler, balancer, and progress):

Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.006+ 
7f8d2f15c700  1 mgr handle_mgr_map Activating!
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.006+ 
7f8d2f15c700  1 mgr handle_mgr_map I am now activating
Sep 27 09:55:18 ceph-mon3 bash[476409]: [27/Sep/2024:13:55:18] ENGINE HTTP 
Server cherrypy._cpwsgi_server.CPWSGIServer(('::', 9283)) shut down
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.102+ 
7f8c4baa7700  0 [cephadm DEBUG root] setting log level based on debug_mgr: INFO 
(2/5)
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.202+ 
7f8c4baa7700  1 mgr load Constructed class from module: cephadm
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.206+ 
7f8c4baa7700  0 [crash DEBUG root] setting log level based on debug_mgr: INFO 
(2/5)
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.206+ 
7f8c4baa7700  1 mgr load Constructed class from module: crash
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.222+ 
7f8c4baa7700  0 [devicehealth DEBUG root] setting log level based on debug_mgr: 
INFO (2/5)
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.222+ 
7f8c4baa7700  1 mgr load Constructed class from module: devicehealth
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.222+ 
7f8c3e28c700  0 [devicehealth INFO root] Starting
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.238+ 
7f8c4baa7700  0 [orchestrator DEBUG root] setting log level based on debug_mgr: 
INFO (2/5)
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.242+ 
7f8c4baa7700  1 mgr load Constructed class from module: orchestrator
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.318+ 
7f8c4baa7700  0 [rbd_support DEBUG root] setting log level based on debug_mgr: 
INFO (2/5)
Sep 27 09:55:18 ceph-mon3 bash[476409]: [27/Sep/2024:13:55:18] ENGINE Bus 
STARTING
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.346+ 
7f8c31272700  0 [rbd_support INFO root] recovery thread starting
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.346+ 
7f8c31272700  0 [rbd_support INFO root] starting setup
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.354+ 
7f8c4baa7700  1 mgr load Constructed class from module: rbd_support
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.358+ 
7f8c31272700  0 [rbd_support INFO root] MirrorSnapshotScheduleHandler: 
load_schedules
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.370+ 
7f8c31272700  0 [rbd_support INFO root] load_schedules: rbd, start_after=
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.374+ 
7f8c4baa7700  0 [status DEBUG root] setting log level based on debug_mgr: INFO 
(2/5)
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.374+ 
7f8c4baa7700  1 mgr load Constructed class from module: status
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.378+ 
7f8c4baa7700  0 [telemetry DEBUG root] setting log level based on debug_mgr: 
INFO (2/5)
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.378+ 
7f8c4baa7700  1 mgr load Constructed class from module: telemetry
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.382+ 
7f8c4baa7700  0 [volumes DEBUG root] setting log level based on debug_mgr: INFO 
(2/5)
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.386+ 
7f8c31272700  0 [rbd_support INFO root] load_schedules: images, start_after=
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18.390+ 
7f8c31272700  0 [rbd_support INFO root] load_schedules: volumes, start_after=
Sep 27 09:55:18 ceph-mon3 bash[476409]: debug 2024-09-27T13:55:18

[ceph-users] Re: WAL on NVMe/SSD not used after OSD/HDD replace

2024-09-27 Thread Lukasz Borek
Adding --zap to the orch command cleans up the WAL logical volume:

ceph orch osd rm 37 --replace --zap

After the replacement, the new OSD is created correctly. Tested a few times
with 18.2.4.
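
For reference, the full replacement cycle is roughly (a sketch based on the
commands above; the OSD id is from the example):

  ceph orch osd rm 37 --replace --zap
  ceph orch osd rm status     # wait until the OSD is drained and removed
  # swap the physical disk; the matching osd service spec then redeploys
  # osd.37 with its WAL back on the NVMe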

Thanks.

On Fri, 27 Sept 2024 at 19:31, Igor Fedotov  wrote:

> Hi!
>
> I'm not an expert in the Ceph orchestrator but it looks to me like WAL
> volume hasn't been properly cleaned up during osd.1 removal.
>
> Please compare LVM tags for osd.0 and .1:
>
> osd.0:
>
> "devices": [
>  "/dev/sdc"
>  ],
>
> ...
>
>  "lv_tags":
> "...,ceph.osd_fsid=d472bf9f-c17d-4939-baf5-514a07db66bc,ceph.osd_id=0,...",
>
>   "devices": [
>  "/dev/sdb"
>  ],
>  "lv_tags":
> "...ceph.osd_fsid=d472bf9f-c17d-4939-baf5-514a07db66bc,ceph.osd_id=0,...",
>
> osd_fsid (OSD daemon UUID) is the same for both devices, which allows
> ceph-volume to bind these volumes to the relevant OSD.
>
> OSD.1:
>
>{
>  "devices": [
>  "/dev/sdd"
>  ],
>  "lv_tags":
> "...ceph.osd_fsid=d94bda82-e59f-4d3d-81cd-28ea69c5e02f,ceph.osd_id=1,...",
>
> ...
>
>  {
>  "devices": [
>  "/dev/sdb"
>  ],
>  "lv_tags":
> "...ceph.osd_fsid=7a1d0007-71ff-4011-8a18-e6de1499cbdf,ceph.osd_id=1,...",
>
> osd_fsid tags are different, WAL volume's one is apparently a legacy UUID.
>
> This WAL volume is not bound to new osd.1  (lvtags for osd.1 main volume
> confirms that since there are no WAL related members there) and it still
> keeps setting for the legacy OSD.1.
>
> In other words this is an orpan volume for now and apparently could be
> safely recreated and assigned back to osd.1 via ceph-colume lvm new-wal
> command. Certainly better try in the test env first.
>
> Hope this helps.
>
> Thanks,
>
> Igor
>
> On 9/27/2024 3:48 PM, mailing-lists wrote:
> > Dear Ceph-users,
> > I have a problem that I'd like to have your input for.
> >
> > Preface:
> > I have got a test-cluster and a productive-cluster. Both are setup the
> > same and both are having the same "issue". I am running Ubuntu 22.04
> > and deployed ceph 17.2.3 via cephadm. Upgraded to 17.2.7 later on,
> > which is the version we are currently running. Since the issue seem to
> > be the exact same on the test-cluster, I will post
> > test-cluster-outputs here for better readability.
> >
> > The issue:
> > I have replaced disks and after the replacement, it does not show that
> > it would use the NVMe as WAL device anymore. The LV still exists, but
> > the metadata of the osd does not show it, as it would be with any
> > other osd/hdd, that hasnt been replaced.
> >
> > ODS.1 (incorrect, bluefs_dedicated_wal: "0")
> > ```
> > {
> > "id": 1,
> > "arch": "x86_64",
> > "back_addr":
> > "[v2:192.168.6.241:6802/3213655489,v1:192.168.6.241:6803/3213655489]",
> > "back_iface": "",
> > "bluefs": "1",
> > "bluefs_dedicated_db": "0",
> > "bluefs_dedicated_wal": "0",
> > "bluefs_single_shared_device": "1",
> > "bluestore_bdev_access_mode": "blk",
> > "bluestore_bdev_block_size": "4096",
> > "bluestore_bdev_dev_node": "/dev/dm-3",
> > "bluestore_bdev_devices": "sdd",
> > "bluestore_bdev_driver": "KernelDevice",
> > "bluestore_bdev_optimal_io_size": "0",
> > "bluestore_bdev_partition_path": "/dev/dm-3",
> > "bluestore_bdev_rotational": "1",
> > "bluestore_bdev_size": "17175674880",
> > "bluestore_bdev_support_discard": "1",
> > "bluestore_bdev_type": "hdd",
> > "bluestore_min_alloc_size": "4096",
> > "ceph_release": "quincy",
> > "ceph_version": "ceph version 17.2.7
> > (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)",
> > "ceph_version_short": "17.2.7",
> > "ceph_version_when_created": "",
> > "container_hostname": "bi-ubu-srv-ceph2-01",
> > "container_image":
> > "
> quay.io/ceph/ceph@sha256:28323e41a7d17db238bdcc0a4d7f38d272f75c1a499bc30f59b0b504af132c6b
> ",
> > "cpu": "AMD EPYC 75F3 32-Core Processor",
> > "created_at": "",
> > "default_device_class": "hdd",
> > "device_ids": "sdd=QEMU_HARDDISK_drive-scsi3",
> > "device_paths":
> > "sdd=/dev/disk/by-path/pci-:00:05.0-scsi-0:0:3:0",
> > "devices": "sdd",
> > "distro": "centos",
> > "distro_description": "CentOS Stream 8",
> > "distro_version": "8",
> > "front_addr":
> >
> "[v2:.241:6800/3213655489,v1:.241:6801/3213655489]",
> > "front_iface": "",
> > "hb_back_addr":
> > "[v2:192.168.6.241:6806/3213655489,v1:192.168.6.241:6807/3213655489]",
> > "hb_front_addr":
> >
> "[v2:.241:6804/3213655489,v1:.241:6805/3213655489]",
> > "hostname": "bi-ubu-srv-ceph2-01",
> > "journal_rotational": "1",
> > "kernel_description": "#132-Ubuntu SMP Thu Aug 29 13:45:52 UTC 2024",
> > "kernel_version": "5.15.0-122-generic",
> > "mem_swap_kb": "4018172",
> > "mem_total_kb": "5025288",
> > "network_numa_unknown_ifaces": "back_iface,fr

[ceph-users] Re: device_health_metrics pool automatically recreated

2024-09-27 Thread Patrick Donnelly
On Tue, Aug 27, 2024 at 6:49 AM Eugen Block  wrote:
>
> Hi,
>
> I just looked into one customer cluster that we upgraded some time ago
> from Octopus to Quincy (17.2.6) and I'm wondering why there are still
> both pools, "device_health_metrics" and ".mgr".
>
> According to the docs [0], it's supposed to be renamed:
>
> > Prior to Quincy, the devicehealth module created a
> > device_health_metrics pool to store device SMART statistics. With
> > Quincy, this pool is automatically renamed to be the common manager
> > module pool.
>
> Now only .mgr has data while device_health_metrics is empty, but it
> has a newer ID:
>
> ses01:~ # ceph df | grep -E "device_health|.mgr"
> .mgr1 1   68 MiB   18  204 MiB
>   0254 TiB
> device_health_metrics  15 1  0 B0  0 B
>   0254 TiB
>
> On a test cluster (meanwhile upgraded to latest Reef) I see the same:
>
> ceph01:~ # ceph df | grep -E "device_health_metrics|.mgr"
> .mgr381  577 KiB2  1.7 MiB  0
> 71 GiB
> device_health_metrics   451  0 B0  0 B  0
> 71 GiB
>
> Since there are still many users who haven't upgraded to >= Quincy
> yet, this should be clarified/fixed. I briefly checked
> tracker.ceph.com, but didn't find anything related to this. I'm
> currently trying to reproduce it on a one-node test cluster which I
> upgraded from Pacific to Quincy, but no results yet, only that the
> renaming was successful. But for the other clusters I don't have
> enough logs to find out how/why the device_health_metrics pool had
> been recreated.

Probably someone ran a pre-Quincy ceph-mgr on the cluster after the
upgrade? That would explain the larger pool id.
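
If that's what happened, a quick check and cleanup could look like this (a
sketch; only remove the pool after verifying it is empty and that no
pre-Quincy mgr can still start, and note that pool deletion requires
mon_allow_pool_delete=true):

  ceph versions                          # any pre-Quincy mgr/mon left?
  rados -p device_health_metrics ls      # should return nothing
  ceph config set mon mon_allow_pool_delete true
  ceph osd pool rm device_health_metrics device_health_metrics \
      --yes-i-really-really-mean-it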

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph orchestrator not refreshing device list

2024-09-27 Thread Eugen Block
Oh interesting, I just got into the same situation (I believe) on a  
test cluster:


host1:~ # ceph orch ps | grep unknown
osd.1  host6   
stopped  72s ago  36m-4096M   
   
osd.13 host6   
error72s ago  36m-4096M   
   


I still had the remainders on the filesystem:

host6:~ # ll /var/lib/ceph/543967bc-e586-32b8-bd2c-2d8b8b168f02/osd.1
insgesamt 68
lrwxrwxrwx 1 ceph ceph  111 27. Sep 14:43 block ->  
/dev/mapper/ceph--0e90997f--456e--4a9b--a8f9--a6f1038c1216-osd--block--81e7f32a--a728--4848--b14d--0b86bb7e1c69
lrwxrwxrwx 1 ceph ceph  108 27. Sep 14:43 block.db ->  
/dev/mapper/ceph--9ea6e95f--ad43--4e40--8920--2e772b2efa2f-osd--db--f9c57ec1--77c8--4d9a--85df--1dc053a24000


I just removed those two directories to clear the warning, now my  
orchestrator can deploy OSDs again on that node.
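
I.e. roughly, after double-checking in `ceph osd tree` that the OSDs are
really gone (paths per the osd.1 listing above, osd.13 analogous):

  rm -rf /var/lib/ceph/543967bc-e586-32b8-bd2c-2d8b8b168f02/osd.1
  rm -rf /var/lib/ceph/543967bc-e586-32b8-bd2c-2d8b8b168f02/osd.13
  ceph orch ps --refresh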


Hope that helps!

Zitat von Eugen Block :

Right, if you need encryption, a rebuild is required. Your procedure  
has already worked 4 times, so I'd say nothing seems wrong with that  
per se.
Regarding the stuck device list, do you see the mgr logging anything  
suspicious? Especially when you say that it only returns output  
after a failover. Those two osd specs are not conflicting since the  
first is "unmanaged" after adoption.
Is there something in 'ceph orch osd rm status'? Can you run  
'cephadm ceph-volume inventory' locally on that node? Do you see any  
hints in the node's syslog? Maybe try a reboot or something?



Zitat von Bob Gibson :

Thanks for your reply Eugen. I’m fairly new to cephadm so I wasn’t  
aware that we could manage the drives without rebuilding them.  
However, we thought we’d take advantage of this opportunity to also  
encrypt the drives, and that does require a rebuild.


I have a theory on why the orchestrator is confused. I want to  
create an osd service for each osd node so I can manage drives on a  
per-node basis.


I started by creating a spec for the first node:

service_type: osd
service_id: ceph-osd31
placement:
 hosts:
 - ceph-osd31
spec:
 data_devices:
   rotational: 0
   size: '3TB:'
 encrypted: true
 filter_logic: AND
 objectstore: bluestore

But I also see a default spec, “osd”, which has placement set to  
“unmanaged”.


`ceph orch ls osd —export` shows the following:

service_type: osd
service_name: osd
unmanaged: true
spec:
 filter_logic: AND
 objectstore: bluestore
---
service_type: osd
service_id: ceph-osd31
service_name: osd.ceph-osd31
placement:
 hosts:
 - ceph-osd31
spec:
 data_devices:
   rotational: 0
   size: '3TB:'
 encrypted: true
 filter_logic: AND
 objectstore: bluestore

`ceph orch ls osd` shows that I was able to convert 4 drives using my spec:

NAMEPORTS  RUNNING  REFRESHED  AGE  PLACEMENT
osd 95  10m ago-
osd.ceph-osd31   4  10m ago43m  ceph-osd31

Despite being able to convert 4 drives, I’m wondering if these  
specs are conflicting with one another, and that has confused the  
orchestrator. If so, how do I safely get from where I am now to  
where I want to be? :-)


Cheers,
/rjg

On Sep 26, 2024, at 3:31 PM, Eugen Block  wrote:

EXTERNAL EMAIL | USE CAUTION

Hi,

this seems a bit unnecessary to rebuild OSDs just to get them managed.
If you apply a spec file that targets your hosts/OSDs, they will
appear as managed. So when you would need to replace a drive, you
could already utilize the orchestrator to remove and zap the drive.
That works just fine.
How to get out of your current situation is not entirely clear to me
yet. I’ll reread your post tomorrow.

Regards,
Eugen

Zitat von Bob Gibson :

Hi,

We recently converted a legacy cluster running Quincy v17.2.7 to
cephadm. The conversion went smoothly and left all osds unmanaged by
the orchestrator as expected. We’re now in the process of converting
the osds to be managed by the orchestrator. We successfully
converted a few of them, but then the orchestrator somehow got
confused. `ceph health detail` reports a “stray daemon” for the osd
we’re trying to convert, and the orchestrator is unable to refresh
its device list so it doesn’t see any available devices.

From the perspective of the osd node, the osd has been wiped and is
ready to be reinstalled. We’ve also rebooted the node for good
measure. `ceph osd tree` shows that the osd has been destroyed, but
the orchestrator won’t reinstall it because it thinks the device is
still active. The orchestrator device information is stale, but
we’re unable to refresh it. The usual recommended workaround of
failing over the mgr hasn’t helped. We’ve also tried `ceph orch
device ls —refresh` to no avail. In fact after running that command
subsequent runs of `ceph orch device ls` produce no output until the
mgr is failed over again.

Is there a way to force the orchestrator to refresh its list of
devices when in this state? If not, can anyone offer any

[ceph-users] Re: WAL on NVMe/SSD not used after OSD/HDD replace

2024-09-27 Thread Igor Fedotov

Hi!

I'm not an expert in the Ceph orchestrator, but it looks to me like the
WAL volume hasn't been properly cleaned up during the osd.1 removal.


Please compare LVM tags for osd.0 and .1:

osd.0:

"devices": [
    "/dev/sdc"
    ],

...

    "lv_tags": 
"...,ceph.osd_fsid=d472bf9f-c17d-4939-baf5-514a07db66bc,ceph.osd_id=0,...",


 "devices": [
    "/dev/sdb"
    ],
    "lv_tags": 
"...ceph.osd_fsid=d472bf9f-c17d-4939-baf5-514a07db66bc,ceph.osd_id=0,...",


osd_fsid (OSD daemon UUID) is the same for both devices, which allows 
ceph-volume to bind these volumes to the relevant OSD.


OSD.1:

  {
    "devices": [
    "/dev/sdd"
    ],
    "lv_tags": 
"...ceph.osd_fsid=d94bda82-e59f-4d3d-81cd-28ea69c5e02f,ceph.osd_id=1,...",


...

    {
    "devices": [
    "/dev/sdb"
    ],
    "lv_tags": 
"...ceph.osd_fsid=7a1d0007-71ff-4011-8a18-e6de1499cbdf,ceph.osd_id=1,...",


The osd_fsid tags are different; the WAL volume's one is apparently the
legacy UUID.

This WAL volume is not bound to the new osd.1 (the lv_tags of the osd.1
main volume confirm that, since there are no WAL-related members there)
and it still keeps the settings of the legacy osd.1.


In other words this is an orphan volume for now, and apparently it could
be safely recreated and assigned back to osd.1 via the ceph-volume lvm
new-wal command. Certainly better to try it in a test env first.
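
For the record, the command would look roughly like this (a sketch: VG/LV
names are placeholders, the osd fsid must match the new osd.1, the OSD has
to be stopped, and in a cephadm setup it needs to run from within
`cephadm shell --name osd.1`):

  # after zapping/recreating the orphan WAL LV:
  ceph-volume lvm new-wal --osd-id 1 \
      --osd-fsid 7a1d0007-71ff-4011-8a18-e6de1499cbdf \
      --target <vg_name>/<new_wal_lv>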


Hope this helps.

Thanks,

Igor

On 9/27/2024 3:48 PM, mailing-lists wrote:

Dear Ceph-users,
I have a problem that I'd like to have your input for.

Preface:
I have got a test-cluster and a productive-cluster. Both are setup the 
same and both are having the same "issue". I am running Ubuntu 22.04 
and deployed ceph 17.2.3 via cephadm. Upgraded to 17.2.7 later on, 
which is the version we are currently running. Since the issue seem to 
be the exact same on the test-cluster, I will post 
test-cluster-outputs here for better readability.


The issue:
I have replaced disks and after the replacement, it does not show that 
it would use the NVMe as WAL device anymore. The LV still exists, but 
the metadata of the osd does not show it, as it would be with any 
other osd/hdd, that hasnt been replaced.


ODS.1 (incorrect, bluefs_dedicated_wal: "0")
```
{
    "id": 1,
    "arch": "x86_64",
    "back_addr": 
"[v2:192.168.6.241:6802/3213655489,v1:192.168.6.241:6803/3213655489]",

    "back_iface": "",
    "bluefs": "1",
    "bluefs_dedicated_db": "0",
    "bluefs_dedicated_wal": "0",
    "bluefs_single_shared_device": "1",
    "bluestore_bdev_access_mode": "blk",
    "bluestore_bdev_block_size": "4096",
    "bluestore_bdev_dev_node": "/dev/dm-3",
    "bluestore_bdev_devices": "sdd",
    "bluestore_bdev_driver": "KernelDevice",
    "bluestore_bdev_optimal_io_size": "0",
    "bluestore_bdev_partition_path": "/dev/dm-3",
    "bluestore_bdev_rotational": "1",
    "bluestore_bdev_size": "17175674880",
    "bluestore_bdev_support_discard": "1",
    "bluestore_bdev_type": "hdd",
    "bluestore_min_alloc_size": "4096",
    "ceph_release": "quincy",
    "ceph_version": "ceph version 17.2.7 
(b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)",

    "ceph_version_short": "17.2.7",
    "ceph_version_when_created": "",
    "container_hostname": "bi-ubu-srv-ceph2-01",
    "container_image": 
"quay.io/ceph/ceph@sha256:28323e41a7d17db238bdcc0a4d7f38d272f75c1a499bc30f59b0b504af132c6b",

    "cpu": "AMD EPYC 75F3 32-Core Processor",
    "created_at": "",
    "default_device_class": "hdd",
    "device_ids": "sdd=QEMU_HARDDISK_drive-scsi3",
    "device_paths": 
"sdd=/dev/disk/by-path/pci-:00:05.0-scsi-0:0:3:0",

    "devices": "sdd",
    "distro": "centos",
    "distro_description": "CentOS Stream 8",
    "distro_version": "8",
    "front_addr": 
"[v2:.241:6800/3213655489,v1:.241:6801/3213655489]",

    "front_iface": "",
    "hb_back_addr": 
"[v2:192.168.6.241:6806/3213655489,v1:192.168.6.241:6807/3213655489]",
    "hb_front_addr": 
"[v2:.241:6804/3213655489,v1:.241:6805/3213655489]",

    "hostname": "bi-ubu-srv-ceph2-01",
    "journal_rotational": "1",
    "kernel_description": "#132-Ubuntu SMP Thu Aug 29 13:45:52 UTC 2024",
    "kernel_version": "5.15.0-122-generic",
    "mem_swap_kb": "4018172",
    "mem_total_kb": "5025288",
    "network_numa_unknown_ifaces": "back_iface,front_iface",
    "objectstore_numa_unknown_devices": "sdd",
    "os": "Linux",
    "osd_data": "/var/lib/ceph/osd/ceph-1",
    "osd_objectstore": "bluestore",
    "osdspec_affinity": "dashboard-admin-1661853488642",
    "rotational": "1"
}
```

ODS.0 (correct, bluefs_dedicated_wal: "1")
```
{
    "id": 0,
    "arch": "x86_64",
    "back_addr": 
"[v2:192.168.6.241:6810/3249286142,v1:192.168.6.241:6811/3249286142]",

    "back_iface": "",
    "bluefs": "1",
    "bluefs_dedicated_db": "0",
    "bluefs_dedicated_wal": "1",
    "bluefs_single_shared_device": "0",
    "bluefs_wal_access_mode": "blk",
    "bluefs_wal_blo

[ceph-users] Re: Mds daemon damaged - assert failed

2024-09-27 Thread Kyriazis, George
I am running 18.2.2, which apparently is the latest version available for
Proxmox at this time (9/2024).

I'd rather not mess around with backporting and testing fixes at this point,
since this is our "production" cluster. If it were not a production one, I
could possibly play around with this, given some free time. :-)

Thank you for looking it up!

George

> On Sep 27, 2024, at 3:44 AM, Konstantin Shalygin  wrote:
> 
> Hi,
> 
> The [2] is the fix for [1] and should be backported? Currently fields are not 
> filled, so no one knows that backports are needed
> 
> 
> k
> 
>> On 27 Sep 2024, at 11:01, Frédéric Nass  
>> wrote:
>> 
>> Hi George,
>> 
>> Looks like you hit this one [1]. Can't find the fix [2] in Reef release 
>> notes [3]. You'll have to cherry pick it and build sources or wait for it to 
>> come to next build.
>> 
>> Regards,
>> Frédéric.
>> 
>> [1] https://tracker.ceph.com/issues/58878
>> [2] https://github.com/ceph/ceph/pull/55265
>> [3] https://docs.ceph.com/en/latest/releases/reef/#v18-2-4-reef
>> 
>> - Le 24 Sep 24, à 0:32, Kyriazis, George george.kyria...@intel.com a 
>> écrit :
>> 
>>> Hello ceph users,
>>> 
>>> I am in the unfortunate situation of having a status of “1 mds daemon 
>>> damaged”.
>>> Looking at the logs, I see that the daemon died with an assert as follows:
>>> 
>>> ./src/osdc/Journaler.cc: 1368: FAILED ceph_assert(trim_to > trimming_pos)
>>> 
>>> ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
>>> const*)+0x12a)
>>> [0x73a83189d7d9]
>>> 2: /usr/lib/ceph/libceph-common.so.2(+0x29d974) [0x73a83189d974]
>>> 3: (Journaler::_trim()+0x671) [0x57235caa70b1]
>>> 4: (Journaler::_finish_write_head(int, Journaler::Header&, 
>>> C_OnFinisher*)+0x171)
>>> [0x57235caaa8f1]
>>> 5: (Context::complete(int)+0x9) [0x57235c716849]
>>> 6: (Finisher::finisher_thread_entry()+0x16d) [0x73a83194659d]
>>> 7: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x73a8310a8134]
>>> 8: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x73a8311287dc]
>>> 
>>>   0> 2024-09-23T14:10:26.490-0500 73a822c006c0 -1 *** Caught signal 
>>> (Aborted) **
>>> in thread 73a822c006c0 thread_name:MR_Finisher
>>> 
>>> ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
>>> 1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x73a83105b050]
>>> 2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x73a8310a9e2c]
>>> 3: gsignal()
>>> 4: abort()
>>> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
>>> const*)+0x185)
>>> [0x73a83189d834]
>>> 6: /usr/lib/ceph/libceph-common.so.2(+0x29d974) [0x73a83189d974]
>>> 7: (Journaler::_trim()+0x671) [0x57235caa70b1]
>>> 8: (Journaler::_finish_write_head(int, Journaler::Header&, 
>>> C_OnFinisher*)+0x171)
>>> [0x57235caaa8f1]
>>> 9: (Context::complete(int)+0x9) [0x57235c716849]
>>> 10: (Finisher::finisher_thread_entry()+0x16d) [0x73a83194659d]
>>> 11: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x73a8310a8134]
>>> 12: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x73a8311287dc]
>>> NOTE: a copy of the executable, or `objdump -rdS ` is needed to
>>> interpret this.
>>> 
>>> 
>>> As listed above, I am running 18.2.2 on a proxmox cluster with a hybrid 
>>> hdd/sdd
>>> setup.  2 cephfs filesystems.  The mds responsible for the hdd filesystem is
>>> the one that died.
>>> 
>>> Output of ceph -s follows:
>>> 
>>> root@vis-mgmt:~/bin# ceph -s
>>> cluster:
>>>  id: ec2c9542-dc1b-4af6-9f21-0adbcabb9452
>>>  health: HEALTH_ERR
>>>  1 filesystem is degraded
>>>  1 filesystem is offline
>>>  1 mds daemon damaged
>>>  5 pgs not scrubbed in time
>>>  1 daemons have recently crashed
>>>  services:
>>>  mon: 5 daemons, quorum 
>>> vis-hsw-01,vis-skx-01,vis-clx-15,vis-clx-04,vis-icx-00
>>>  (age 6m)
>>>  mgr: vis-hsw-02(active, since 13d), standbys: vis-skx-02, vis-hsw-04,
>>>  vis-clx-08, vis-clx-02
>>>  mds: 1/2 daemons up, 5 standby
>>>  osd: 97 osds: 97 up (since 3h), 97 in (since 4d)
>>>  data:
>>>  volumes: 1/2 healthy, 1 recovering; 1 damaged
>>>  pools:   14 pools, 1961 pgs
>>>  objects: 223.70M objects, 304 TiB
>>>  usage:   805 TiB used, 383 TiB / 1.2 PiB avail
>>>  pgs: 1948 active+clean
>>>   9active+clean+scrubbing+deep
>>>   4active+clean+scrubbing
>>>  io:
>>>  client:   86 KiB/s rd, 5.5 MiB/s wr, 64 op/s rd, 26 op/s wr
>>> 
>>> 
>>> 
>>> I tried restarting all the mds deamons but they are all marked as 
>>> “standby”.  I
>>> also tried restarting all the mons and then the mds daemons again, but that
>>> didn’t help.
>>> 
>>> Much help is appreciated!
>>> 
>>> Thank you!
>>> 
>>> George
>>> 
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-

[ceph-users] Re: WAL on NVMe/SSD not used after OSD/HDD replace

2024-09-27 Thread Anthony D'Atri
Not a unique issue, and I suspect that it affects lots of people who don't
know it yet.

It might be that you should remove the old LV first, or specify it with an
explicit create command.
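
Something along these lines, for example (a sketch; zap only the orphaned
WAL LV, not the whole NVMe, and double-check the VG/LV names with
`ceph-volume lvm list` first):

  ceph-volume lvm zap --destroy <vg_name>/<orphan_wal_lv>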

> On Sep 27, 2024, at 8:55 AM, mailing-lists  wrote:
> 
> Dear Ceph-users,
> I have a problem that I'd like to have your input for.
> 
> Preface:
> I have got a test-cluster and a productive-cluster. Both are setup the same 
> and both are having the same "issue". I am running Ubuntu 22.04 and deployed 
> ceph 17.2.3 via cephadm. Upgraded to 17.2.7 later on, which is the version we 
> are currently running. Since the issue seem to be the exact same on the 
> test-cluster, I will post test-cluster-outputs here for better readability.
> 
> The issue:
> I have replaced disks and after the replacement, it does not show that it 
> would use the NVMe as WAL device anymore. The LV still exists, but the 
> metadata of the osd does not show it, as it would be with any other osd/hdd, 
> that hasnt been replaced.
> 
> ODS.1 (incorrect, bluefs_dedicated_wal: "0")
> ```
> {
> "id": 1,
> "arch": "x86_64",
> "back_addr": 
> "[v2:192.168.6.241:6802/3213655489,v1:192.168.6.241:6803/3213655489]",
> "back_iface": "",
> "bluefs": "1",
> "bluefs_dedicated_db": "0",
> "bluefs_dedicated_wal": "0",
> "bluefs_single_shared_device": "1",
> "bluestore_bdev_access_mode": "blk",
> "bluestore_bdev_block_size": "4096",
> "bluestore_bdev_dev_node": "/dev/dm-3",
> "bluestore_bdev_devices": "sdd",
> "bluestore_bdev_driver": "KernelDevice",
> "bluestore_bdev_optimal_io_size": "0",
> "bluestore_bdev_partition_path": "/dev/dm-3",
> "bluestore_bdev_rotational": "1",
> "bluestore_bdev_size": "17175674880",
> "bluestore_bdev_support_discard": "1",
> "bluestore_bdev_type": "hdd",
> "bluestore_min_alloc_size": "4096",
> "ceph_release": "quincy",
> "ceph_version": "ceph version 17.2.7 
> (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)",
> "ceph_version_short": "17.2.7",
> "ceph_version_when_created": "",
> "container_hostname": "bi-ubu-srv-ceph2-01",
> "container_image": 
> "quay.io/ceph/ceph@sha256:28323e41a7d17db238bdcc0a4d7f38d272f75c1a499bc30f59b0b504af132c6b",
> "cpu": "AMD EPYC 75F3 32-Core Processor",
> "created_at": "",
> "default_device_class": "hdd",
> "device_ids": "sdd=QEMU_HARDDISK_drive-scsi3",
> "device_paths": "sdd=/dev/disk/by-path/pci-:00:05.0-scsi-0:0:3:0",
> "devices": "sdd",
> "distro": "centos",
> "distro_description": "CentOS Stream 8",
> "distro_version": "8",
> "front_addr": 
> "[v2:.241:6800/3213655489,v1:.241:6801/3213655489]",
> "front_iface": "",
> "hb_back_addr": 
> "[v2:192.168.6.241:6806/3213655489,v1:192.168.6.241:6807/3213655489]",
> "hb_front_addr": 
> "[v2:.241:6804/3213655489,v1:.241:6805/3213655489]",
> "hostname": "bi-ubu-srv-ceph2-01",
> "journal_rotational": "1",
> "kernel_description": "#132-Ubuntu SMP Thu Aug 29 13:45:52 UTC 2024",
> "kernel_version": "5.15.0-122-generic",
> "mem_swap_kb": "4018172",
> "mem_total_kb": "5025288",
> "network_numa_unknown_ifaces": "back_iface,front_iface",
> "objectstore_numa_unknown_devices": "sdd",
> "os": "Linux",
> "osd_data": "/var/lib/ceph/osd/ceph-1",
> "osd_objectstore": "bluestore",
> "osdspec_affinity": "dashboard-admin-1661853488642",
> "rotational": "1"
> }
> ```
> 
> ODS.0 (correct, bluefs_dedicated_wal: "1")
> ```
> {
> "id": 0,
> "arch": "x86_64",
> "back_addr": 
> "[v2:192.168.6.241:6810/3249286142,v1:192.168.6.241:6811/3249286142]",
> "back_iface": "",
> "bluefs": "1",
> "bluefs_dedicated_db": "0",
> "bluefs_dedicated_wal": "1",
> "bluefs_single_shared_device": "0",
> "bluefs_wal_access_mode": "blk",
> "bluefs_wal_block_size": "4096",
> "bluefs_wal_dev_node": "/dev/dm-0",
> "bluefs_wal_devices": "sdb",
> "bluefs_wal_driver": "KernelDevice",
> "bluefs_wal_optimal_io_size": "0",
> "bluefs_wal_partition_path": "/dev/dm-0",
> "bluefs_wal_rotational": "0",
> "bluefs_wal_size": "4290772992",
> "bluefs_wal_support_discard": "1",
> "bluefs_wal_type": "ssd",
> "bluestore_bdev_access_mode": "blk",
> "bluestore_bdev_block_size": "4096",
> "bluestore_bdev_dev_node": "/dev/dm-2",
> "bluestore_bdev_devices": "sdc",
> "bluestore_bdev_driver": "KernelDevice",
> "bluestore_bdev_optimal_io_size": "0",
> "bluestore_bdev_partition_path": "/dev/dm-2",
> "bluestore_bdev_rotational": "1",
> "bluestore_bdev_size": "17175674880",
> "bluestore_bdev_support_discard": "1",
> "bluestore_bdev_type": "hdd",
> "bluestore_min_alloc_size": "4096",
> "ceph_release": "quincy",
> "ceph_version": "ceph version 17.2.7 
> (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)",
> "ceph_version_short": "17.2.7",
> "ceph_version_when_created": "",
> "con

[ceph-users] WAL on NVMe/SSD not used after OSD/HDD replace

2024-09-27 Thread mailing-lists

Dear Ceph-users,
I have a problem that I'd like to have your input for.

Preface:
I have got a test cluster and a production cluster. Both are set up the
same and both have the same "issue". I am running Ubuntu 22.04 and
deployed ceph 17.2.3 via cephadm, and later upgraded to 17.2.7, which is
the version we are currently running. Since the issue seems to be exactly
the same on the test cluster, I will post the test-cluster outputs here
for better readability.


The issue:
I have replaced disks, and after the replacement the OSD no longer shows
that it uses the NVMe as its WAL device. The LV still exists, but the
metadata of the OSD does not show it, as it does for any other osd/hdd
that hasn't been replaced.


OSD.1 (incorrect, bluefs_dedicated_wal: "0")
```
{
    "id": 1,
    "arch": "x86_64",
    "back_addr": 
"[v2:192.168.6.241:6802/3213655489,v1:192.168.6.241:6803/3213655489]",

    "back_iface": "",
    "bluefs": "1",
    "bluefs_dedicated_db": "0",
    "bluefs_dedicated_wal": "0",
    "bluefs_single_shared_device": "1",
    "bluestore_bdev_access_mode": "blk",
    "bluestore_bdev_block_size": "4096",
    "bluestore_bdev_dev_node": "/dev/dm-3",
    "bluestore_bdev_devices": "sdd",
    "bluestore_bdev_driver": "KernelDevice",
    "bluestore_bdev_optimal_io_size": "0",
    "bluestore_bdev_partition_path": "/dev/dm-3",
    "bluestore_bdev_rotational": "1",
    "bluestore_bdev_size": "17175674880",
    "bluestore_bdev_support_discard": "1",
    "bluestore_bdev_type": "hdd",
    "bluestore_min_alloc_size": "4096",
    "ceph_release": "quincy",
    "ceph_version": "ceph version 17.2.7 
(b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)",

    "ceph_version_short": "17.2.7",
    "ceph_version_when_created": "",
    "container_hostname": "bi-ubu-srv-ceph2-01",
    "container_image": 
"quay.io/ceph/ceph@sha256:28323e41a7d17db238bdcc0a4d7f38d272f75c1a499bc30f59b0b504af132c6b",

    "cpu": "AMD EPYC 75F3 32-Core Processor",
    "created_at": "",
    "default_device_class": "hdd",
    "device_ids": "sdd=QEMU_HARDDISK_drive-scsi3",
    "device_paths": "sdd=/dev/disk/by-path/pci-:00:05.0-scsi-0:0:3:0",
    "devices": "sdd",
    "distro": "centos",
    "distro_description": "CentOS Stream 8",
    "distro_version": "8",
    "front_addr": 
"[v2:.241:6800/3213655489,v1:.241:6801/3213655489]",

    "front_iface": "",
    "hb_back_addr": 
"[v2:192.168.6.241:6806/3213655489,v1:192.168.6.241:6807/3213655489]",
    "hb_front_addr": 
"[v2:.241:6804/3213655489,v1:.241:6805/3213655489]",

    "hostname": "bi-ubu-srv-ceph2-01",
    "journal_rotational": "1",
    "kernel_description": "#132-Ubuntu SMP Thu Aug 29 13:45:52 UTC 2024",
    "kernel_version": "5.15.0-122-generic",
    "mem_swap_kb": "4018172",
    "mem_total_kb": "5025288",
    "network_numa_unknown_ifaces": "back_iface,front_iface",
    "objectstore_numa_unknown_devices": "sdd",
    "os": "Linux",
    "osd_data": "/var/lib/ceph/osd/ceph-1",
    "osd_objectstore": "bluestore",
    "osdspec_affinity": "dashboard-admin-1661853488642",
    "rotational": "1"
}
```

OSD.0 (correct, bluefs_dedicated_wal: "1")
```
{
    "id": 0,
    "arch": "x86_64",
    "back_addr": 
"[v2:192.168.6.241:6810/3249286142,v1:192.168.6.241:6811/3249286142]",

    "back_iface": "",
    "bluefs": "1",
    "bluefs_dedicated_db": "0",
    "bluefs_dedicated_wal": "1",
    "bluefs_single_shared_device": "0",
    "bluefs_wal_access_mode": "blk",
    "bluefs_wal_block_size": "4096",
    "bluefs_wal_dev_node": "/dev/dm-0",
    "bluefs_wal_devices": "sdb",
    "bluefs_wal_driver": "KernelDevice",
    "bluefs_wal_optimal_io_size": "0",
    "bluefs_wal_partition_path": "/dev/dm-0",
    "bluefs_wal_rotational": "0",
    "bluefs_wal_size": "4290772992",
    "bluefs_wal_support_discard": "1",
    "bluefs_wal_type": "ssd",
    "bluestore_bdev_access_mode": "blk",
    "bluestore_bdev_block_size": "4096",
    "bluestore_bdev_dev_node": "/dev/dm-2",
    "bluestore_bdev_devices": "sdc",
    "bluestore_bdev_driver": "KernelDevice",
    "bluestore_bdev_optimal_io_size": "0",
    "bluestore_bdev_partition_path": "/dev/dm-2",
    "bluestore_bdev_rotational": "1",
    "bluestore_bdev_size": "17175674880",
    "bluestore_bdev_support_discard": "1",
    "bluestore_bdev_type": "hdd",
    "bluestore_min_alloc_size": "4096",
    "ceph_release": "quincy",
    "ceph_version": "ceph version 17.2.7 
(b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)",

    "ceph_version_short": "17.2.7",
    "ceph_version_when_created": "",
    "container_hostname": "bi-ubu-srv-ceph2-01",
    "container_image": 
"quay.io/ceph/ceph@sha256:28323e41a7d17db238bdcc0a4d7f38d272f75c1a499bc30f59b0b504af132c6b",

    "cpu": "AMD EPYC 75F3 32-Core Processor",
    "created_at": "",
    "default_device_class": "hdd",
    "device_ids": 
"sdb=QEMU_HARDDISK_drive-scsi1,sdc=QEMU_HARDDISK_drive-scsi2",
    "device_paths": 
"sdb=/dev/disk/by-path/pci-:00:05.0-scsi-0:0:1:0,sdc=/dev/disk/by-path/pci-:00:05.0-scsi-0:0:2:0",


[ceph-users] Re: All monitors fall down simultaneously when I try to map rbd on client

2024-09-27 Thread Alex from North
By increasing the debug level I found out the following, but I have no idea
how to fix this issue.

```
src/osd/OSDMap.cc: 3242: FAILED ceph_assert(pg_upmap_primaries.empty())
```

There is only one topic about it on Google, and it has no answer.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: All monitors fall down simultaneously when I try to map rbd on client

2024-09-27 Thread Alex from North
Yes, this is a bug, indeed.

https://www.spinics.net/lists/ceph-users/msg82468.html

> Remove mappings by:
> $ `ceph osd dump`
> For each pg_upmap_primary entry in the above output:
> $ `ceph osd rm-pg-upmap-primary <pgid>`
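
For many entries, a loop like this should clear them all (a sketch; the
JSON field name is assumed from the assert above, so check the output of
`ceph osd dump --format json` first):

  ceph osd dump --format json | jq -r '.pg_upmap_primaries[].pgid' |
      while read -r pgid; do ceph osd rm-pg-upmap-primary "$pgid"; done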
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: All monitors fall down simultaneously when I try to map rbd on client

2024-09-27 Thread Alex from North
fixed by https://www.spinics.net/lists/ceph-users/msg82468.html

CLOSED.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph orchestrator not refreshing device list

2024-09-27 Thread Bob Gibson
Here are the contents from the same directory on our osd node:

ceph-osd31.prod.os:/var/lib/ceph/9b3b3539-59a9-4338-8bab-3badfab6e855# ls -l
total 412
-rw-r--r--  1 root root 366903 Sep 14 14:53 
cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b
drwx--  3  167  167   4096 Sep 14 15:01 crash
drwxr-xr-x 12 root root   4096 Sep 15 12:06 custom_config_files
drw-rw  2 root root   4096 Sep 23 17:00 home
drwx--  2  167  167   4096 Sep 26 12:47 osd.84
drwx--  2  167  167   4096 Sep 26 12:47 osd.85
drwx--  2  167  167   4096 Sep 26 12:47 osd.86
drwx--  2  167  167   4096 Sep 26 12:47 osd.87
drwx--  2  167  167   4096 Sep 26 12:47 osd.89
drwx--  2  167  167   4096 Sep 26 12:47 osd.90
drwx--  2  167  167   4096 Sep 26 12:47 osd.91
drwx--  2  167  167   4096 Sep 26 12:47 osd.92
drwx--  2  167  167   4096 Sep 26 12:47 osd.93
drwx--  6 root root   4096 Sep 23 15:59 removed

In our case the osd.88 directory is under the subdirectory named “removed”, the
same as the other osds which have been converted.

ceph-osd31.prod.os:/var/lib/ceph/9b3b3539-59a9-4338-8bab-3badfab6e855# ls -l 
removed/osd.88_2024-09-23T19\:59\:42.162302Z/
total 64
lrwxrwxrwx 1 167 167   93 Sep 15 12:10 block -> 
/dev/ceph-2a13ec6a-a5f0-4773-8254-c38b915c824a/osd-block-7f8f9778-5ae2-47c1-bd03-a92a3a7a1db1
-rw--- 1 167 167   37 Sep 15 12:10 ceph_fsid
-rw--- 1 167 167  259 Sep 14 15:14 config
-rw--- 1 167 167   37 Sep 15 12:10 fsid
-rw--- 1 167 167   56 Sep 15 12:10 keyring
-rw--- 1 167 1676 Sep 15 12:10 ready
-rw--- 1 167 1673 Sep 14 11:11 require_osd_release
-rw--- 1 167 167   10 Sep 15 12:10 type
-rw--- 1 167 167   38 Sep 14 15:14 unit.configured
-rw--- 1 167 167   48 Sep 14 15:14 unit.created
-rw--- 1 167 167   26 Sep 14 15:06 unit.image
-rw--- 1 167 167   76 Sep 14 15:06 unit.meta
-rw--- 1 167 167 1527 Sep 14 15:06 unit.poststop
-rw--- 1 167 167 2586 Sep 14 15:06 unit.run
-rw--- 1 167 167  334 Sep 14 15:06 unit.stop
-rw--- 1 167 1673 Sep 15 12:10 whoami

On Sep 27, 2024, at 9:30 AM, Eugen Block  wrote:

EXTERNAL EMAIL | USE CAUTION

Oh interesting, I just got into the same situation (I believe) on a
test cluster:

host1:~ # ceph orch ps | grep unknown
osd.1  host6
stopped  72s ago  36m-4096M
  
osd.13 host6
error72s ago  36m-4096M
  

I still had the remainders on the filesystem:

host6:~ # ll /var/lib/ceph/543967bc-e586-32b8-bd2c-2d8b8b168f02/osd.1
insgesamt 68
lrwxrwxrwx 1 ceph ceph  111 27. Sep 14:43 block ->
/dev/mapper/ceph--0e90997f--456e--4a9b--a8f9--a6f1038c1216-osd--block--81e7f32a--a728--4848--b14d--0b86bb7e1c69
lrwxrwxrwx 1 ceph ceph  108 27. Sep 14:43 block.db ->
/dev/mapper/ceph--9ea6e95f--ad43--4e40--8920--2e772b2efa2f-osd--db--f9c57ec1--77c8--4d9a--85df--1dc053a24000

I just removed those two directories to clear the warning, now my
orchestrator can deploy OSDs again on that node.

Hope that helps!

Zitat von Eugen Block :

Right, if you need encryption, a rebuild is required. Your procedure
has already worked 4 times, so I'd say nothing seems wrong with that
per se.
Regarding the stuck device list, do you see the mgr logging anything
suspicious? Especially when you say that it only returns output
after a failover. Those two osd specs are not conflicting since the
first is "unmanaged" after adoption.
Is there something in 'ceph orch osd rm status'? Can you run
'cephadm ceph-volume inventory' locally on that node? Do you see any
hints in the node's syslog? Maybe try a reboot or something?


Zitat von Bob Gibson :

Thanks for your reply Eugen. I’m fairly new to cephadm so I wasn’t
aware that we could manage the drives without rebuilding them.
However, we thought we’d take advantage of this opportunity to also
encrypt the drives, and that does require a rebuild.

I have a theory on why the orchestrator is confused. I want to
create an osd service for each osd node so I can manage drives on a
per-node basis.

I started by creating a spec for the first node:

service_type: osd
service_id: ceph-osd31
placement:
hosts:
- ceph-osd31
spec:
data_devices:
  rotational: 0
  size: '3TB:'
encrypted: true
filter_logic: AND
objectstore: bluestore

But I also see a default spec, “osd”, which has placement set to
“unmanaged”.

`ceph orch ls osd —export` shows the following:

service_type: osd
service_name: osd
unmanaged: true
spec:
filter_logic: AND
objectstore: bluestore
---
service_type: osd
service_id: ceph-osd31
service_name: osd.ceph-osd31
placement:
hosts:
- ceph-osd31
spec:
data_devices:
  rotational: 0
  size: '3TB:'
encrypted: true
filter_logic: AND
objectstore: bluestore

`ceph orch ls osd` shows that I was able to convert 4 drives using my spec:

NAMEPORTS  RUNNING  REFRESHED  AGE  PLACEMENT
osd 95  10m ago-
osd.ceph-osd31   4  10m ago

[ceph-users] Re: v19.2.0 Squid released

2024-09-27 Thread Adam King
WARNING: if you're using cephadm and nfs, please don't upgrade to this
release for the time being. There are compatibility issues between cephadm's
deployment of the NFS daemon and ganesha v6, which made its way into the
release container.

On Thu, Sep 26, 2024 at 6:20 PM Laura Flores  wrote:

> We're very happy to announce the first stable release of the Squid series.
>
> We express our gratitude to all members of the Ceph community who
> contributed by proposing pull requests, testing this release, providing
> feedback, and offering valuable suggestions.
>
> Highlights:
>
> RADOS
> * BlueStore has been optimized for better performance in snapshot-intensive
> workloads.
> * BlueStore RocksDB LZ4 compression is now enabled by default to improve
> average performance and "fast device" space usage.
> * Other improvements include more flexible EC configurations, an OpTracker
> to help debug mgr module issues, and better scrub scheduling.
>
> Dashboard
> * Improved navigation layout
>
> CephFS
> * Support for managing CephFS snapshots and clones, as well as snapshot
> schedule management
> * Manage authorization capabilities for CephFS resources
> * Helpers on mounting a CephFS volume
>
> RBD
> * diff-iterate can now execute locally, bringing a dramatic performance
> improvement for QEMU live disk synchronization and backup use cases.
> * Support for cloning from non-user type snapshots is added.
> * rbd-wnbd driver has gained the ability to multiplex image mappings.
>
> RGW
> * The User Accounts feature unlocks several new AWS-compatible IAM APIs for
> the self-service management of users, keys, groups, roles, policy and more.
>
> Crimson/Seastore
> * Crimson's first tech preview release! Supporting RBD workloads on
> Replicated pools. For more information please visit:
> https://ceph.io/en/news/crimson
>
> We encourage you to read the full release notes at
> https://ceph.io/en/news/blog/2024/v19-2-0-squid-released/
>
> * Git at git://github.com/ceph/ceph.git
> * Tarball at https://download.ceph.com/tarballs/ceph-19.2.0.tar.gz
> * Containers at https://quay.io/repository/ceph/ceph
> * For packages, see https://docs.ceph.com/en/latest/install/get-packages/
> * Release git sha1: 16063ff2022298c9300e49a547a16ffda59baf13
>
> --
>
> Laura Flores
>
> She/Her/Hers
>
> Software Engineer, Ceph Storage 
>
> Chicago, IL
>
> lflo...@ibm.com | lflo...@redhat.com 
> M: +17087388804
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] All monitors fall down simultaneously when I try to map rbd on client

2024-09-27 Thread Alex from North
Hello everybody,
I found an interesting thing: for some reason ALL the monitors crash when I
try to rbd map on a client host.

here is my pool:

root@ceph1:~# ceph osd pool ls
iotest

Here is my rbd in this pool:

root@ceph1:~# rbd ls -p iotest
test1


These are the client creds to connect to this pool:

[client.iotest]
key = AQASVfZm5bPGLBAAXyPWqJvNMBsXsJQcFrSAhg==
caps mgr = "profile rbd pool=iotest"
caps mon = "profile rbd"
caps osd = "profile rbd pool=iotest"

This is the rbdmap file on the client host:

root@node-stat:/etc/ceph# cat rbdmap 
# RbdDevice Parameters
#poolname/imagename id=client,keyring=/etc/ceph/ceph.client.keyring
iotest/test1 id=iotest,keyring=/etc/ceph/ceph.client.iotest.keyring

So, the moment I press Enter on the command `rbd map iotest/test1 --id iotest`,
ALL the mons go down.
I put the log on pastebin as it is quite long:
https://pastebin.com/iCr8pY1r

All the hints are appreciated. Thanks in advance.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: All monitors fall down simultaneously when I try to map rbd on client

2024-09-27 Thread Konstantin Shalygin
Hi,

> On 27 Sep 2024, at 14:59, Alex from North  wrote:
> 
> By increasing debulg level I found out the following but have no idea how to 
> fix this issue.
> 
> ```
> src/osd/OSDMap.cc: 3242: FAILED ceph_assert(pg_upmap_primaries.empty())
> ```
> 
> There is only one topic in google and with no answer

Maybe [1]?


k
[1] https://tracker.ceph.com/issues/66867

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: All monitors fall down simultaneously when I try to map rbd on client

2024-09-27 Thread Frédéric Nass
Hi Alex,

Maybe this one [1], which leads to osd/mon asserts. Have a look at Laura's post
here [2] for more information.

Updating clients to Reef+ (not sure which kernel added the upmap read feature) 
or removing any pg_upmap_primaries entries may help in your situation.

Regards,
Frédéric.

[1] https://tracker.ceph.com/issues/61948
[2] 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/GUQCIRZRMGQ3JOXS2PYZL7EPO3ZMYV6R/


- Le 27 Sep 24, à 10:30, Alex from North service.pl...@ya.ru a écrit :

> Hello everybody,
> found intresting thing: for some reason ALL the monitors crash when I try to 
> rbd
> map on client host.
> 
> here is my pool:
> 
> root@ceph1:~# ceph osd pool ls
> iotest
> 
> Here is my rbd in this pool:
> 
> root@ceph1:~# rbd ls -p iotest
> test1
> 
> 
> These are the client credentials used to connect to this pool:
> 
> [client.iotest]
>key = AQASVfZm5bPGLBAAXyPWqJvNMBsXsJQcFrSAhg==
>caps mgr = "profile rbd pool=iotest"
>caps mon = "profile rbd"
>caps osd = "profile rbd pool=iotest"
> 
> This is the rbdmap file on the client host:
> 
> root@node-stat:/etc/ceph# cat rbdmap
> # RbdDevice Parameters
> #poolname/imagename id=client,keyring=/etc/ceph/ceph.client.keyring
> iotest/test1 id=iotest,keyring=/etc/ceph/ceph.client.iotest.keyring
> 
> So, the moment I press Enter on the command rbd map iotest/test1 --id iotest, ALL
> the mons go down.
> I put the log on pastebin as it is quite long:
> https://pastebin.com/iCr8pY1r
> 
> All the hints are appreciated. Thanks in advance.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v19.2.0 Squid released

2024-09-27 Thread Adam King
We have pushed a new 19.2.0 container image that uses ganesha v5.5 rather
than v6. For those who hit this issue, rerunning the `ceph orch upgrade`
command to move to the republished 19.2.0 image (ceph orch upgrade start
quay.io/ceph/ceph:v19.2.0) was tested and confirmed to get the NFS daemon
running again. The one caveat is that the `mgr/cephadm/use_repo_digest`
config option must be set to true so that cephadm can handle upgrading to
a floating-tag image that has been modified since the previous upgrade.
For those who haven't upgraded yet but were using both cephadm and NFS,
it should now be safe to perform this upgrade.
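
For reference, the sequence sketched from the steps above (not an official
procedure; adjust to your environment):

ceph config set mgr mgr/cephadm/use_repo_digest true
ceph orch upgrade start quay.io/ceph/ceph:v19.2.0
# watch progress until the upgrade completes
ceph orch upgrade status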

On Fri, Sep 27, 2024 at 11:40 AM Adam King  wrote:

> WARNING: if you're using cephadm and NFS, please don't upgrade to this
> release for the time being. There are compatibility issues with cephadm's
> deployment of the NFS daemon and ganesha v6 which made its way into the
> release container.
>
> On Thu, Sep 26, 2024 at 6:20 PM Laura Flores  wrote:
>
>> We're very happy to announce the first stable release of the Squid series.
>>
>> We express our gratitude to all members of the Ceph community who
>> contributed by proposing pull requests, testing this release, providing
>> feedback, and offering valuable suggestions.
>>
>> Highlights:
>>
>> RADOS
>> * BlueStore has been optimized for better performance in
>> snapshot-intensive
>> workloads.
>> * BlueStore RocksDB LZ4 compression is now enabled by default to improve
>> average performance and "fast device" space usage.
>> * Other improvements include more flexible EC configurations, an OpTracker
>> to help debug mgr module issues, and better scrub scheduling.
>>
>> Dashboard
>> * Improved navigation layout
>>
>> CephFS
>> * Support for managing CephFS snapshots and clones, as well as snapshot
>> schedule management
>> * Manage authorization capabilities for CephFS resources
>> * Helpers on mounting a CephFS volume
>>
>> RBD
>> * diff-iterate can now execute locally, bringing a dramatic performance
>> improvement for QEMU live disk synchronization and backup use cases.
>> * Support for cloning from non-user type snapshots is added.
>> * rbd-wnbd driver has gained the ability to multiplex image mappings.
>>
>> RGW
>> * The User Accounts feature unlocks several new AWS-compatible IAM APIs
>> for
>> the self-service management of users, keys, groups, roles, policy and
>> more.
>>
>> Crimson/Seastore
>> * Crimson's first tech preview release! Supporting RBD workloads on
>> Replicated pools. For more information please visit:
>> https://ceph.io/en/news/crimson
>>
>> We encourage you to read the full release notes at
>> https://ceph.io/en/news/blog/2024/v19-2-0-squid-released/
>>
>> * Git at git://github.com/ceph/ceph.git
>> * Tarball at https://download.ceph.com/tarballs/ceph-19.2.0.tar.gz
>> * Containers at https://quay.io/repository/ceph/ceph
>> * For packages, see https://docs.ceph.com/en/latest/install/get-packages/
>> * Release git sha1: 16063ff2022298c9300e49a547a16ffda59baf13
>>
>> --
>>
>> Laura Flores
>>
>> She/Her/Hers
>>
>> Software Engineer, Ceph Storage 
>>
>> Chicago, IL
>>
>> lflo...@ibm.com | lflo...@redhat.com 
>> M: +17087388804
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 17.2.8 release date?

2024-09-27 Thread Szabo, Istvan (Agoda)
Hi,

Do we know roughly when Quincy 17.2.8 is going to be released?

Thank you


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Old MDS container version when: Ceph orch apply mds

2024-09-27 Thread bkennedy
Thanks for noting this. I just imported our last cluster and couldn't get
ceph-exporter to start. I noticed that the images it was using for
node-exporter and ceph-exporter were not the same as on the other clusters!
I wish this were in the adoption documentation. I have a running list of all
the things I must add/do when adopting a cluster... just another one for the
list!

Thanks again!

-Brent

-Original Message-
From: Eugen Block  
Sent: Friday, August 2, 2024 3:02 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Old MDS container version when: Ceph orch apply
mds

Hi,

it sounds like the mds container_image is not configured properly; you can
set it via:

ceph config set mds container_image quay.io/ceph/ceph:v18.2.2

or just set it globally for all ceph daemons:

ceph config set global container_image quay.io/ceph/ceph:v18.2.2

If you bootstrap a fresh cluster, the image is set globally for you, but that
doesn't happen during an upgrade from a non-cephadm cluster, which requires
redeploying the MDS daemons.
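
As an illustration only, a sketch of that redeploy step, assuming your cephadm
release provides `ceph orch daemon redeploy` (the daemon names below are taken
from the `ceph orch ps` output quoted further down):

# after setting the mds container_image as above, redeploy each MDS daemon
ceph orch daemon redeploy mds.datafs.master1.gcpovr quay.io/ceph/ceph:v18.2.2
ceph orch daemon redeploy mds.datafs.master2.oqaxuy quay.io/ceph/ceph:v18.2.2
# verify the running versions afterwards
ceph versions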

Regards,
Eugen


Zitat von opositor...@mail.com:

> Hi All,
> I migrated my Ceph 18.2.2 cluster from a non-cephadm configuration.
> Everything went fine except that the MDS service was deployed with an old
> version: 17.0.0. I'm trying to deploy the MDS daemons using ceph orch, but
> Ceph always downloads an old MDS image from Docker.
> 
> How can I deploy the MDS service with the same 18.2.2 version as the
> rest of the services?
>
> [root@master1 ~]# ceph orch apply mds datafs --placement="2 master1 master2"
>
> [root@master1 ~]# ceph orch ps
> NAME                       HOST     PORTS  STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION                IMAGE ID      CONTAINER ID
> mds.datafs.master1.gcpovr  master1         running (36m)  6m ago     36m  37.2M    -        17.0.0-7183-g54142666  75e3d7089cea  96682779c7ad
> mds.datafs.master2.oqaxuy  master2         running (36m)  6m ago     36m  33.1M    -        17.0.0-7183-g54142666  75e3d7089cea  a9a647f87c83
> mgr.master                 master1         running (16h)  6m ago     17h  448M     -        18.2.2                 3c937764e6f5  70f06fa05b70
> mgr.master2                master2         running (16h)  6m ago     17h  524M     -        18.2.2                 3c937764e6f5  2d0d5376d8b3
> mon.master                 master1         running (16h)  6m ago     17h  384M     2048M    18.2.2                 3c937764e6f5  66a65017ce29
> mon.master2                master2         running (16h)  6m ago     17h  380M     2048M    18.2.2                 3c937764e6f5  51d783a9e36c
> osd.0                      osd00           running (16h)  3m ago     17h  432M     4096M    18.2.2                 3c937764e6f5  fedff66f5ed2
> osd.1                      osd00           running (16h)  3m ago     17h  475M     4096M    18.2.2                 3c937764e6f5  24e24a1a22e6
> osd.2                      osd00           running (16h)  3m ago     17h  516M     4096M    18.2.2                 3c937764e6f5  ccd05451b739
> osd.3                      osd00           running (16h)  3m ago     17h  454M     4096M    18.2.2                 3c937764e6f5  f6d8f13c8aaf
> osd.4                      master1         running (16h)  6m ago     17h  525M     4096M    18.2.2                 3c937764e6f5  a2dcf9f1a9b7
> osd.5                      master2         running (16h)  6m ago     17h  331M     4096M    18.2.2                 3c937764e6f5  b0011e8561a4
>
> [root@master1 ~]# ceph orch ls
> NAME        PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
> mds.datafs         2/2      6m ago     46s  master1;master2;count:2
> mgr                2/0      6m ago     -
> mon                2/0      6m ago     -
> osd                6        6m ago     -
>
> [root@master1 ~]# ceph versions
> {
>     "mon": {
>         "ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)": 2
>     },
>     "mgr": {
>         "ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)": 2
>     },
>     "osd": {
>         "ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)": 6
>     },
>     "mds": {
>         "ceph version 17.0.0-7183-g54142666 (54142666e5705ced88e3e2d91ddc0ff29867a362) quincy (dev)": 2
>     },
>     "overall": {
>         "ceph version 17.0.0-7183-g54142666 (54142666e5705ced88e3e2d91ddc0ff29867a362) quincy (dev)": 2,
>         "ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)": 10
>     }
> }
>
> [root@master1 ~]# podman images
> REPOSITORY                        TAG       IMAGE ID       CREATED       SIZE
> quay.io/ceph/ceph                 v18.2.2   3c937764e6f5   7 weeks ago   1.28 GB
> quay.io/ceph/ceph                 v18       3c937764e6f5   7 weeks ago   1.28 GB
> registry.access.redhat.com/ubi8   latest    c70d72aaebb4
>