[ceph-users] Re: crashing OSDs with FAILED ceph_assert

2022-03-12 Thread Denis Polom
Hi Igor, I also found that this happens only on my Ceph clusters with RBD. Clusters with CephFS and RGW are fine. Thank you On 3/12/22 17:06, Igor Fedotov wrote: Denis, maybe there is something interesting in dmesg or smartctl output? Are all OSDs/nodes in the cluster affected? When doe

[ceph-users] Dockerized ceph hangs on cryptsetup during osd_ceph_volume_activate

2022-03-12 Thread Zachary Winnerman
Hi, I decided to redo some experimentation regarding the previous thread I opened, and I think I got it, but I'm having an issue where it looks like ceph-volume lvm activate isn't calling cryptsetup properly, causing it to hang. Here is the command I used to run the docker container: do
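When `ceph-volume lvm activate` appears to hang on cryptsetup inside a container, one way to narrow it down is to check whether the dm-crypt mapping was ever created. A minimal sketch, assuming a containerized OSD; the container name is hypothetical:

```shell
# Look for a stuck cryptsetup or ceph-volume process inside the container
# (container name "ceph-osd-0" is a placeholder).
docker exec ceph-osd-0 ps -ef | grep -E 'cryptsetup|ceph-volume'

# On the host: list active dm-crypt mappings. If the OSD's mapping is
# missing, cryptsetup never completed luksOpen (often a /dev or
# /run/cryptsetup mount missing inside the container).
dmsetup ls --target crypt
```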

[ceph-users] Re: Migrating OSDs to dockerized ceph

2022-03-12 Thread Zachary Winnerman
Thanks for the reply! Correct me if I'm wrong, but this seems to only detect whether dmcrypt is enabled. It doesn't seem to support other features I need, like LVM. Thanks again, Zach On 3/12/22 18:42, York Huang wrote: May ceph-ansible help. https://github.com/ceph/ceph-ansible/blob

[ceph-users] Re: Migrating OSDs to dockerized ceph

2022-03-12 Thread York Huang
May ceph-ansible help. https://github.com/ceph/ceph-ansible/blob/stable-6.0/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml -- Original -- From: "Zachary Winnerman" https://docs.ceph.com/en/pacific/cephadm/adoption.html
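The linked cephadm adoption docs describe converting legacy daemons in place rather than via ceph-ansible. A rough sketch of that flow, assuming cephadm is installed on each host; the daemon names below are hypothetical:

```shell
# Inventory the legacy (non-containerized) daemons cephadm can see
cephadm ls

# Adopt daemons one at a time into containers
# (daemon names "mon.host1" and "osd.12" are placeholders).
cephadm adopt --style legacy --name mon.host1
cephadm adopt --style legacy --name osd.12
```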

[ceph-users] Re: How often should I scrub the filesystem ?

2022-03-12 Thread Chris Palmer
Ok, restarting mds.0 cleared it. I then restarted the others until this one was again active, and repeated the scrub ~mdsdir which was then clean. I don't know what caused it, or why restarting the MDS was necessary, but it has done the trick. On 12/03/2022 19:14, Chris Palmer wrote: Hi Milan

[ceph-users] Re: How often should I scrub the filesystem ?

2022-03-12 Thread Chris Palmer
Hi Miland (or anyone else who can help...) Reading this thread made me realise I had overlooked cephfs scrubbing, so I tried it on a small 16.2.7 cluster. The normal forward scrub showed nothing. However "ceph tell mds.0 scrub start ~mdsdir recursive" did find one backtrace error (putting the
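The two scrub variants mentioned above differ in scope: a forward scrub of the filesystem root versus a scrub of the `~mdsdir` metadata directory. A minimal sketch, assuming rank 0 is the active MDS:

```shell
# Forward scrub of the whole filesystem tree
ceph tell mds.0 scrub start / recursive

# Scrub the MDS's internal ~mdsdir (stray/metadata entries) --
# this is the pass that surfaced the backtrace error in the thread
ceph tell mds.0 scrub start ~mdsdir recursive

# Check progress and results
ceph tell mds.0 scrub status
```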

[ceph-users] Re: crashing OSDs with FAILED ceph_assert

2022-03-12 Thread Denis Polom
Hi Igor, I would say randomly almost all OSDs. There isn't any error message in the kernel log, and smartctl shows the disks as healthy. # smartctl -l error /dev/sdh smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.4.0-89-generic] (local build) Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.
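The checks described above (kernel log plus SMART) can be run per OSD data disk. A short sketch; `/dev/sdh` is the device from the thread, substitute each OSD's disk:

```shell
# Kernel-side I/O errors for the device, with human-readable timestamps
dmesg -T | grep -iE 'sdh|I/O error|blk_update_request'

# SMART error log and overall health self-assessment
smartctl -l error /dev/sdh
smartctl -H /dev/sdh
```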

[ceph-users] Re: crashing OSDs with FAILED ceph_assert

2022-03-12 Thread Igor Fedotov
Denis, maybe there is something interesting in dmesg or smartctl output? Are all OSDs/nodes in the cluster affected? When does that start to happen? How often? Thanks, Igor On 3/12/2022 6:14 PM, Denis Polom wrote: Hi Igor, before the assertion there is 2022-03-12T10:15:35.879+0100 7f

[ceph-users] Re: crashing OSDs with FAILED ceph_assert

2022-03-12 Thread Denis Polom
Hi Igor, before the assertion there is 2022-03-12T10:15:35.879+0100 7f0e61055700 -1 bdev(0x55a61c6a6000 /var/lib/ceph/osd/ceph-48/block) aio_submit retries 5 2022-03-12T10:15:35.883+0100 7f0e6d06d700 -1 bdev(0x55a61c6a6000 /var/lib/ceph/osd/ceph-48/block) aio_submit retries 2 2022-03-12T10:15:
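The "aio_submit retries N" lines above indicate that BlueStore's `io_submit()` call had to be retried (typically EAGAIN when the AIO queue is full) before the assert fired. A hedged sketch for gauging how often this happens and inspecting the relevant queue-depth option; OSD id 48 and the default log path are taken from the thread:

```shell
# How deep the AIO submission queue is configured for this OSD
ceph config get osd.48 bdev_aio_max_queue_depth

# Count retry events in the OSD log (default log location assumed)
grep -c 'aio_submit retries' /var/log/ceph/ceph-osd.48.log
```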

[ceph-users] Re: crashing OSDs with FAILED ceph_assert

2022-03-12 Thread Igor Fedotov
Hi Denis, please share OSD log output preceding the assertion. It usually has some helpful information, e.g. error code, about the root cause. Thanks, Igor On 3/12/2022 5:01 PM, Denis Polom wrote: Hi, I have Ceph cluster version Pacific 16.2.7 with RBD pool and OSDs made on SSDs with DB

[ceph-users] crashing OSDs with FAILED ceph_assert

2022-03-12 Thread Denis Polom
Hi, I have a Ceph cluster version Pacific 16.2.7 with an RBD pool and OSDs made on SSDs with DB on a separate NVMe. What I observe is that OSDs are crashing randomly. Output of crash info is: {     "archived": "2022-03-12 11:44:37.251897",     "assert_condition": "r == 0",     "assert_file": "/build/ceph-1
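The crash-info JSON shown above comes from Ceph's crash module, which is also the quickest way to pull full reports for crashes like this. A minimal sketch; `<crash-id>` is a placeholder for an id from the listing:

```shell
ceph crash ls-new              # crashes not yet archived
ceph crash info <crash-id>     # full report, including the failed assert
ceph crash archive <crash-id>  # acknowledge once investigated
```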