[ceph-users] Re: MDS_DAMAGE dir_frag

2022-12-12 Thread William Edwards
> On 12 Dec 2022 at 22:47, Sascha Lucas wrote the following: > > Hi Greg, >> On Mon, 12 Dec 2022, Gregory Farnum wrote: >> On Mon, Dec 12, 2022 at 12:10 PM Sascha Lucas wrote: >>> A follow-up of [2] also mentioned having random meta-data corruption: "We have 4 clusters

[ceph-users] Demystify EC CLAY and LRC helper chunks?

2022-12-12 Thread Sean Matheny
Hi there, We've done some pretty extensive testing on our new cluster of 11 x (24x18TB HDD, 2x2.9TB NVMe) nodes for erasure code. We were particularly interested in some of the alternate plugins like CLAY and LRC; however, I came across some really curious behaviour that I couldn't find an
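
(For readers who want to reproduce this kind of comparison, a minimal sketch of creating CLAY and LRC profiles; the k/m/d/l values below are illustrative, not the poster's actual settings.)

  # CLAY: d is the number of helper chunks read during repair (k+1 <= d <= k+m-1)
  ceph osd erasure-code-profile set clay_test plugin=clay k=8 m=3 d=9 crush-failure-domain=host
  # LRC: l adds a local parity chunk for every l chunks ((k+m) must be divisible by l)
  ceph osd erasure-code-profile set lrc_test plugin=lrc k=8 m=4 l=3 crush-failure-domain=host
  ceph osd pool create ec_clay_test 64 64 erasure clay_test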

[ceph-users] Re: MDS_DAMAGE dir_frag

2022-12-12 Thread Sascha Lucas
Hi Greg, On Mon, 12 Dec 2022, Gregory Farnum wrote: On Mon, Dec 12, 2022 at 12:10 PM Sascha Lucas wrote: A follow-up of [2] also mentioned having random meta-data corruption: "We have 4 clusters (all running same version) and have experienced meta-data corruption on the majority of them at

[ceph-users] Re: MDS_DAMAGE dir_frag

2022-12-12 Thread Gregory Farnum
On Mon, Dec 12, 2022 at 12:10 PM Sascha Lucas wrote: > Hi Dhairya, > > On Mon, 12 Dec 2022, Dhairya Parmar wrote: > > > You might want to look at [1] for this, also I found a relevant thread > [2] > > that could be helpful. > > > > Thanks a lot. I already found [1,2], too. But I did not considere

[ceph-users] Migrate Individual Buckets

2022-12-12 Thread Benjamin . Zieglmeier
Hello. We are in the process of building new stage (non-production) Ceph RGW clusters hosting S3 buckets. We are looking to have our customers migrate their non-production buckets to these new clusters. We want to help ease the migration, to hopefully improve adoption, and I wanted to ask the g
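
(One client-side approach, purely an assumption on my part and not necessarily what the poster ends up doing, is to copy bucket contents over S3 with rclone and have customers switch endpoints afterwards; remote and bucket names below are placeholders.)

  # rclone.conf would define two S3 remotes, e.g. "oldrgw" and "newrgw", pointing at the old and new RGW endpoints
  rclone sync oldrgw:example-bucket newrgw:example-bucket --checksum --progress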

[ceph-users] Re: MDS_DAMAGE dir_frag

2022-12-12 Thread Sascha Lucas
Hi Dhairya, On Mon, 12 Dec 2022, Dhairya Parmar wrote: You might want to look at [1] for this, also I found a relevant thread [2] that could be helpful. Thanks a lot. I already found [1,2], too. But I did not consider it, because I felt we were not having a "disaster"? Nothing seems broken nor cr
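
(For the archive: in the non-disaster case, the usual first step is an online scrub with repair on the affected rank; <fs_name> below is a placeholder.)

  ceph tell mds.<fs_name>:0 scrub start / recursive,repair
  ceph tell mds.<fs_name>:0 scrub status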

[ceph-users] Re: Incomplete PGs

2022-12-12 Thread Eugen Block
Please provide more details, for example the ceph osd tree and the crush rule for pool 6. Also a ceph status would help. Quoting "Hayashida, Mami": Our small Ceph (Nautilus) cluster experienced a series of failures over a period of time, including losing one OSD node completely. I have b
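
(For anyone following along, the information being asked for comes from roughly these commands; pool and rule names are placeholders.)

  ceph status
  ceph osd tree
  ceph osd pool get <pool_name> crush_rule
  ceph osd crush rule dump <rule_name>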

[ceph-users] Incomplete PGs

2022-12-12 Thread Hayashida, Mami
Our small Ceph (Nautilus) cluster experienced a series of failures over a period of time, including losing one OSD node completely. I have been trying to restore the cluster since then, but running into one problem after another. Currently I have 19 PGs that are marked inactive + incomplete. As
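
(For context, incomplete PGs are usually investigated per PG; a typical sketch, with a placeholder PG id, looks like this.)

  ceph pg dump_stuck inactive
  # the recovery_state section shows why the PG is incomplete, e.g. which OSDs it is probing or missing
  ceph pg <pgid> query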

[ceph-users] Re: Reduce recovery bandwidth

2022-12-12 Thread Murilo Morais
Martin, thank you so much for posting this discussion, it helped a lot! On Sat, 10 Dec 2022 at 07:46, Konold, Martin < martin.kon...@konsec.com> wrote: > Hi Murilo, > > I recommend upgrading to 17.2.5 and then following the instructions as > documented in > > https://www.spinics.net/lists/c

[ceph-users] Re: MDS_DAMAGE dir_frag

2022-12-12 Thread Dhairya Parmar
Hi there, You might want to look at [1] for this; I also found a relevant thread [2] that could be helpful. [1] https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#disaster-recovery-experts [2] https://www.spinics.net/lists/ceph-users/msg53202.html - Dhairya On Mon, Dec 12, 2022
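
(The page in [1] walks through the journal tools; its read-only first steps look roughly like the following, with the filesystem name as a placeholder.)

  # take a backup before attempting any repair
  cephfs-journal-tool --rank=<fs_name>:0 journal export backup.bin
  cephfs-journal-tool --rank=<fs_name>:0 journal inspect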

[ceph-users] MDS_DAMAGE dir_frag

2022-12-12 Thread Sascha Lucas
Hi, without any outage/disaster, CephFS (17.2.5/cephadm) reports damaged metadata: [root@ceph106 ~]# zcat /var/log/ceph/3cacfa58-55cf-11ed-abaf-5cba2c03dec0/ceph-mds.disklib.ceph106.kbzjbg.log-20221211.gz 2022-12-10T10:12:35.161+ 7fa46779d700 1 mds.disklib.ceph106.kbzjbg Updating MDS map
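
(For readers hitting the same MDS_DAMAGE warning: the damage table itself can be listed from the MDS; <fs_name> is a placeholder.)

  ceph health detail
  ceph tell mds.<fs_name>:0 damage ls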

[ceph-users] Re: ceph-iscsi lock ping pong

2022-12-12 Thread Xiubo Li
Hi Stolte, For the VMware config, could you refer to: https://docs.ceph.com/en/latest/rbd/iscsi-initiator-esx/ ? What's the "Path Selection Policy with ALUA" you are using? ceph-iscsi can't implement real active/active (AA), so if you use round-robin (RR) I think it will behave like this. - Xiubo On 12/12/
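
(If round-robin is indeed configured, a hedged sketch of switching a single LUN to Most Recently Used on the ESXi side would be the following; the device id is a placeholder.)

  esxcli storage nmp device list
  esxcli storage nmp device set --device naa.<device_id> --psp VMW_PSP_MRU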

[ceph-users] Re: Increase the recovery throughput

2022-12-12 Thread Frank Schilder
Hi Monish, you are probably on the mclock scheduler, which ignores these settings. You might want to set them back to defaults, change the scheduler to wpq, and then try again if it needs adjusting. There were several threads about "broken" recovery op scheduling with mclock in the latest versions.
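
(A sketch of what is described here: drop the overrides, switch the op queue scheduler to wpq, then restart the OSDs; the orchestrator service name is a placeholder.)

  ceph config rm osd osd_max_backfills
  ceph config rm osd osd_recovery_max_active
  ceph config set osd osd_op_queue wpq
  # osd_op_queue is only read at startup, so the OSDs need a restart, e.g. via cephadm:
  ceph orch restart osd.<service_name>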

[ceph-users] ceph mgr fail after upgrade to pacific

2022-12-12 Thread Eugen Block
Hi, last week we successfully upgraded from Nautilus to Pacific, and since today I'm experiencing failing MGR daemons. The pods are still running but have stopped logging. The standby MGRs take over until all MGRs become unresponsive; we currently have three MGRs. I'm not sure if [1] is the ex
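
(A hedged note for anyone in the same spot: an unresponsive active MGR can be failed over manually while logs are collected; the daemon name is a placeholder.)

  ceph mgr fail <mgr_name>
  ceph orch ps --daemon-type mgr
  ceph orch daemon restart mgr.<mgr_name>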

[ceph-users] Re: Increase the recovery throughput

2022-12-12 Thread Monish Selvaraj
Hi Eugen, We tried that already; osd_max_backfills is set to 24 and osd_recovery_max_active to 20. On Mon, Dec 12, 2022 at 3:47 PM Eugen Block wrote: > Hi, > > there are many threads discussing recovery throughput, have you tried > any of the solutions? First thing to try is to increase >

[ceph-users] Re: Increase the recovery throughput

2022-12-12 Thread Eugen Block
Hi, there are many threads discussing recovery throughput, have you tried any of the solutions? First thing to try is to increase osd_recovery_max_active and osd_max_backfills. What are the current values in your cluster? Quoting Monish Selvaraj: Hi, Our ceph cluster consists of 20
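
(For reference, the current values can be read straight from the cluster.)

  ceph config get osd osd_max_backfills
  ceph config get osd osd_recovery_max_active
  # or per daemon, including any runtime overrides:
  ceph config show osd.0 | grep -E 'osd_max_backfills|osd_recovery_max_active'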

[ceph-users] ceph-iscsi lock ping pong

2022-12-12 Thread Stolte, Felix
Hi guys, we are using ceph-iscsi to provide block storage for Microsoft Exchange and VMware vSphere. The Ceph docs state that you need to configure the Windows iSCSI Initiator for fail-over-only, but there is no such guidance for VMware. In my tcmu-runner logs on both ceph-iscsi gateways I see the followin