[ceph-users] LifecycleConfiguration is removing files too soon

2022-05-11 Thread Richard Hopman
Hi, we are running a Ceph Nautilus cluster (14.2.22) and have some issues with the lifecycle expiration of files. When setting an x-day expiration on a bucket, the file is removed too soon (usually within a few seconds for a 1-day expiration, up to a few minutes for 3-7 days). When using a specific ex
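[For reference, a minimal sketch of how such an expiration rule can be applied against RGW with the aws CLI; the bucket name, endpoint and 3-day value below are placeholders, not taken from the original report.]

# lc.json -- example 3-day expiration rule covering the whole bucket
{
  "Rules": [
    {
      "ID": "expire-after-3-days",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 3}
    }
  ]
}

# apply it to a bucket via the RGW endpoint (endpoint and bucket are examples)
aws s3api put-bucket-lifecycle-configuration \
    --endpoint-url http://rgw.example.com:8080 \
    --bucket testbucket \
    --lifecycle-configuration file://lc.json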

[ceph-users] Re: LifecycleConfiguration is removing files too soon

2022-05-11 Thread Konstantin Shalygin
Hi, are you aware that LC works as it does in Amazon S3? > Lifecycle rules run once a day at midnight Universal Coordinated Time (UTC). Maybe you are close to that time, so it only seems that the objects are being deleted too early? Try setting 2 days instead of 1 day. k > On 11 May 2022, at 13:
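[For context, RGW has commands to inspect and trigger lifecycle processing without waiting for the daily window; a sketch, where the window mentioned in the comment is the documented default rather than a value from this cluster.]

# list buckets that have lifecycle rules and their last processing status
radosgw-admin lc list

# trigger lifecycle processing now instead of waiting for the daily window
# (the window is governed by rgw_lifecycle_work_time, default 00:00-06:00)
radosgw-admin lc process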

[ceph-users] Re: ceph osd crush move exception

2022-05-11 Thread Eugen Block
Hi, you're right, the choose_args are coming from the balancer, but I'm wondering why they would affect the crush move. Do you re-deploy the same hosts when resizing or do you add new hosts? Maybe this would somehow explain why the balancer choose_args could affect a move operation. Zi
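[A sketch of the kind of move and inspection being discussed; the host, root and rack names are placeholders. Decompiling the CRUSH map is one way to see whether balancer-created choose_args weight-sets are present.]

# move a host bucket under a different root/rack (names are examples)
ceph osd crush move host1 root=default rack=rack1

# dump and decompile the CRUSH map, then look for a choose_args section
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
grep -A5 choose_args crushmap.txt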

[ceph-users] Re: 16.2.8 pacific QE validation status, RC2 available for testing

2022-05-11 Thread Casey Bodley
agreed, rgw is still a go for 16.2.8 On Tue, May 10, 2022 at 7:45 PM Yuri Weinstein wrote: > > Josh, David, assuming Casey agrees, this release is ready for publishing. > > Thx > YuriW > > On Tue, May 10, 2022 at 3:47 PM Neha Ojha wrote: >> >> Hi Yuri, >> >> rados and upgrade/pacific-p2p look go

[ceph-users] Re: LifecycleConfiguration is removing files too soon

2022-05-11 Thread Soumya Koduri
Hi, On 5/11/22 16:26, Richard Hopman wrote: Hi, we are running a Ceph Nautilus cluster (14.2.22) and have some issues with the lifecycle expiration of files. When setting an x-day expiration on a bucket, the file is removed too soon (usually within a few seconds for a 1-day expiration, up to a few m

[ceph-users] Ceph-rados removes tags on object copy

2022-05-11 Thread Tadas
Hello, I'm having an issue with ceph-rados removing object tags when updating object metadata (copying an object on top of itself). This happens with Ceph Nautilus and Ceph Pacific clusters; it does not happen with AWS S3. How to reproduce this: https://gist.github.com/Seitanas/5645a15747d43de55b9d2913
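[As a point of comparison, the S3 API lets the client state whether tags should be kept or replaced on a self-copy; a hedged reproduction sketch with the aws CLI, where the bucket, key, endpoint and metadata values are placeholders.]

# self-copy to update metadata; --tagging-directive COPY asks the gateway
# to preserve the existing object tags
aws s3api copy-object \
    --endpoint-url http://rgw.example.com:8080 \
    --bucket testbucket --key testobject \
    --copy-source testbucket/testobject \
    --metadata-directive REPLACE --metadata newkey=newvalue \
    --tagging-directive COPY

# check whether the tags survived the copy
aws s3api get-object-tagging \
    --endpoint-url http://rgw.example.com:8080 \
    --bucket testbucket --key testobject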

[ceph-users] Re: Erasure-coded PG stuck in the failed_repair state

2022-05-11 Thread Robert Appleyard - STFC UKRI
Hi, thanks for your reply. We have let it scrub and it's still active+clean+inconsistent+failed_repair, and we still get the same error:
[root@ceph-adm1 ~]# rados list-inconsistent-obj 11.2b5
No scrub information available for pg 11.2b5
error 2: (2) No such file or directory
My suspicion is that
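[The "No scrub information available" message generally means no recent scrub results are stored for that PG; a sketch of the usual sequence, using the PG id from the message above.]

# force a deep scrub so inconsistency details are (re)collected
ceph pg deep-scrub 11.2b5

# once the scrub finishes, list the inconsistent objects and their errors
rados list-inconsistent-obj 11.2b5 --format=json-pretty

# then ask Ceph to repair the PG
ceph pg repair 11.2b5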

[ceph-users] Re: Newer linux kernel cephfs clients is more trouble?

2022-05-11 Thread David Rivera
Hi, my experience is similar: I was also using elrepo kernels on CentOS 8. Kernels 5.14+ were causing problems, so I had to go back to 5.11; I did not test 5.12-5.13. I did not have enough time to narrow the system instability down to Ceph specifically. Currently, I'm using the included Rocky Linux 8 kernels (4.1

[ceph-users] Re: Newer linux kernel cephfs clients is more trouble?

2022-05-11 Thread Alex Closs
Hey y'all - As a datapoint, I *don't* see this issue on 5.17.4-200.fc35.x86_64. Hosts are Fedora 35 server, with 17.2.0. Happy to test or provide more data from this cluster if it would be helpful. -Alex On May 11, 2022, 2:02 PM -0400, David Rivera , wrote: > Hi, > > My experience is similar, I

[ceph-users] Re: reinstalled node with OSD

2022-05-11 Thread Harry G. Coin
bbk, it did help! Thank you. Here's a slightly more detailed procedure, with the osd-fsid details filled in, for moving a 'dockerized' / container-run set of OSD drives to a replacement server/motherboard (or the same server with a blank/freshly reinstalled OS). For occasions when the 'new setup' wil
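[A condensed sketch of the kind of steps such a procedure involves for ceph-volume-managed LVM OSDs; the OSD id and fsid below are placeholders to be read from 'ceph-volume lvm list', and a containerized deployment may wrap these calls in its own tooling.]

# on the rebuilt host, list the OSDs ceph-volume can see on the attached
# drives, including each OSD's id and osd-fsid
ceph-volume lvm list

# activate one OSD by id and fsid (values are examples), recreating its
# systemd units / startup on the new OS
ceph-volume lvm activate 12 01234567-89ab-cdef-0123-456789abcdef

# or activate everything ceph-volume finds in one go
ceph-volume lvm activate --all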

[ceph-users] The last 15 'degraded' items take as many hours as the first 15K?

2022-05-11 Thread Harry G. Coin
Might someone explain why the count of degraded items can drop by thousands, sometimes tens of thousands, in the same number of hours it takes to go from 10 to 0? For example, when an OSD or a host with a few OSDs goes offline for a while and then reboots. Sitting at one complete and entire degraded obj

[ceph-users] Re: The last 15 'degraded' items take as many hours as the first 15K?

2022-05-11 Thread Anthony D'Atri
Small objects recover faster than large ones. But especially early in the process, many OSDs / PGs are recovering in parallel. Toward the end there is a long tail where parallelism is limited by osd_max_backfills: say the remaining PGs to recover are all on a single OSD, they will execute seria
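[For context, the backfill limit mentioned here can be inspected and adjusted at runtime via the central config; a sketch, where the value is only an example and raising it trades client I/O for faster recovery.]

# see the current per-OSD limit on concurrent backfills
ceph config get osd osd_max_backfills

# temporarily raise it to widen parallelism in the tail (example value)
ceph config set osd osd_max_backfills 2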

[ceph-users] Re: The last 15 'degraded' items take as many hours as the first 15K?

2022-05-11 Thread Harry G. Coin
It's a little four-host, 4-OSD-per-host HDD cluster with a 5th host doing the non-OSD work. Nearly entirely CephFS load. On 5/11/22 17:47, Josh Baergen wrote: Is this on SSD or HDD? RGW index, RBD, or ...? Those all change the math on single-object recovery time. Having said that...if the object is no