Casey,

Thanks. I picked a few of the buckets in question and they have not been resharded (num_shards has not changed since creation). However, radosgw-admin lc reshard fix --bucket BUCKET did restore the lifecycle, and radosgw-admin lc process --bucket BUCKETNAME did start deleting things as expected on the slave side. I will check one that I did not process by hand to see if it has run by tomorrow. I think I may still have to change rgw_lc_max_worker, rgw_lc_max_wp_worker, or rgw_lifecycle_work_time, but we will see.
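For reference, this is roughly what I ran on the slave side for each affected bucket (BUCKET is just a placeholder here):

    # re-link the bucket's lifecycle entry on the secondary zone
    radosgw-admin lc reshard fix --bucket BUCKET

    # kick off lifecycle processing for that bucket by hand
    radosgw-admin lc process --bucket BUCKET

    # confirm the bucket now shows up with a real start time / status
    radosgw-admin lc list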
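And if I do end up tuning those settings, I expect it to look something like the following; the values are only examples I am considering, not recommendations:

    # more lifecycle worker / work-pool threads per rgw (example values)
    ceph config set client.rgw rgw_lc_max_worker 5
    ceph config set client.rgw rgw_lc_max_wp_worker 5

    # widen the daily window in which lifecycle is allowed to run (HH:MM-HH:MM)
    ceph config set client.rgw rgw_lifecycle_work_time "00:00-23:59"

(followed by a restart of the radosgw daemons, since I am not sure these are picked up at runtime).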
-Chris

On Monday, December 9, 2024 at 08:17:24 AM MST, Casey Bodley <cbod...@redhat.com> wrote:

hi Chris,

https://docs.ceph.com/en/latest/radosgw/dynamicresharding/#lifecycle-fixes
may be relevant here. there's a `radosgw-admin lc reshard fix` command that
you can run on the secondary site to add buckets back to the lc list. if you
omit the --bucket argument, it should scan all buckets and re-link everything
with a lifecycle policy

On Fri, Dec 6, 2024 at 5:04 PM Christopher Durham <caduceu...@aol.com> wrote:
>
> I have 18.2.4 on Rocky 9 Linux. This system has been updated from octopus ->
> pacific -> quincy (18.2.2) -> (el8->el9 reinstall of each server, but ceph
> osd and mon survival) -> reef (18.2.4) over several years.
>
> It appears that I have two probably related problems with lifecycle
> expiration in a multisite configuration.
>
> I have two zones, one on each side of a multisite. I recently discovered
> (about a month after the el9 and reef 18.2.4 updates) that lifecycle
> expiration was (mostly) not working on the secondary zone side. I had
> thought initially that there may be replication issues, but while there are
> replication issues on individual buckets that required me to full sync
> individual buckets, the majority of the issues are because lifecycle
> expiration is not working on the secondary side.
>
> The observation that caused me to think lifecycle is the issue is that,
> based on the lifecycle policy for a given bucket, all objects in that bucket
> should already be deleted. What we are seeing is that all objects have been
> deleted from the bucket on the master zone, but NONE of them have been
> deleted on the slave side. This may vary based on the date the objects were
> created across multiple lifecycle runs on the master side, but objects never
> get deleted/expired on the slave side.
>
> I tracked this down to one of two causes. Let's say, for a given bucket
> bucket1:
>
> 1. radosgw-admin lc list on the master shows that the bucket completes its
> lifecycle processing periodically. But on the slave side, it shows:
>
>     "started": "Thu, 01 Jan 1970 ...",
>     "status": "UNINITIAL"
>
> If I run:
>
>     radosgw-admin lc process --bucket bucket1
>
> that particular bucket flushes all of its expired objects (takes a while).
> But as far as I can tell at this point, it never runs lifecycle again on the
> slave side.
>
> Now, let's say I have bucket2.
>
> 2. radosgw-admin lc list on the slave side does NOT show the bucket in the
> json output, yet the same command on the master side shows it!
>
> Given this, running
>
>     radosgw-admin lc process --bucket bucket2
>
> causes C++ exceptions and the command crashes on the slave side (makes
> sense, actually).
>
> Yet in this case if I do:
>
>     aws --profile bucket2_owner s3api get-bucket-lifecycle-configuration --bucket bucket2
>
> it shows the lifecycle configuration for the bucket, regardless of whether I
> point the awscli at the master or the slave zone.
>
> In this case, if I redeploy the lifecycle with
> put-bucket-lifecycle-configuration to the master side (sketched below), then
> the lifecycle status shows up in radosgw-admin lc list on the slave side (as
> well as on the master) as UNINITIAL, and this issue devolves to #1 above.
>
> Note that lifecycle expiration on the slave side does work for some number
> of buckets, but most remain in the UNINITIAL state, and others are not there
> at all until I redeploy the lifecycle. The slave side is a lot more active
> in reading and writing.
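> For concreteness, the redeploy is just a put-bucket-lifecycle-configuration
> against the master zone endpoint, along these lines (the rule body below is
> a simplified stand-in for illustration, not our actual policy):
>
>     # illustrative rule only; the real policy has different prefixes/days
>     aws --profile bucket2_owner s3api put-bucket-lifecycle-configuration \
>         --bucket bucket2 \
>         --lifecycle-configuration '{
>           "Rules": [
>             {
>               "ID": "expire-old-objects",
>               "Status": "Enabled",
>               "Filter": {"Prefix": ""},
>               "Expiration": {"Days": 30}
>             }
>           ]
>         }'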
>
> So, why would the bucket not show up in lc list on the slave side, where it
> had before (I can't say how long ago 'before' was)? How can I get it to
> automatically perform lifecycle on the slave side? Would this perhaps be
> related to
>
>     rgw_lc_max_worker
>     rgw_lc_max_wp_worker
>     rgw_lifecycle_work_time
>
> It appears that lifecycle processing is independent on each side, meaning
> that lifecycle processing of bucket A on one side runs separately from
> lifecycle processing of bucket A on the other side, and as such an object
> may exist on one side for a time after it has already been deleted on the
> other side.
>
> How does rgw_lifecycle_work_time work? Does it mean that outside of the
> work_time window no new lifecycle processing starts, or that those in
> process abort/stop? Either way, this may explain my observation that too
> many buckets stay in UNINITIAL when the ones that are processing have a lot
> of data to delete. And why is this last one rgw_lifecycle_work_time and not
> rgw_lc_work_time?
>
> Anyway, any help on these issues would be appreciated. Thanks
>
> -Chris
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io