As I dig into this more, I'm realizing how big a problem garbage collection has been in our clusters. The biggest cluster has over 1 billion objects in its gc list (the command is still running; it just passed the 1B mark). Does anyone have guidance on tuning the gc settings so that we can eventually catch up on this backlog and then stay caught up? I'm not expecting an overnight fix, but something that could feasibly be caught up within 6 months would be wonderful. Some rough numbers and the settings I'm considering are below; there's also a note at the very bottom, after the quoted thread, about how we plan to handle future bulk deletes.
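To put numbers on it: clearing ~1 billion gc entries in ~6 months works out to roughly 1,000,000,000 / (180 days * 86,400 s) ≈ 64 objects per second, or about 5.6M per day, on top of whatever our ongoing deletes keep adding to the list. Below is a rough sketch of the ceph.conf knobs I understand to be involved, with purely illustrative values (not a recommendation; this is exactly what I'd like a sanity check on). The section name is just an example:

    # illustrative values only; as I understand it these are picked up when
    # radosgw (or a manual 'radosgw-admin gc process') starts
    [client.radosgw.gateway]           # example section name; yours will differ
    rgw gc max objs = 1031             # default 32; I've seen suggestions to use a prime here
    rgw gc obj min wait = 300          # default 7200; seconds a deleted object must age before gc may remove it
    rgw gc processor max time = 600    # default 3600; max length of one gc run, if I'm reading the docs right
    rgw gc processor period = 600      # default 3600; how often a gc run kicks off, if I'm reading the docs right

The idea is shorter, more frequent gc runs that are each allowed to touch more objects, but I honestly don't know which of these is the real bottleneck, or whether raising 'rgw gc max objs' even helps with entries that are already queued rather than only new ones.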
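Also, in case it saves anyone else a 350GB dump: to keep an eye on the backlog without redirecting the listing to a file, I've been piping it through a counter, something like the sketch below. The --rgw-realm flag is specific to our setup, and I'm assuming the pretty-printed JSON still has one "oid" line per tail object; please correct me if that assumption is wrong.

    # running count of tail objects pending gc, printed every 1M objects,
    # without writing the huge listing to disk; --include-all also lists
    # entries that haven't aged past the min wait yet
    radosgw-admin --rgw-realm=local gc list --include-all \
        | grep '"oid"' | awk '{ n++ } n % 1000000 == 0 { print n }'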
On Mon, Oct 23, 2017 at 11:18 AM David Turner <drakonst...@gmail.com> wrote:

> We recently deleted a bucket that was no longer needed and had 400TB of
> data in it, to help as our cluster is getting quite full. That should free
> up about 30% of our cluster's used space, but in the last week we haven't
> seen anywhere near that much freed up yet. I left `radosgw-admin
> --rgw-realm=local gc process` running over the weekend to try to help, but
> it didn't seem to put a dent in it. Our regular ingestion is faster than
> the garbage collection is cleaning stuff up, even though that ingestion is
> less than 2% growth at its maximum.
>
> As of yesterday our gc list was over 350GB when dumped into a file (I had
> to stop it as the disk I was redirecting the output to was almost full).
> In the future I will use the --bypass-gc option to avoid the cleanup, but
> is there a way to speed up the gc once you're in this position? There were
> about 8M objects deleted from this bucket. I've come across a few
> references to the rgw gc settings in the config, but nothing that
> explained the timers well enough for me to feel comfortable doing anything
> with them.
>
> On Tue, Jul 25, 2017 at 4:01 PM Bryan Stillwell <bstillw...@godaddy.com> wrote:
>
>> Excellent, thank you! It does exist in 0.94.10! :)
>>
>> Bryan
>>
>> From: Pavan Rallabhandi <prallabha...@walmartlabs.com>
>> Date: Tuesday, July 25, 2017 at 11:21 AM
>> To: Bryan Stillwell <bstillw...@godaddy.com>, "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
>> Subject: Re: [ceph-users] Speeding up garbage collection in RGW
>>
>> I've just realized that the option is present in Hammer (0.94.10) as
>> well, you should try that.
>>
>> From: Bryan Stillwell <bstillw...@godaddy.com>
>> Date: Tuesday, 25 July 2017 at 9:45 PM
>> To: Pavan Rallabhandi <prallabha...@walmartlabs.com>, "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
>> Subject: EXT: Re: [ceph-users] Speeding up garbage collection in RGW
>>
>> Unfortunately, we're on hammer still (0.94.10). That option looks like
>> it would work better, so maybe it's time to move the upgrade up in the
>> schedule.
>>
>> I've been playing with the various gc options and I haven't seen any
>> speedups like we would need to remove them in a reasonable amount of time.
>>
>> Thanks,
>> Bryan
>>
>> From: Pavan Rallabhandi <prallabha...@walmartlabs.com>
>> Date: Tuesday, July 25, 2017 at 3:00 AM
>> To: Bryan Stillwell <bstillw...@godaddy.com>, "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
>> Subject: Re: [ceph-users] Speeding up garbage collection in RGW
>>
>> If your Ceph version is >= Jewel, you can try the `--bypass-gc` option in
>> radosgw-admin, which would remove the tail objects as well without
>> marking them to be GC'ed.
>>
>> Thanks,
>>
>> On 25/07/17, 1:34 AM, "ceph-users on behalf of Bryan Stillwell" <ceph-users-boun...@lists.ceph.com on behalf of bstillw...@godaddy.com> wrote:
>>
>> I'm in the process of cleaning up a test that an internal customer did
>> on our production cluster that produced over a billion objects spread
>> across 6000 buckets.
>> So far I've been removing the buckets like this:
>>
>>     printf %s\\n bucket{1..6000} | xargs -I{} -n 1 -P 32 radosgw-admin bucket rm --bucket={} --purge-objects
>>
>> However, the disk usage doesn't seem to be getting reduced at the same
>> rate the objects are being removed. From what I can tell a large number
>> of the objects are waiting for garbage collection.
>>
>> When I first read the docs it sounded like the garbage collector would
>> only remove 32 objects every hour, but after looking through the logs I'm
>> seeing about 55,000 objects removed every hour. That's about 1.3 million
>> a day, so at this rate it'll take a couple years to clean up the rest!
>> For comparison, the purge-objects command above is removing (but not
>> GC'ing) about 30 million objects a day, so a much more manageable 33 days
>> to finish.
>>
>> I've done some digging and it appears like I should be changing these
>> configuration options:
>>
>> rgw gc max objs (default: 32)
>> rgw gc obj min wait (default: 7200)
>> rgw gc processor max time (default: 3600)
>> rgw gc processor period (default: 3600)
>>
>> A few questions I have though are:
>>
>> Should 'rgw gc processor max time' and 'rgw gc processor period' always
>> be set to the same value?
>>
>> Which would be better, increasing 'rgw gc max objs' to something like
>> 1024, or reducing the 'rgw gc processor' times to something like 60
>> seconds?
>>
>> Any other guidance on the best way to adjust these values?
>>
>> Thanks,
>> Bryan
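P.S. For what it's worth, the next time we have a bulk delete of this size the plan is to skip the gc queue entirely by combining Bryan's parallel removal pattern with --bypass-gc, roughly as below. This is untested on our side so far, the realm flag and bucket names are obviously specific to each setup, and note from this thread that --bypass-gc only exists in Jewel and in Hammer as of 0.94.10.

    # delete buckets in parallel, removing tail objects directly instead of
    # queueing them for gc (untested sketch; adjust -P to what the cluster can take)
    printf '%s\n' bucket{1..6000} | xargs -P 32 -I{} \
        radosgw-admin --rgw-realm=local bucket rm --bucket={} \
            --purge-objects --bypass-gc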