Re: [ceph-users] radosgw sync falling behind regularly

2019-03-11 Thread Trey Palmer
Hi Casey, We're still trying to figure this sync problem out; if you could possibly tell us anything further, we would be deeply grateful! Our errors are coming from 'data sync'. In `sync status` we pretty constantly show one shard behind, but a different one each time we run it. Here's a paste
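For reference, a quick sketch of checking which data sync shards are behind (the zone name below is a placeholder for one of your peer zones, not something taken from the paste):

    radosgw-admin sync status
    radosgw-admin data sync status --source-zone=<peer-zone>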

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-08 Thread Casey Bodley
(cc ceph-users) Can you tell whether these sync errors are coming from metadata sync or data sync? Are they blocking sync from making progress according to your 'sync status'? On 3/8/19 10:23 AM, Trey Palmer wrote: Casey, Having done the 'reshard stale-instances delete' earlier on the advic
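A minimal sketch of separating the two, using standard radosgw-admin subcommands; the peer zone name is a placeholder:

    radosgw-admin metadata sync status
    radosgw-admin data sync status --source-zone=<peer-zone>
    radosgw-admin sync error list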

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-06 Thread Trey Palmer
It appears we eventually got 'data sync init' working. At least, it's worked on 5 of the 6 sync directions in our 3-node cluster. The sixth has not run without an error returned, although 'sync status' does say "preparing for full sync". Thanks, Trey On Wed, Mar 6, 2019 at 1:22 PM Trey Palmer
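For context, three zones give six source/destination sync directions (each zone pulls from the other two). A sketch of reinitializing the two inbound directions on one zone's gateway; the zone names are placeholders, and restarting the gateways afterwards is assumed here rather than described in the snippet:

    radosgw-admin data sync init --source-zone=<peer-zone-1>
    radosgw-admin data sync init --source-zone=<peer-zone-2>
    systemctl restart ceph-radosgw.target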

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-06 Thread Trey Palmer
Casey, This was the result of trying 'data sync init':
root@c2-rgw1:~# radosgw-admin data sync init
ERROR: source zone not specified
root@c2-rgw1:~# radosgw-admin data sync init --source-zone=
WARNING: cannot find source zone id for name=
ERROR: sync.init_sync_status() returned ret=-2
root@c2-rgw
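For reference, a sketch of the same command with an explicit, existing source zone (the zone name is a placeholder; 'radosgw-admin zone list' shows the names the gateway will accept):

    radosgw-admin zone list
    radosgw-admin data sync init --source-zone=<peer-zone>
    radosgw-admin data sync status --source-zone=<peer-zone>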

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-06 Thread Trey Palmer
Casey, You are spot on that almost all of these are deleted buckets. At some point in the last few months we deleted and replaced buckets with underscores in their names, and those are responsible for most of these errors. Thanks very much for the reply and explanation. We’ll give ‘data sync

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-06 Thread Casey Bodley
Hi Trey, I think it's more likely that these stale metadata entries are from deleted buckets, rather than accidental bucket reshards. When a bucket is deleted in a multisite configuration, we don't delete its bucket instance because other zones may still need to sync the object deletes - and
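A sketch of inspecting and then removing those stale bucket instance entries, using the 'reshard stale-instances' subcommands already referenced in this thread; review the list before deleting anything:

    radosgw-admin reshard stale-instances list
    radosgw-admin reshard stale-instances delete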

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-05 Thread Trey Palmer
Casey, Thanks very much for the reply! We definitely have lots of errors on sync-disabled buckets and the workaround for that is obvious (most of them are empty anyway). Our second form of error is stale buckets. We had dynamic resharding enabled but have now disabled it (having discovered it w

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-05 Thread Casey Bodley
Hi Christian, I think you've correctly intuited that the issues are related to the use of 'bucket sync disable'. There was a bug fix for that feature in http://tracker.ceph.com/issues/26895, and I recently found that a block of code was missing from its luminous backport. That missing code is
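For anyone following along, the per-bucket sync flag being discussed is driven like this (bucket name is a placeholder):

    radosgw-admin bucket sync disable --bucket=<bucket-name>
    radosgw-admin bucket sync status --bucket=<bucket-name>
    radosgw-admin bucket sync enable --bucket=<bucket-name>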

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-05 Thread Matthew H
Hi Christian, To be on the safe side and to future-proof yourself, you will want to go ahead and set the following in your ceph.conf file, and then issue a restart to your RGW instances. rgw_dynamic_resharding = false There are a number of issues with dynamic resharding, multisite RGW problems being
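A minimal sketch of that change, assuming the option goes in the RGW client section of ceph.conf on each gateway (the section name here is illustrative) and a systemd deployment for the restart:

    [client.rgw.dc11-ceph-rgw1]
    rgw_dynamic_resharding = false

    systemctl restart ceph-radosgw.target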

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-05 Thread Christian Rice
Matthew, first of all, let me say we very much appreciate your help! So I don’t think we turned dynamic resharding on, nor did we manually reshard buckets. It seems like it defaults to on for Luminous, but the Mimic docs say it’s not supported in multisite. So do we need to disable it manually via
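One way to confirm what a running gateway actually has in effect (the daemon name is a guess at the instance naming; adjust to match your deployment):

    ceph daemon client.rgw.dc11-ceph-rgw1 config get rgw_dynamic_resharding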

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-05 Thread Trey Palmer
"id": "331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8",
"name": "sv3-prod",
"endpoints": [
    "http://sv3-ceph-rgw1:8080"
],
"log_meta": "false",

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-04 Thread Christian Rice
dc11-prod.rgw.buckets.data",
"data_extra_pool": "dc11-prod.rgw.buckets.non-ec",
"index_type": 0,
"compression": ""
}
}
],
"metadata_heap": "",
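Those fragments are pieces of the zone/zonegroup configuration; a hedged way to dump the full documents on each cluster for comparison (the zone name is taken from the paste above):

    radosgw-admin zonegroup get
    radosgw-admin zone get --rgw-zone=dc11-prod
    radosgw-admin period get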

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-04 Thread Matthew H
metadata syncing and data syncing (both separate issues) that you could be hitting. Thanks, From: ceph-users on behalf of Christian Rice Sent: Wednesday, February 27, 2019 7:05 PM To: ceph-users Subject: [ceph-users] radosgw sync falling behind regularly Deb

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-04 Thread Christian Rice
you could be hitting. Thanks, From: ceph-users on behalf of Christian Rice Sent: Wednesday, February 27, 2019 7:05 PM To: ceph-users Subject: [ceph-users] radosgw sync falling behind regularly Debian 9; ceph 12.8.8-bpo90+1; no rbd or cephfs, just radosgw;

Re: [ceph-users] radosgw sync falling behind regularly

2019-02-28 Thread Christian Rice
g. Thanks, From: ceph-users on behalf of Christian Rice Sent: Wednesday, February 27, 2019 7:05 PM To: ceph-users Subject: [ceph-users] radosgw sync falling behind regularly Debian 9; ceph 12.8.8-bpo90+1; no rbd or cephfs, just radosgw; three clusters in one

Re: [ceph-users] radosgw sync falling behind regularly

2019-02-27 Thread Matthew H
g. Thanks, From: ceph-users on behalf of Christian Rice Sent: Wednesday, February 27, 2019 7:05 PM To: ceph-users Subject: [ceph-users] radosgw sync falling behind regularly Debian 9; ceph 12.8.8-bpo90+1; no rbd or cephfs, just radosgw; three clusters in one zonegroup. Often we find either m

[ceph-users] radosgw sync falling behind regularly

2019-02-27 Thread Christian Rice
Debian 9; ceph 12.8.8-bpo90+1; no rbd or cephfs, just radosgw; three clusters in one zonegroup. Often we find either metadata or data sync behind, and it doesn’t ever seem to recover until…we restart the endpoint radosgw target service. E.g. at 15:45:40: dc11-ceph-rgw1:/var/log/ceph# radosgw-adm
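The restart-based workaround described above, as a sketch (the target name assumes the stock systemd units from the Debian packages; re-checking sync status afterwards is just to watch it catch up):

    systemctl restart ceph-radosgw.target
    radosgw-admin sync status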