Hi Casey,
We're still trying to figure this sync problem out. If you could possibly
tell us anything further, we would be deeply grateful!
Our errors are coming from 'data sync'. In `sync status` we consistently
show one shard behind, but a different shard each time we run it.
Here's a paste
(cc ceph-users)
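What we keep re-running is just the overall status (the watch interval below is arbitrary, and the output is omitted here); the lagging data shard number is different on nearly every pass:

radosgw-admin sync status
watch -n 60 radosgw-admin sync status    # same check on a loop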
Can you tell whether these sync errors are coming from metadata sync or
data sync? Are they blocking sync from making progress according to your
'sync status'?
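For reference, 'sync status' reports metadata sync and data sync separately, and any recorded failures can be dumped on one of the affected gateways, e.g.:

radosgw-admin sync status        # separate metadata sync and data sync sections
radosgw-admin sync error list    # recorded sync errors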
On 3/8/19 10:23 AM, Trey Palmer wrote:
Casey,
Having done the 'reshard stale-instances delete' earlier as advised, it
appears we eventually got 'data sync init' working.
At least, it's worked on 5 of the 6 sync directions in our 3-node cluster.
The sixth has not yet run without returning an error, although 'sync status'
does say "preparing for full sync".
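For the record, the sequence we've been running per sync direction is roughly the following (the zone name is a placeholder, and the full sync only seems to start once the local gateways are restarted):

radosgw-admin data sync init --source-zone=<zone>
systemctl restart ceph-radosgw.target    # on the gateways in the receiving zone
radosgw-admin sync status                # moves from "preparing for full sync" to a running full sync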
Thanks,
Trey
On Wed, Mar 6, 2019 at 1:22 PM Trey Palmer wrote:
Casey,
This was the result of trying 'data sync init':
root@c2-rgw1:~# radosgw-admin data sync init
ERROR: source zone not specified
root@c2-rgw1:~# radosgw-admin data sync init --source-zone=
WARNING: cannot find source zone id for name=
ERROR: sync.init_sync_status() returned ret=-2
root@c2-rgw1:~#
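I'm guessing the ret=-2 is just ENOENT from the empty zone name, so it presumably wants one of the zone names the cluster actually knows about, e.g.:

root@c2-rgw1:~# radosgw-admin zone list                       # zone names known here
root@c2-rgw1:~# radosgw-admin data sync init --source-zone=<one-of-the-listed-zones>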
Casey,
You are spot on that almost all of these are deleted buckets. At some
point in the last few months we deleted and replaced buckets with
underscores in their names, and those are responsible for most of these
errors.
Thanks very much for the reply and explanation. We’ll give ‘data sync init’ a try.
Hi Trey,
I think it's more likely that these stale metadata entries are from
deleted buckets, rather than accidental bucket reshards. When a bucket
is deleted in a multisite configuration, we don't delete its bucket
instance because other zones may still need to sync the object deletes.
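If it helps to see what's left behind, those leftover instances show up in the bucket instance metadata listing, and recent luminous point releases also have the stale-instances commands mentioned earlier in the thread (given the above, be careful about removing them on multisite):

radosgw-admin metadata list bucket.instance    # includes instances left behind by deleted buckets
radosgw-admin reshard stale-instances list     # recent luminous point releases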
Casey,
Thanks very much for the reply!
We definitely have lots of errors on sync-disabled buckets and the
workaround for that is obvious (most of them are empty anyway).
Our second form of error is stale buckets. We had dynamic resharding
enabled but have now disabled it (having discovered it was on by default).
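For what it's worth, this is how we're double-checking the resharding side now that it's off (the admin socket name below is a placeholder and varies per host):

ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok config get rgw_dynamic_resharding
radosgw-admin reshard list    # anything still queued from before it was disabled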
Hi Christian,
I think you've correctly intuited that the issues are related to the use
of 'bucket sync disable'. There was a bug fix for that feature in
http://tracker.ceph.com/issues/26895, and I recently found that a block
of code was missing from its luminous backport. That missing code is
Hi Christian,
To be on the safe side and to future-proof yourself, you will want to go ahead
and set the following in your ceph.conf file, and then restart your RGW
instances.
rgw_dynamic_resharding = false
There are a number of issues with dynamic resharding, multisite RGW problems
being among them.
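Concretely, something like this on each RGW host (the config section and systemd unit names below depend on how your gateways were deployed):

# ceph.conf
[client.rgw.<hostname>]
rgw_dynamic_resharding = false

# then restart the gateways
systemctl restart ceph-radosgw.target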
Matthew, first of all, let me say we very much appreciate your help!
So I don’t think we turned dynamic resharding on, nor did we manually reshard
buckets. Seems like it defaults to on for luminous but the mimic docs say it’s
not supported in multisite. So do we need to disable it manually via
> "id": "331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8",
> "name": "sv3-prod",
> "endpoints": [
>     "http://sv3-ceph-rgw1:8080"
> ],
> "log_meta": "false",
            "data_pool": "dc11-prod.rgw.buckets.data",
            "data_extra_pool": "dc11-prod.rgw.buckets.non-ec",
            "index_type": 0,
            "compression": ""
        }
    }
],
"metadata_heap": "",
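(For context, these fragments appear to be from the usual config dumps, i.e. something like the following; the zone name is a placeholder:)

radosgw-admin period get                   # committed realm/zonegroup/zone layout
radosgw-admin zonegroup get
radosgw-admin zone get --rgw-zone=<zone>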
...metadata syncing and data syncing (both separate issues) that you could be
hitting.
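They can be checked independently, e.g. (the source zone name is a placeholder):

radosgw-admin metadata sync status
radosgw-admin data sync status --source-zone=<zone>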
Thanks,
____________
From: ceph-users on behalf of Christian Rice
Sent: Wednesday, February 27, 2019 7:05 PM
To: ceph-users
Subject: [ceph-users] radosgw sync falling behind regularly
Debian 9; ceph 12.2.8-bpo90+1; no rbd or cephfs, just radosgw; three clusters
in one zonegroup.
Often we find either metadata or data sync behind, and it never seems to
recover until we restart the endpoint radosgw target service.
e.g. at 15:45:40:
dc11-ceph-rgw1:/var/log/ceph# radosgw-admin
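For clarity, the restart that clears it is just the stock radosgw systemd target (unit name may differ in your deployment):

dc11-ceph-rgw1:~# radosgw-admin sync status               # shows shards behind
dc11-ceph-rgw1:~# systemctl restart ceph-radosgw.target
dc11-ceph-rgw1:~# radosgw-admin sync status               # catches up again after the restart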