Hi Casey,

We're still trying to figure out this sync problem. If you can tell us anything further, we would be deeply grateful!
Our errors are coming from 'data sync'. In 'sync status' we almost always show one shard behind, but it's a different shard each time we run it. Here's a paste -- these two commands were run in rapid succession.

root@sv3-ceph-rgw1:~# radosgw-admin sync status
          realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
      zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
           zone 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
                source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

root@sv3-ceph-rgw1:~# radosgw-admin sync status
          realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
      zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
           zone 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 1 shards
                        behind shards: [30]
                        oldest incremental change not applied: 2019-01-19 22:53:23.0.16109s
                source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
root@sv3-ceph-rgw1:~#
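In case it helps, this is just a scripted version of the repeated 'sync status' runs above -- how we keep catching a different shard number in the "behind shards" line nearly every pass (plain shell, nothing beyond the command already shown):

for i in $(seq 1 10); do
    date
    # grab only the "behind" lines from the status output
    radosgw-admin sync status 2>/dev/null | grep -i behind
    sleep 10
done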
Below I'm pasting a small section of log.

Thanks so much for looking!

Trey Palmer

root@sv3-ceph-rgw1:/var/log/ceph# tail -f ceph-rgw-sv3-ceph-rgw1.log | grep -i error
2019-03-08 11:43:07.208572 7fa080cc7700 0 data sync: ERROR: failed to read remote data log info: ret=-2
2019-03-08 11:43:07.211348 7fa080cc7700 0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
2019-03-08 11:43:07.267117 7fa080cc7700 0 data sync: ERROR: failed to read remote data log info: ret=-2
2019-03-08 11:43:07.269631 7fa080cc7700 0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
2019-03-08 11:43:07.895192 7fa080cc7700 0 data sync: ERROR: init sync on dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.046685 7fa080cc7700 0 data sync: ERROR: init sync on dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.171277 7fa0870eb700 0 ERROR: failed to get bucket instance info for .bucket.meta.phowe_superset:phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.171748 7fa0850e7700 0 ERROR: failed to get bucket instance info for .bucket.meta.gdfp_dev:gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.175867 7fa08a0f1700 0 meta sync: ERROR: can't remove key: bucket.instance:phowe_superset/phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233 ret=-2
2019-03-08 11:43:08.176755 7fa0820e1700 0 data sync: ERROR: init sync on whoiswho/whoiswho:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.293 failed, retcode=-2
2019-03-08 11:43:08.176872 7fa0820e1700 0 data sync: ERROR: init sync on dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.176885 7fa093103700 0 ERROR: failed to get bucket instance info for .bucket.meta.phowe_superset:phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.176925 7fa0820e1700 0 data sync: ERROR: failed to retrieve bucket info for bucket=phowe_superset/phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.177916 7fa0910ff700 0 meta sync: ERROR: can't remove key: bucket.instance:gdfp_dev/gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158 ret=-2
2019-03-08 11:43:08.178815 7fa08b0f3700 0 ERROR: failed to get bucket instance info for .bucket.meta.gdfp_dev:gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.178847 7fa0820e1700 0 data sync: ERROR: failed to retrieve bucket info for bucket=gdfp_dev/gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.179492 7fa0820e1700 0 data sync: ERROR: init sync on adcreative/adcreative:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.21 failed, retcode=-2
2019-03-08 11:43:08.179529 7fa0820e1700 0 data sync: ERROR: init sync on vulnerability_report/vulnerability-report:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.421 failed, retcode=-2
2019-03-08 11:43:08.179770 7fa0820e1700 0 data sync: ERROR: init sync on early_osquery/early-osquery:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.339 failed, retcode=-2
2019-03-08 11:43:08.217393 7fa0820e1700 0 data sync: ERROR: init sync on bugsnag_integration/bugsnag-integration:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.328 failed, retcode=-2
2019-03-08 11:43:08.233847 7fa0820e1700 0 data sync: ERROR: init sync on vulnerability_report/vulnerability-report:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.421 failed, retcode=-2
2019-03-08 11:43:08.233917 7fa0820e1700 0 data sync: ERROR: init sync on dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.233998 7fa0820e1700 0 data sync: ERROR: init sync on early_osquery/early-osquery:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.339 failed, retcode=-2
2019-03-08 11:43:08.273391 7fa0820e1700 0 data sync: ERROR: init sync on bugsnag_integration/bugsnag-integration:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.328 failed, retcode=-2
2019-03-08 11:43:08.745150 7fa0840e5700 0 ERROR: failed to get bucket instance info for .bucket.meta.event_dashboard:event_dashboard:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.148
2019-03-08 11:43:08.745408 7fa08c0f5700 0 ERROR: failed to get bucket instance info for .bucket.meta.produktizr_doc:produktizr_doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.241
2019-03-08 11:43:08.749571 7fa0820e1700 0 data sync: ERROR: init sync on ceph/ceph:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.427 failed, retcode=-2
2019-03-08 11:43:08.750472 7fa0820e1700 0 data sync: ERROR: init sync on terraform_dev/terraform-dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.418 failed, retcode=-2
2019-03-08 11:43:08.750508 7fa08e0f9700 0 meta sync: ERROR: can't remove key: bucket.instance:event_dashboard/event_dashboard:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.148 ret=-2
2019-03-08 11:43:08.751094 7fa0868ea700 0 meta sync: ERROR: can't remove key: bucket.instance:produktizr_doc/produktizr_doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.241 ret=-2
2019-03-08 11:43:08.751331 7fa08a8f2700 0 ERROR: failed to get bucket instance info for .bucket.meta.event_dashboard:event_dashboard:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.148
2019-03-08 11:43:08.751387 7fa0820e1700 0 data sync: ERROR: failed to retrieve bucket info for bucket=event_dashboard/event_dashboard:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.148
2019-03-08 11:43:08.751497 7fa0820e1700 0 data sync: ERROR: init sync on pithos_doc/pithos-doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.393 failed, retcode=-2
2019-03-08 11:43:08.751619 7fa0820e1700 0 data sync: ERROR: init sync on jmeter_sc/jmeter-sc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.360 failed, retcode=-2
2019-03-08 11:43:08.752037 7fa0900fd700 0 ERROR: failed to get bucket instance info for .bucket.meta.produktizr_doc:produktizr_doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.241
2019-03-08 11:43:08.752063 7fa0820e1700 0 data sync: ERROR: failed to retrieve bucket info for bucket=produktizr_doc/produktizr_doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.241
2019-03-08 11:43:08.752462 7fa0820e1700 0 data sync: ERROR: init sync on goinfosb/goinfosb:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.160 failed, retcode=-2
2019-03-08 11:43:08.793707 7fa0820e1700 0 data sync: ERROR: init sync on kafkadrm/kafkadrm:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.183 failed, retcode=-2
2019-03-08 11:43:08.809748 7fa0820e1700 0 data sync: ERROR: init sync on terraform_dev/terraform-dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.418 failed, retcode=-2
2019-03-08 11:43:08.809804 7fa0820e1700 0 data sync: ERROR: init sync on pithos_doc/pithos-doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.393 failed, retcode=-2
2019-03-08 11:43:08.809917 7fa0820e1700 0 data sync: ERROR: init sync on jmeter_sc/jmeter-sc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.360 failed, retcode=-2
2019-03-08 11:43:09.345180 7fa0840e5700 0 ERROR: failed to get bucket instance info for .bucket.meta.spins_on_the_ledger:spins_on_the_ledger:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.274
2019-03-08 11:43:09.349186 7fa0820e1700 0 data sync: ERROR: init sync on steno/steno:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.279 failed, retcode=-2
2019-03-08 11:43:09.349235 7fa0820e1700 0 data sync: ERROR: init sync on adjuster_kafka/adjuster-kafka:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.308 failed, retcode=-2
2019-03-08 11:43:09.349809 7fa0820e1700 0 data sync: ERROR: init sync on oauth/oauth:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.223 failed, retcode=-2
2019-03-08 11:43:09.351909 7fa08d0f7700 0 meta sync: ERROR: can't remove key: bucket.instance:spins_on_the_ledger/spins_on_the_ledger:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.274 ret=-2
2019-03-08 11:43:09.352412 7fa0820e1700 0 data sync: ERROR: init sync on sre_jmeter/sre-jmeter:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.635 failed, retcode=-2
2019-03-08 11:43:09.352609 7fa08f0fb700 0 ERROR: failed to get bucket instance info for .bucket.meta.spins_on_the_ledger:spins_on_the_ledger:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.274
2019-03-08 11:43:09.352635 7fa0820e1700 0 data sync: ERROR: failed to retrieve bucket info for bucket=spins_on_the_ledger/spins_on_the_ledger:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.274
2019-03-08 11:43:09.352831 7fa0820e1700 0 data sync: ERROR: init sync on charon_analytics/charon-analytics:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.331 failed, retcode=-2
2019-03-08 11:43:09.352903 7fa0820e1700 0 data sync: ERROR: init sync on kafka_doc/kafka-doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.362 failed, retcode=-2
2019-03-08 11:43:09.353337 7fa0820e1700 0 data sync: ERROR: init sync on serversidesequencing/serversidesequencing:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.263 failed, retcode=-2
2019-03-08 11:43:09.389559 7fa0820e1700 0 data sync: ERROR: init sync on radio_publicapi/radio-publicapi:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.401 failed, retcode=-2
2019-03-08 11:43:09.402324 7fa0820e1700 0 data sync: ERROR: init sync on adjuster_kafka/adjuster-kafka:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.308 failed, retcode=-2
2019-03-08 11:43:09.405314 7fa0820e1700 0 data sync: ERROR: init sync on charon_analytics/charon-analytics:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.331 failed, retcode=-2
2019-03-08 11:43:09.406046 7fa0820e1700 0 data sync: ERROR: init sync on kafka_doc/kafka-doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.362 failed, retcode=-2
2019-03-08 11:43:09.441428 7fa0820e1700 0 data sync: ERROR: init sync on radio_publicapi/radio-publicapi:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.401 failed, retcode=-2
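In case it's useful, a rough tally of which buckets those 'init sync' errors point at can be pulled straight out of that log with plain text munging (shown here as a sketch; it just counts the bucket names in the error lines above):

# count how often each bucket shows up in the 'init sync' errors
grep 'data sync: ERROR: init sync on' ceph-rgw-sv3-ceph-rgw1.log \
    | sed 's/.*init sync on \([^:]*\):.*/\1/' \
    | sort | uniq -c | sort -rn | head -20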
On Fri, Mar 8, 2019 at 10:29 AM Casey Bodley <cbod...@redhat.com> wrote:

> (cc ceph-users)
>
> Can you tell whether these sync errors are coming from metadata sync or
> data sync? Are they blocking sync from making progress according to your
> 'sync status'?
>
> On 3/8/19 10:23 AM, Trey Palmer wrote:
> > Casey,
> >
> > Having done the 'reshard stale-instances delete' earlier on the advice
> > of another list member, we have tons of sync errors on deleted
> > buckets, as you mention.
> >
> > After 'data sync init' we're still seeing all of these errors on
> > deleted buckets.
> >
> > Since buckets are metadata, it occurred to me this morning that a
> > 'sync init' wouldn't refresh that info. But a 'metadata sync init'
> > might get rid of the stale bucket sync info and stop the sync errors.
> > Would that be the way to go?
> >
> > Thanks,
> >
> > Trey
> >
> > On Wed, Mar 6, 2019 at 11:47 AM Casey Bodley <cbod...@redhat.com
> > <mailto:cbod...@redhat.com>> wrote:
> >
> >     Hi Trey,
> >
> >     I think it's more likely that these stale metadata entries are from
> >     deleted buckets, rather than accidental bucket reshards. When a
> >     bucket is deleted in a multisite configuration, we don't delete its
> >     bucket instance because other zones may still need to sync the
> >     object deletes - and they can't make progress on sync if the bucket
> >     metadata disappears. These leftover bucket instances look the same
> >     to the 'reshard stale-instances' commands, but I'd be cautious about
> >     using that to remove them in multisite, as it may cause more sync
> >     errors and potentially leak storage if they still contain objects.
> >
> >     Regarding 'datalog trim', that alone isn't safe because it could
> >     trim entries that hadn't been applied on other zones yet, causing
> >     them to miss some updates. What you can do is run 'data sync init'
> >     on each zone, and restart gateways. This will restart with a data
> >     full sync (which will scan all buckets for changes), and skip past
> >     any datalog entries from before the full sync. I was concerned that
> >     the bug in error handling (ie "ERROR: init sync on...") would also
> >     affect full sync, but that doesn't appear to be the case - so I do
> >     think that's worth trying.
> >
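P.S. For reference, the 'data sync init' round mentioned above was roughly the following, run on sv3-prod once per source zone and followed by a gateway restart. Command syntax here is from memory, so please correct me if any of it is off:

# re-initialize data sync from each source zone, per the earlier advice
radosgw-admin data sync init --source-zone=dc11-prod
radosgw-admin data sync init --source-zone=sv5-corp

# then restart the gateway so sync starts over with a data full sync
# (systemd unit name is site-specific; the instance name below is just an example)
systemctl restart ceph-radosgw@rgw.sv3-ceph-rgw1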