Hi Matthew,
I work with Christian. Thanks so much for looking at this.
We have a huge stale-instances list from that command.
Our periods are all identical; I redirected the output to a file on each node
and checksummed the files. Here's the period:
{
    "id": "3d0d40ef-90de-40ea-8c44-caa20ea8dc53",
    "epoch": 16,
    "predecessor_uuid": "926c74c7-c1a7-46b1-9f25-eb5c392a7fbb",
    "sync_status": [],
    "period_map": {
        "id": "3d0d40ef-90de-40ea-8c44-caa20ea8dc53",
        "zonegroups": [
            {
                "id": "de6af748-1a2f-44a1-9d44-30799cf1313e",
                "name": "us",
                "api_name": "us",
                "is_master": "true",
                "endpoints": [
                    "http://sv5-ceph-rgw1.savagebeast.com:8080"
                ],
                "hostnames": [],
                "hostnames_s3website": [],
                "master_zone": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
                "zones": [
                    {
                        "id": "107d29a0-b732-4bf1-a26e-1f64f820e839",
                        "name": "dc11-prod",
                        "endpoints": [
                            "http://dc11-ceph-rgw1:8080"
                        ],
                        "log_meta": "false",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false",
                        "tier_type": "",
                        "sync_from_all": "true",
                        "sync_from": []
                    },
                    {
                        "id": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
                        "name": "sv5-corp",
                        "endpoints": [
                            "http://sv5-ceph-rgw1.savagebeast.com:8080"
                        ],
                        "log_meta": "false",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false",
                        "tier_type": "",
                        "sync_from_all": "true",
                        "sync_from": []
                    },
                    {
                        "id": "331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8",
                        "name": "sv3-prod",
                        "endpoints": [
                            "http://sv3-ceph-rgw1:8080"
                        ],
                        "log_meta": "false",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false",
                        "tier_type": "",
                        "sync_from_all": "true",
                        "sync_from": []
                    }
                ],
                "placement_targets": [
                    {
                        "name": "default-placement",
                        "tags": []
                    }
                ],
                "default_placement": "default-placement",
                "realm_id": "b3e2afe7-2254-494a-9a34-ce50358779fd"
            }
        ],
        "short_zone_ids": [
            {
                "key": "107d29a0-b732-4bf1-a26e-1f64f820e839",
                "val": 1720993486
            },
            {
                "key": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
                "val": 2301637458
            },
            {
                "key": "331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8",
                "val": 1449486239
            }
        ]
    },
    "master_zonegroup": "de6af748-1a2f-44a1-9d44-30799cf1313e",
    "master_zone": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
    "period_config": {
        "bucket_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        },
        "user_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        }
    },
    "realm_id": "b3e2afe7-2254-494a-9a34-ce50358779fd",
    "realm_name": "savagebucket",
    "realm_epoch": 2
}
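
For reference, this is roughly what I ran on each node (file path illustrative):

    radosgw-admin period get > /tmp/period.json
    md5sum /tmp/period.json    # digests matched on every node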
On Tue, Mar 5, 2019 at 7:31 AM Matthew H <[email protected]> wrote:
> Hi Christian,
>
> You haven't resharded any of your buckets have you? You can run the
> command below in v12.2.11 to list stale bucket instances.
>
> radosgw-admin reshard stale-instances list
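>
> To get a rough count of the entries, assuming jq is available and the
> output is a plain JSON array, something like:
>
> radosgw-admin reshard stale-instances list | jq length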
>
> Can you also send the output from the following command on each rgw?
>
> radosgw-admin period get
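>
> For example, gathering them from one host (hostnames taken from your
> earlier mail):
>
> for h in sv5-ceph-rgw1 sv3-ceph-rgw1 dc11-ceph-rgw1; do
>     ssh $h radosgw-admin period get > period.$h.json
> done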
>
>
>
> ------------------------------
> *From:* Christian Rice <[email protected]>
> *Sent:* Tuesday, March 5, 2019 1:46 AM
> *To:* Matthew H; ceph-users
> *Subject:* Re: radosgw sync falling behind regularly
>
>
> sure thing.
>
>
>
> sv5-ceph-rgw1
>
> zonegroup get
>
> {
>     "id": "de6af748-1a2f-44a1-9d44-30799cf1313e",
>     "name": "us",
>     "api_name": "us",
>     "is_master": "true",
>     "endpoints": [
>         "http://sv5-ceph-rgw1.savagebeast.com:8080"
>     ],
>     "hostnames": [],
>     "hostnames_s3website": [],
>     "master_zone": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
>     "zones": [
>         {
>             "id": "107d29a0-b732-4bf1-a26e-1f64f820e839",
>             "name": "dc11-prod",
>             "endpoints": [
>                 "http://dc11-ceph-rgw1:8080"
>             ],
>             "log_meta": "false",
>             "log_data": "true",
>             "bucket_index_max_shards": 0,
>             "read_only": "false",
>             "tier_type": "",
>             "sync_from_all": "true",
>             "sync_from": []
>         },
>         {
>             "id": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
>             "name": "sv5-corp",
>             "endpoints": [
>                 "http://sv5-ceph-rgw1.savagebeast.com:8080"
>             ],
>             "log_meta": "false",
>             "log_data": "true",
>             "bucket_index_max_shards": 0,
>             "read_only": "false",
>             "tier_type": "",
>             "sync_from_all": "true",
>             "sync_from": []
>         },
>         {
>             "id": "331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8",
>             "name": "sv3-prod",
>             "endpoints": [
>                 "http://sv3-ceph-rgw1:8080"
>             ],
>             "log_meta": "false",
>             "log_data": "true",
>             "bucket_index_max_shards": 0,
>             "read_only": "false",
>             "tier_type": "",
>             "sync_from_all": "true",
>             "sync_from": []
>         }
>     ],
>     "placement_targets": [
>         {
>             "name": "default-placement",
>             "tags": []
>         }
>     ],
>     "default_placement": "default-placement",
>     "realm_id": "b3e2afe7-2254-494a-9a34-ce50358779fd"
> }
>
> zone get
> {
>     "id": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
>     "name": "sv5-corp",
>     "domain_root": "sv5-corp.rgw.meta:root",
>     "control_pool": "sv5-corp.rgw.control",
>     "gc_pool": "sv5-corp.rgw.log:gc",
>     "lc_pool": "sv5-corp.rgw.log:lc",
>     "log_pool": "sv5-corp.rgw.log",
>     "intent_log_pool": "sv5-corp.rgw.log:intent",
>     "usage_log_pool": "sv5-corp.rgw.log:usage",
>     "reshard_pool": "sv5-corp.rgw.log:reshard",
>     "user_keys_pool": "sv5-corp.rgw.meta:users.keys",
>     "user_email_pool": "sv5-corp.rgw.meta:users.email",
>     "user_swift_pool": "sv5-corp.rgw.meta:users.swift",
>     "user_uid_pool": "sv5-corp.rgw.meta:users.uid",
>     "system_key": {
>         "access_key": "access_key_redacted",
>         "secret_key": "secret_key_redacted"
>     },
>     "placement_pools": [
>         {
>             "key": "default-placement",
>             "val": {
>                 "index_pool": "sv5-corp.rgw.buckets.index",
>                 "data_pool": "sv5-corp.rgw.buckets.data",
>                 "data_extra_pool": "sv5-corp.rgw.buckets.non-ec",
>                 "index_type": 0,
>                 "compression": ""
>             }
>         }
>     ],
>     "metadata_heap": "",
>     "tier_config": [],
>     "realm_id": "b3e2afe7-2254-494a-9a34-ce50358779fd"
> }
>
> sv3-ceph-rgw1
>
> zonegroup get
>
> [identical to the sv5-ceph-rgw1 zonegroup output above]
>
> zone get
> {
>     "id": "331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8",
>     "name": "sv3-prod",
>     "domain_root": "sv3-prod.rgw.meta:root",
>     "control_pool": "sv3-prod.rgw.control",
>     "gc_pool": "sv3-prod.rgw.log:gc",
>     "lc_pool": "sv3-prod.rgw.log:lc",
>     "log_pool": "sv3-prod.rgw.log",
>     "intent_log_pool": "sv3-prod.rgw.log:intent",
>     "usage_log_pool": "sv3-prod.rgw.log:usage",
>     "reshard_pool": "sv3-prod.rgw.log:reshard",
>     "user_keys_pool": "sv3-prod.rgw.meta:users.keys",
>     "user_email_pool": "sv3-prod.rgw.meta:users.email",
>     "user_swift_pool": "sv3-prod.rgw.meta:users.swift",
>     "user_uid_pool": "sv3-prod.rgw.meta:users.uid",
>     "system_key": {
>         "access_key": "access_key_redacted",
>         "secret_key": "secret_key_redacted"
>     },
>     "placement_pools": [
>         {
>             "key": "default-placement",
>             "val": {
>                 "index_pool": "sv3-prod.rgw.buckets.index",
>                 "data_pool": "sv3-prod.rgw.buckets.data",
>                 "data_extra_pool": "sv3-prod.rgw.buckets.non-ec",
>                 "index_type": 0,
>                 "compression": ""
>             }
>         }
>     ],
>     "metadata_heap": "",
>     "tier_config": [],
>     "realm_id": "b3e2afe7-2254-494a-9a34-ce50358779fd"
> }
>
> dc11-ceph-rgw1
>
> zonegroup get
>
> [identical to the sv5-ceph-rgw1 zonegroup output above]
>
> zone get
> {
>     "id": "107d29a0-b732-4bf1-a26e-1f64f820e839",
>     "name": "dc11-prod",
>     "domain_root": "dc11-prod.rgw.meta:root",
>     "control_pool": "dc11-prod.rgw.control",
>     "gc_pool": "dc11-prod.rgw.log:gc",
>     "lc_pool": "dc11-prod.rgw.log:lc",
>     "log_pool": "dc11-prod.rgw.log",
>     "intent_log_pool": "dc11-prod.rgw.log:intent",
>     "usage_log_pool": "dc11-prod.rgw.log:usage",
>     "reshard_pool": "dc11-prod.rgw.log:reshard",
>     "user_keys_pool": "dc11-prod.rgw.meta:users.keys",
>     "user_email_pool": "dc11-prod.rgw.meta:users.email",
>     "user_swift_pool": "dc11-prod.rgw.meta:users.swift",
>     "user_uid_pool": "dc11-prod.rgw.meta:users.uid",
>     "system_key": {
>         "access_key": "access_key_redacted",
>         "secret_key": "secret_key_redacted"
>     },
>     "placement_pools": [
>         {
>             "key": "default-placement",
>             "val": {
>                 "index_pool": "dc11-prod.rgw.buckets.index",
>                 "data_pool": "dc11-prod.rgw.buckets.data",
>                 "data_extra_pool": "dc11-prod.rgw.buckets.non-ec",
>                 "index_type": 0,
>                 "compression": ""
>             }
>         }
>     ],
>     "metadata_heap": "",
>     "tier_config": [],
>     "realm_id": "b3e2afe7-2254-494a-9a34-ce50358779fd"
> }
>
>
>
> *From: *Matthew H <[email protected]>
> *Date: *Monday, March 4, 2019 at 7:44 PM
> *To: *Christian Rice <[email protected]>, ceph-users <
> [email protected]>
> *Subject: *Re: radosgw sync falling behind regularly
>
>
>
> Christian,
>
>
>
> Can you provide your zonegroup and zone configurations for all 3 rgw
> sites? (run radosgw-admin zonegroup get and radosgw-admin zone get at
> each site, please)
>
>
>
> Thanks,
>
>
> ------------------------------
>
> *From:* Christian Rice <[email protected]>
> *Sent:* Monday, March 4, 2019 5:34 PM
> *To:* Matthew H; ceph-users
> *Subject:* Re: radosgw sync falling behind regularly
>
>
>
> So we upgraded everything from 12.2.8 to 12.2.11, and things have gone to
> hell. Lots of sync errors, like so:
>
>
>
> sudo radosgw-admin sync error list
> [
>     {
>         "shard_id": 0,
>         "entries": [
>             {
>                 "id": "1_1549348245.870945_5163821.1",
>                 "section": "data",
>                 "name": "dora/catalogmaker-redis:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.470/56fbc9685d609b4c8cdbd11dd60bf03bedcb613b438c663c9899d930b25f0405",
>                 "timestamp": "2019-02-05 06:30:45.870945Z",
>                 "info": {
>                     "source_zone": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
>                     "error_code": 5,
>                     "message": "failed to sync object(5) Input/output error"
>                 }
>             },
> …
>
>
>
> radosgw logs are full of:
>
> 2019-03-04 14:32:58.039467 7f90e81eb700  0 data sync: ERROR: failed to read remote data log info: ret=-2
> 2019-03-04 14:32:58.041296 7f90e81eb700  0 data sync: ERROR: init sync on escarpment/escarpment:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.146 failed, retcode=-2
> 2019-03-04 14:32:58.041662 7f90e81eb700  0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
> 2019-03-04 14:32:58.042949 7f90e81eb700  0 data sync: WARNING: skipping data log entry for missing bucket escarpment/escarpment:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.146
> 2019-03-04 14:32:58.823501 7f90e81eb700  0 data sync: ERROR: failed to read remote data log info: ret=-2
> 2019-03-04 14:32:58.825243 7f90e81eb700  0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
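>
> (ret=-2 here is -ENOENT and error_code 5 is EIO. A rough way to check
> whether that bucket instance still exists on this zone:
>
> radosgw-admin metadata list bucket.instance | grep escarpment
> )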
>
>
>
> dc11-ceph-rgw2:~$ sudo radosgw-admin sync status
>           realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
>       zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
>            zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
> 2019-03-04 14:26:21.351372 7ff7ae042e40  0 meta sync: ERROR: failed to fetch mdlog info
>   metadata sync syncing
>                 full sync: 0/64 shards
> failed to fetch local sync status: (5) Input/output error
> ^C
>
>
>
> Any advice? All three clusters on 12.2.11, Debian stretch.
>
>
>
> *From: *Christian Rice <[email protected]>
> *Date: *Thursday, February 28, 2019 at 9:06 AM
> *To: *Matthew H <[email protected]>, ceph-users <
> [email protected]>
> *Subject: *Re: radosgw sync falling behind regularly
>
>
>
> Yeah my bad on the typo, not running 12.8.8 ☺ It’s 12.2.8. We can
> upgrade and will attempt to do so asap. Thanks for that, I need to read my
> release notes more carefully, I guess!
>
>
>
> *From: *Matthew H <[email protected]>
> *Date: *Wednesday, February 27, 2019 at 8:33 PM
> *To: *Christian Rice <[email protected]>, ceph-users <
> [email protected]>
> *Subject: *Re: radosgw sync falling behind regularly
>
>
>
> Hey Christian,
>
>
>
> I'm making a wild guess, but assuming this is 12.2.8: is it possible
> for you to upgrade to 12.2.11? There have been rgw multisite bug fixes
> for metadata syncing and data syncing (both separate issues) that you
> could be hitting.
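>
> After upgrading, you can confirm every daemon is actually on 12.2.11
> with:
>
> ceph versions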
>
>
>
> Thanks,
> ------------------------------
>
> *From:* ceph-users <[email protected]> on behalf of
> Christian Rice <[email protected]>
> *Sent:* Wednesday, February 27, 2019 7:05 PM
> *To:* ceph-users
> *Subject:* [ceph-users] radosgw sync falling behind regularly
>
>
>
> Debian 9; ceph 12.8.8-bpo90+1; no rbd or cephfs, just radosgw; three
> clusters in one zonegroup.
>
>
>
> Often we find either metadata or data sync behind, and it never seems to
> recover until we restart the endpoint's radosgw target service.
>
>
>
> e.g. at 15:45:40:
>
>
>
> dc11-ceph-rgw1:/var/log/ceph# radosgw-admin sync status
>           realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
>       zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
>            zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
>   metadata sync syncing
>                 full sync: 0/64 shards
>                 incremental sync: 64/64 shards
>                 metadata is behind on 2 shards
>                 behind shards: [19,41]
>                 oldest incremental change not applied: 2019-02-27 14:42:24.0.408263s
>       data sync source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
>                         syncing
>                         full sync: 0/128 shards
>                         incremental sync: 128/128 shards
>                         data is caught up with source
>                 source: 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
>                         syncing
>                         full sync: 0/128 shards
>                         incremental sync: 128/128 shards
>                         data is caught up with source
>
>
>
>
>
> so at 15:46:07:
>
>
>
> dc11-ceph-rgw1:/var/log/ceph# sudo systemctl restart [email protected]
>
>
>
> and by the time I checked at 15:48:08:
>
>
>
> dc11-ceph-rgw1:/var/log/ceph# radosgw-admin sync status
>           realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
>       zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
>            zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
>   metadata sync syncing
>                 full sync: 0/64 shards
>                 incremental sync: 64/64 shards
>                 metadata is caught up with master
>       data sync source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
>                         syncing
>                         full sync: 0/128 shards
>                         incremental sync: 128/128 shards
>                         data is caught up with source
>                 source: 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
>                         syncing
>                         full sync: 0/128 shards
>                         incremental sync: 128/128 shards
>                         data is caught up with source
>
>
>
>
>
> There’s no way this is “lag.” It’s stuck, and happens frequently, though
> perhaps not daily. Any suggestions? Our cluster isn’t heavily used yet,
> but it’s production.
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com