Hello,

Here is a summary of the troubleshooting story on my radosgw.

After some manipulations of the zone definition, we got stuck in a situation 
where we could not update zones and zonegroups anymore.

This situation affected bucket manipulation too:

> radosgw-admin bucket list
> 2016-09-06 09:04:14.810198 7fcbb01d5900  0 Error updating periodmap, multiple 
> master zonegroups configured 
> 2016-09-06 09:04:14.810213 7fcbb01d5900  0 master zonegroup: 
> 4d982760-7853-4174-8c05-cec2ef148cf0 and  default
> 2016-09-06 09:04:14.810215 7fcbb01d5900  0 ERROR: updating period map: (22) 
> Invalid argument
> 2016-09-06 09:04:14.810230 7fcbb01d5900  0 failed to add zonegroup to 
> current_period: (22) Invalid argument
> 2016-09-06 09:04:14.810238 7fcbb01d5900 -1 failed converting region to 
> zonegroup : ret -22 (22) Invalid argument

After several discussions with two Ceph developers from Red Hat, we found a bug 
in the period management of the RadosGW.

A bug has been submitted: http://tracker.ceph.com/issues/17239

I've written up the different steps of the troubleshooting here:

1. Situation

First, we created one zonegroup and one zone to allow us to put data on a 
replicated pool or an erasure-coded pool through the RadosGW:

default_zonegroup.json
default_zone.json

At this point, if we try to modify the zone or create a new one, we are not 
able to commit those changes and the radosgw ends up in an unstable
state.

The problem comes from the period update:

    http://docs.ceph.com/docs/jewel/radosgw/multisite/#update-the-period

The existing period has one zone set as master, and this period conflicts 
with the one we updated, which made it impossible to fix the situation.

2. Troubleshooting #1

After many tries, the only solution we found was to restart the definition 
of zone / zonegroup / period from scratch. To do that, we had to delete
the .rgw.root pool.

But first, we have to stop all radosgw daemons.

> rados purge .rgw.root --yes-i-really-really-mean-it

After deleting the pool, we start the radosgw daemon.

To be able to manipulate zones and zonegroups, we must first create a realm:

> radosgw-admin realm create --rgw-realm=default --default

Then create a new zonegroup and a new zone, set them as default, and commit:

> radosgw-admin zonegroup create --rgw-zonegroup=default --master --default
> radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=default --default --master
> radosgw-admin zonegroup default --rgw-zonegroup default
> radosgw-admin zone default --rgw-zone default
> radosgw-admin period update
> radosgw-admin period update --commit

But then we found that the zone ID and the zonegroup ID did not match the ones 
recorded in the bucket instances. The zone ID appears in the
name of each bucket.instance, and both IDs (zone and zonegroup) are present in 
the metadata of the bucket.instance:

> radosgw-admin metadata list bucket.instance
> [
>     "newtest2:69a46a98-09a8-41f3-9122-ced11496513b.1095087.2",
>     "testreplicate:c9724aff-5fa0-4dd9-b494-57bdb48fab4e.34100.1",
>     "testerasure:c9724aff-5fa0-4dd9-b494-57bdb48fab4e.34103.1",
>     "newtest:69a46a98-09a8-41f3-9122-ced11496513b.1095087.1"
> ]

> radosgw-admin metadata get bucket.instance:testreplicate:c9724aff-5fa0-4dd9-b494-57bdb48fab4e.34100.1
> {
>     "key": 
> "bucket.instance:testreplicate:c9724aff-5fa0-4dd9-b494-57bdb48fab4e.34100.1",
>     "ver": {
>         "tag": "_e5KzBElZ1V3P0ZEbQElSBRA",
>         "ver": 1
>     },
>     "mtime": "2016-08-23 13:39:10.662987Z",
>     "data": {
>         "bucket_info": {
>             "bucket": {
>                 "name": "testreplicate",
>                 "pool": "default.rgw.buckets.data",
>                 "data_extra_pool": "default.rgw.buckets.extra",
>                 "index_pool": "default.rgw.buckets.index",
>                 "marker": "c9724aff-5fa0-4dd9-b494-57bdb48fab4e.34100.1",
>                 "bucket_id": "c9724aff-5fa0-4dd9-b494-57bdb48fab4e.34100.1"
>             },
>             "creation_time": "0.000000",
>             "owner": "replicate",
>             "flags": 0,
>             "zonegroup": "4d982760-7853-4174-8c05-cec2ef148cf0",
>             "placement_rule": "default-placement",
>             "has_instance_obj": "true",
>             "quota": {
>                 "enabled": false,
>                 "max_size_kb": -1,
>                 "max_objects": -1
>             },
>             "num_shards": 0,
>             "bi_shard_hash_type": 0,
>             "requester_pays": "false",
>             "has_website": "false",
>             "swift_versioning": "false",
>             "swift_ver_location": ""
>         },
>         "attrs": [
>             {
>                 "key": "user.rgw.acl",
>                 "val":
> "AgKRAAAAAwIaAAAACQAAAHJlcGxpY2F0ZQkAAAByZXBsaWNhdGUDA2sAAAABAQAAAAkAAAByZXBsaWNhdGUPAAAAAQAAAAkAAAByZXBsaWNhdGUEAzoAAAACAgQAAAAAAAAACQAAAHJlcGxpY2F0ZQAAAAAAAAAAAgIEAAAADwAAAAkAAAByZXBsaWNhdGUAAAAAAAAAAA=="
>             },
>             {
>                 "key": "user.rgw.idtag",
>                 "val": ""
>             },
>             {
>                 "key": "user.rgw.manifest",
>                 "val": ""
>             }
>         ]
>     }
> }
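
To compare IDs across several buckets without reading the JSON by eye, the two 
values can be pulled out of the metadata with a few lines of Python. This is 
only a sketch: the helper name ids_from_bucket_instance is made up for 
illustration, the sample is a trimmed copy of the output above, and reading the 
UUID part of bucket_id as the zone ID follows the observation in this post 
rather than any documented guarantee.

```python
import json


def ids_from_bucket_instance(metadata):
    """Return (id_prefix, zonegroup_id) from `metadata get bucket.instance:...` output.

    Hypothetical helper, not part of radosgw-admin.
    """
    info = metadata["data"]["bucket_info"]
    bucket_id = info["bucket"]["bucket_id"]
    # bucket_id has the form "<uuid>.<number>.<number>"; keep only the uuid part.
    id_prefix = bucket_id.split(".", 1)[0]
    return id_prefix, info["zonegroup"]


# Trimmed copy of the metadata shown above (only the fields the helper reads).
sample = json.loads("""
{
    "data": {
        "bucket_info": {
            "bucket": {
                "name": "testreplicate",
                "bucket_id": "c9724aff-5fa0-4dd9-b494-57bdb48fab4e.34100.1"
            },
            "zonegroup": "4d982760-7853-4174-8c05-cec2ef148cf0"
        }
    }
}
""")

zone_id, zonegroup_id = ids_from_bucket_instance(sample)
print("zone id prefix:", zone_id)
print("zonegroup id:  ", zonegroup_id)
```

The two printed values are the IDs that the recreated zone and zonegroup need 
to carry for the existing buckets to be reachable again.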

We tried to fix this by creating a new zonegroup and zone with the right IDs, 
setting them as default and deleting the other ones, but we ran into the bug on
period update again.

3. Troubleshooting #2

We restarted the process from scratch:

We stopped all the radosgw daemons, deleted the .rgw.root pool, started the 
radosgw and created the realm again.

Then we decided to create the zonegroup and the zone from the JSON files we had 
saved, with the right IDs set.

We have to be careful to replace the realm ID in both JSON files with the new 
one; otherwise it won't work.

After editing the two files again:

default_zonegroup.json
default_zone.json

we can create the zonegroup and zone like this:

> radosgw-admin zonegroup set --rgw-zonegroup default < default_zonegroup.json
> radosgw-admin zone set --rgw-zonegroup default --rgw-zone default < default_zone.json

At this point, the new zonegroup and zone were successfully created, but their 
IDs were not those in the JSON files: during the set, radosgw-admin
created new IDs for both the zonegroup and the zone.

In this situation we were still not able to access the data. We had to start 
again from scratch...

4. Troubleshooting #3

We decided to restart the process but leave the radosgw stopped. Our 
intuition was that a running radosgw might affect the behaviour by creating a
default zone and zonegroup itself.

Finally we did this:

Stop all RadosGW daemons!

Purge the .rgw.root pool

> rados purge .rgw.root --yes-i-really-really-mean-it

Create a new realm and set it as default

> radosgw-admin realm create --rgw-realm=default --default

Edit the two JSON files to replace the realm ID with the new one

> vim default_zone.json #change realm with the new one
> vim default_zonegroup.json #change realm with the new one
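
If you prefer not to edit the files by hand, the same change can be scripted. 
The sketch below assumes that both JSON files carry the realm ID in a top-level 
"realm_id" key (the attachments are not reproduced here, so check your own 
files first). The function names are made up for illustration, and the new 
realm ID has to be taken from the output of the realm create command.

```python
import json


def set_realm_id(doc, new_realm_id):
    """Return a copy of a zone/zonegroup document with realm_id replaced.

    Assumes a top-level "realm_id" key; refuses to guess if it is missing.
    """
    if "realm_id" not in doc:
        raise KeyError("no top-level realm_id key; check the file layout")
    updated = dict(doc)
    updated["realm_id"] = new_realm_id
    return updated


def rewrite_file(path, new_realm_id):
    """Load a JSON file, replace its realm_id and write it back in place."""
    with open(path) as fh:
        doc = json.load(fh)
    updated = set_realm_id(doc, new_realm_id)
    with open(path, "w") as fh:
        json.dump(updated, fh, indent=4)
        fh.write("\n")
```

It would then be called once per file, for example 
rewrite_file("default_zone.json", new_id) and 
rewrite_file("default_zonegroup.json", new_id), where new_id is the ID of the 
realm just created. All other keys, including the zone and zonegroup IDs, are 
left untouched.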

Create the zonegroup and the zone like this (the order is really important 
here!)

> radosgw-admin zonegroup set --rgw-zonegroup default < default_zonegroup.json
> radosgw-admin zone set --rgw-zonegroup default --rgw-zone default < default_zone.json

Set zonegroup and zone as default

> radosgw-admin zonegroup default --rgw-zonegroup default
> radosgw-admin zone default --rgw-zone default

We can check that the zone and the zonegroup are correct by doing this

> radosgw-admin zonegroup list
> radosgw-admin zonegroup get
> radosgw-admin zone list
> radosgw-admin zone get

We have to update the period (do not commit at first; check that the data in 
the update are correct)

> radosgw-admin period update
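
The check of the uncommitted period can also be scripted. The sketch below 
assumes the JSON printed by the period update has top-level "master_zonegroup" 
and "master_zone" keys and a "period_map" with a "zonegroups" list whose 
entries carry "id" and "is_master"; that layout is an assumption to verify 
against your own output. The function name and the sample period are made up 
for illustration; only the two expected IDs come from the bucket metadata 
shown earlier.

```python
def period_problems(period, want_zonegroup_id, want_zone_id):
    """Return a list of human-readable problems found in a period document."""
    problems = []
    if period.get("master_zonegroup") != want_zonegroup_id:
        problems.append("master_zonegroup is %r, expected %r"
                        % (period.get("master_zonegroup"), want_zonegroup_id))
    if period.get("master_zone") != want_zone_id:
        problems.append("master_zone is %r, expected %r"
                        % (period.get("master_zone"), want_zone_id))
    zonegroups = period.get("period_map", {}).get("zonegroups", [])
    # is_master may be the string "true" or a JSON boolean; accept both.
    masters = [zg.get("id") for zg in zonegroups
               if str(zg.get("is_master")).lower() == "true"]
    if len(masters) != 1:
        problems.append("expected exactly one master zonegroup, found %d: %r"
                        % (len(masters), masters))
    return problems


# Illustrative period using the IDs recorded in the bucket metadata.
ZONEGROUP_ID = "4d982760-7853-4174-8c05-cec2ef148cf0"
ZONE_ID = "c9724aff-5fa0-4dd9-b494-57bdb48fab4e"
good_period = {
    "master_zonegroup": ZONEGROUP_ID,
    "master_zone": ZONE_ID,
    "period_map": {"zonegroups": [{"id": ZONEGROUP_ID, "is_master": "true"}]},
}

for line in period_problems(good_period, ZONEGROUP_ID, ZONE_ID):
    print(line)
```

An empty result means the period names the expected master zone and 
zonegroup and lists a single master zonegroup, which is the condition the 
"multiple master zonegroups configured" error at the top of this mail 
complains about.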

Then we can commit the period update to apply the configuration

> radosgw-admin period update --commit

We can now safely restart the radosgw!

-- 
Yoann Moulin
EPFL IC-IT


Attachment: default_zone.json
Description: application/json

Attachment: default_zonegroup.json
Description: application/json
