Hi,
thanks for the answer.
On Thu, Mar 07, 2019 at 07:48:59PM -0800, David Zafman wrote:
> See what results you get from this command.
>
> # rados list-inconsistent-snapset 2.2bb --format=json-pretty
>
> You might see this, in which case there is nothing interesting. If you
> don't get JSON back, re-run a scrub and try again.
>
> {
>     "epoch": ######,
>     "inconsistents": []
> }
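(For anyone following along in the archives: a fresh deep scrub of the PG
can be triggered with

# ceph pg deep-scrub 2.2bb

before re-running list-inconsistent-snapset.)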
# rados list-inconsistent-snapset 2.2bb --format=json-pretty
{
    "epoch": 485065,
    "inconsistents": [
        {
            "name": "rbd_data.dfd5e2235befd0.000000000001c299",
            "nspace": "",
            "locator": "",
            "snap": 326022,
            "errors": [
                "headless"
            ]
        },
        {
            "name": "rbd_data.dfd5e2235befd0.000000000001c299",
            "nspace": "",
            "locator": "",
            "snap": "head",
            "snapset": {
                "snap_context": {
                    "seq": 327360,
                    "snaps": []
                },
                "head_exists": 1,
                "clones": []
            },
            "errors": [
                "extra_clones"
            ],
            "extra clones": [
                326022
            ]
        }
    ]
}
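(My reading of this output: the clone object for snap 326022 exists on disk
but is not referenced by the head's SnapSet, hence "headless" on the clone
and "extra_clones" on the head. As a cross-check, the snapset can also be
inspected from the client side with something like

# rados -p <poolname> listsnaps rbd_data.dfd5e2235befd0.000000000001c299

where <poolname> is the name of pool 2.)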
> I don't think you need to do the remove-clone-metadata, because you got
> "unexpected clone", so I think you'd just get "Clone 326022 not present".
>
> I think you need to remove the clone object from osd.12 and osd.80. For
> example:
>
> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12/ \
>     --journal-path /dev/sdXX --op list rbd_data.dfd5e2235befd0.000000000001c299
>
> ["2.2bb",{"oid":"rbd_data.dfd5e2235befd0.000000000001c299","key":"","snapid":-2,"hash":########,"max":0,"pool":2,"namespace":"","max":0}]
> ["2.2bb",{"oid":"rbd_data.dfd5e2235befd0.000000000001c299","key":"","snapid":326022,"hash":#########,"max":0,"pool":2,"namespace":"","max":0}]
>
> Use the json for snapid 326022 to remove it.
>
> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12/ \
>     --journal-path /dev/sdXX \
>     '["2.2bb",{"oid":"rbd_data.dfd5e2235befd0.000000000001c299","key":"","snapid":326022,"hash":#########,"max":0,"pool":2,"namespace":"","max":0}]' \
>     remove
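(A note for the archives: ceph-objectstore-tool operates on an offline OSD,
so the daemon has to be stopped first, e.g.

# ceph osd set noout
# systemctl stop ceph-osd@12

and started again once the object has been removed.)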
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-80/ \
    --journal-path /dev/sda1 --op list rbd_data.dfd5e2235befd0.000000000001c299 --pgid 2.2bb
["2.2bb",{"oid":"rbd_data.dfd5e2235befd0.000000000001c299","key":"","snapid":326022,"hash":3420345019,"max":0,"pool":2,"namespace":"","max":0}]
["2.2bb",{"oid":"rbd_data.dfd5e2235befd
I added --pgid 2.2bb because without it the listing was taking too long to finish.
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-80/ \
    --journal-path /dev/sda1 \
    '["2.2bb",{"oid":"rbd_data.dfd5e2235befd0.000000000001c299","key":"","snapid":326022,"hash":3420345019,"max":0,"pool":2,"namespace":"","max":0}]' \
    remove
remove #2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986#
osd.12 was slightly different because it is BlueStore (no separate journal, so no --journal-path):
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12/ --op list \
    rbd_data.dfd5e2235befd0.000000000001c299 --pgid 2.2bb
["2.2bb",{"oid":"rbd_data.dfd5e2235befd0.000000000001c299","key":"","snapid":326022,"hash":3420345019,"max":0,"pool":2,"namespace":"","max":0}]
["2.2bb",{"oid":"rbd_data.dfd5e2235befd0.000000000001c299","key":"","snapid":-2,"hash":3420345019,"max":0,"pool":2,"namespace":"","max":0}]
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12/ \
    '["2.2bb",{"oid":"rbd_data.dfd5e2235befd0.000000000001c299","key":"","snapid":326022,"hash":3420345019,"max":0,"pool":2,"namespace":"","max":0}]' \
    remove
remove #2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986#
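(To confirm the clone is gone, the listing can simply be re-run on each OSD, e.g.

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12/ --op list \
    rbd_data.dfd5e2235befd0.000000000001c299 --pgid 2.2bb

which should now show only the snapid -2 (head) entry.)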
But nothing changed, so I tried to repair the PG again, and now I got this
from osd.36:
2019-03-08 09:09:11.786038 7f920c40d700 -1 log_channel(cluster) log [ERR] :
2.2bb shard 36 soid 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986
: candidate size 0 info size 4194304 mismatch
2019-03-08 09:09:11.786041 7f920c40d700 -1 log_channel(cluster) log [ERR] :
2.2bb soid 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986 : failed
to pick suitable object info
2019-03-08 09:09:11.786182 7f920c40d700 -1 log_channel(cluster) log [ERR] :
repair 2.2bb 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986 : on
disk size (0) does not match object info size (4194304) adjusted for ondisk to
(4194304)
2019-03-08 09:09:11.786191 7f920c40d700 -1 log_channel(cluster) log [ERR] :
repair 2.2bb 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986 : is
an unexpected clone
2019-03-08 09:09:11.786213 7f920c40d700 -1 osd.36 pg_epoch: 485254 pg[2.2bb( v
485253'15080921 (485236'15079373,485253'15080921] local-lis/les=485251/485252
n=3836 ec=38/38 lis/c 485251/485251 les/c/f 485252/485252/0
485251/485251/484996) [36,12,80] r=0 lpr=485251 crt=485253'15080921 lcod
485252'15080920 mlcod 485252'15080920
active+clean+scrubbing+deep+inconsistent+repair snaptrimq=[5022c~1,50230~1]]
_scan_snaps no clone_snaps for
2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986 in 4fec0=[]:{}
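(The size-mismatch and unexpected-clone errors now point at shard 36, so I
suppose the same leftover clone object is still present on osd.36, but with
size 0 there; the obvious check would be

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-36/ --op list \
    rbd_data.dfd5e2235befd0.000000000001c299 --pgid 2.2bb

followed by the same remove if the snapid 326022 entry shows up.)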
And:
# rados list-inconsistent-snapset 2.2bb --format=json-pretty
{
    "epoch": 485251,
    "inconsistents": []
}
Now I have:
HEALTH_ERR 5 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 5 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 2.2bb is active+clean+inconsistent, acting [36,12,80]
The scrub error count has jumped from 3 to 5 now.
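(For more detail on the remaining errors there is also

# rados list-inconsistent-obj 2.2bb --format=json-pretty

which reports per-shard object inconsistencies rather than snapset ones.)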
Any clues?
Thanks again,
--
Herbert