In case anyone else runs into this, I resolved it by running removeall on both bad OSDs and then running ceph pg repair, which copied the good object back.
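
A rough sketch of that sequence, for anyone who wants it spelled out. It is not a record of the exact commands that were run. The helper name removeall_and_repair_plan is made up for illustration, the bad OSDs are taken to be 305 and 313 (the two shards that list snap 4896 in the rados output quoted below), and the flags are copied from the commands quoted below. The code only prints the commands and executes nothing. Each OSD has to be stopped before ceph-objectstore-tool is run against it and started again before the repair.

```python
import shlex

# Object spec as passed to ceph-objectstore-tool in the quoted message.
OBJECT_JSON = (
    '{"oid":"rb.0.2479b45.238e1f29.000000125cbb","key":"","snapid":-2,'
    '"hash":2016338238,"max":0,"pool":2,"namespace":"","max":0}'
)


def removeall_and_repair_plan(pgid, object_json, bad_osds):
    """Return the commands, in order, as argv lists. Nothing is executed.

    Hypothetical helper: one removeall per bad replica, then one PG repair.
    """
    plan = []
    for osd in bad_osds:
        # Run with the OSD stopped; add --dry-run first to check the effect.
        plan.append([
            "ceph-objectstore-tool", "--type", "bluestore",
            "--data-path", f"/var/lib/ceph/osd/ceph-{osd}/",
            "--pgid", pgid, object_json, "removeall",
        ])
    # With the OSDs running again, repair restores the object from the
    # remaining replica.
    plan.append(["ceph", "pg", "repair", pgid])
    return plan


# Assumption: 305 and 313 are the two bad replicas of PG 2.13e.
for argv in removeall_and_repair_plan("2.13e", OBJECT_JSON, [305, 313]):
    print(shlex.join(argv))
```

Printing the plan first makes it easy to check the OSD ids and the object spec before touching any replica.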

-Steve


On 06/27/2018 06:17 PM, Steve Anthony wrote:
In the process of trying to repair snapshot inconsistencies associated
with the issues in this thread,
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-June/027125.html
("FAILED assert(p != recovery_info.ss.clone_snaps.end())"), I have one
PG I still can't get to repair.

Two of the three replicas appear to have (or think they have) a
snapshot. However, neither the ceph-objectstore-tool list operation nor
running find on the FUSE-mounted OSD reports or finds the snaps.

# ceph-objectstore-tool --type bluestore \
    --data-path /var/lib/ceph/osd/ceph-313/ --pgid 2.13e --op list \
    rb.0.2479b45.238e1f29.000000125cbb
["2.13e",{"oid":"rb.0.2479b45.238e1f29.000000125cbb","key":"","snapid":-2,"hash":2016338238,"max":0,"pool":2,"namespace":"","max":0}]

The ceph-objectstore-tool remove-clone-metadata operation also reports
that the snapshot does not exist.

# ceph-objectstore-tool --dry-run --type bluestore \
    --data-path /var/lib/ceph/osd/ceph-313/ --pgid 2.13e \
    '{"oid":"rb.0.2479b45.238e1f29.000000125cbb","key":"","snapid":-2,"hash":2016338238,"max":0,"pool":2,"namespace":"","max":0}' \
    remove-clone-metadata 4896
Clone 1320 not presentdry-run: Nothing changed
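
A note on the numbers: the command was given clone 4896 but the message says 1320. These look like the same ID, since 4896 in decimal is 1320 in hexadecimal. That the tool prints the ID in hex is an assumption based on this arithmetic, not something confirmed from the tool's documentation. A quick check:

```python
snap_id = 4896  # decimal ID passed to remove-clone-metadata

# Hexadecimal form of the same number.
print(f"{snap_id:x}")

# And the reverse: read "1320" as hex and compare with the decimal ID.
print(int("1320", 16) == snap_id)
```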

However, the remove operation sees the snapshot and refuses to delete
the object.

# ceph-objectstore-tool --dry-run --type bluestore \
    --data-path /var/lib/ceph/osd/ceph-313/ --pgid 2.13e \
    '{"oid":"rb.0.2479b45.238e1f29.000000125cbb","key":"","snapid":-2,"hash":2016338238,"max":0,"pool":2,"namespace":"","max":0}' \
    remove
Snapshots are present, use removeall to delete everything
dry-run: Nothing changed

Listing the inconsistencies with rados, it appears that the phantom
snapshot is present on 2/3 replicas. Other PGs had this issue on only
1/3 replicas; there, running removeall on the bad copy and then
repairing the PG fixed it. On this PG, running removeall on the primary
replica resulted in the repair replicating the other bad object. Should
I just issue removeall on both OSDs and then run repair to restore the
missing objects, or is there some other way to purge snaps on an object?
(I've already purged all snapshots on all images in the cluster with
rbd snap purge.)

Thoughts?

# rados list-inconsistent-obj 2.13e

{
   "epoch": 1008264,
   "inconsistents": [
     {
       "object": {
         "name": "rb.0.2479b45.238e1f29.000000125cbb",
         "nspace": "",
         "locator": "",
         "snap": "head",
         "version": 2024222
       },
       "errors": [
         "object_info_inconsistency",
         "snapset_inconsistency"
       ],
       "union_shard_errors": [
      ],
       "selected_object_info": {
         "oid": {
           "oid": "rb.0.2479b45.238e1f29.000000125cbb",
           "key": "",
           "snapid": -2,
           "hash": 2016338238,
           "max": 0,
           "pool": 2,
           "namespace": ""
         },
         "version": "946857'2041225",
         "prior_version": "943431'2032262",
         "last_reqid": "osd.36.0:48196",
         "user_version": 2024222,
         "size": 4194304,
         "mtime": "2018-05-13 08:58:21.359912",
         "local_mtime": "2018-05-13 08:58:21.537637",
         "lost": 0,
         "flags": [
           "dirty",
           "data_digest",
           "omap_digest"
         ],
         "legacy_snaps": [
        ],
         "truncate_seq": 0,
         "truncate_size": 0,
         "data_digest": "0x0d99bd77",
         "omap_digest": "0xffffffff",
         "expected_object_size": 4194304,
         "expected_write_size": 4194304,
         "alloc_hint_flags": 0,
         "manifest": {
           "type": 0,
           "redirect_target": {
             "oid": "",
             "key": "",
             "snapid": 0,
             "hash": 0,
             "max": 0,
             "pool": -9.2233720368548e+18,
             "namespace": ""
           }
         },
         "watchers": {
        }
       },
       "shards": [
         {
           "osd": 225,
           "primary": false,
           "errors": [
          ],
           "size": 4194304,
           "omap_digest": "0xffffffff",
           "data_digest": "0x0d99bd77",
           "object_info": {
             "oid": {
               "oid": "rb.0.2479b45.238e1f29.000000125cbb",
               "key": "",
               "snapid": -2,
               "hash": 2016338238,
               "max": 0,
               "pool": 2,
               "namespace": ""
             },
             "version": "946857'2041225",
             "prior_version": "943431'2032262",
             "last_reqid": "osd.36.0:48196",
             "user_version": 2024222,
             "size": 4194304,
             "mtime": "2018-05-13 08:58:21.359912",
             "local_mtime": "2018-05-13 08:58:21.537637",
             "lost": 0,
             "flags": [
               "dirty",
               "data_digest",
               "omap_digest"
             ],
             "legacy_snaps": [
            ],
             "truncate_seq": 0,
             "truncate_size": 0,
             "data_digest": "0x0d99bd77",
             "omap_digest": "0xffffffff",
             "expected_object_size": 4194304,
             "expected_write_size": 4194304,
             "alloc_hint_flags": 0,
             "manifest": {
               "type": 0,
               "redirect_target": {
                 "oid": "",
                 "key": "",
                 "snapid": 0,
                 "hash": 0,
                 "max": 0,
                 "pool": -9.2233720368548e+18,
                 "namespace": ""
               }
             },
             "watchers": {
            }
           },
           "snapset": {
             "snap_context": {
               "seq": 4896,
               "snaps": [
              ]
             },
             "head_exists": 1,
             "clones": [
            ]
           }
         },
         {
           "osd": 305,
           "primary": false,
           "errors": [
          ],
           "size": 4194304,
           "omap_digest": "0xffffffff",
           "data_digest": "0x0d99bd77",
           "object_info": {
             "oid": {
               "oid": "rb.0.2479b45.238e1f29.000000125cbb",
               "key": "",
               "snapid": -2,
               "hash": 2016338238,
               "max": 0,
               "pool": 2,
               "namespace": ""
             },
             "version": "943431'2032262",
             "prior_version": "942275'2030618",
             "last_reqid": "osd.36.0:48196",
             "user_version": 2024222,
             "size": 4194304,
             "mtime": "2018-05-13 08:58:21.359912",
             "local_mtime": "2018-05-13 08:58:21.537637",
             "lost": 0,
             "flags": [
               "dirty",
               "data_digest",
               "omap_digest"
             ],
             "legacy_snaps": [
            ],
             "truncate_seq": 0,
             "truncate_size": 0,
             "data_digest": "0x0d99bd77",
             "omap_digest": "0xffffffff",
             "expected_object_size": 4194304,
             "expected_write_size": 4194304,
             "alloc_hint_flags": 0,
             "manifest": {
               "type": 0,
               "redirect_target": {
                 "oid": "",
                 "key": "",
                 "snapid": 0,
                 "hash": 0,
                 "max": 0,
                 "pool": -9.2233720368548e+18,
                 "namespace": ""
               }
             },
             "watchers": {
            }
           },
           "snapset": {
             "snap_context": {
               "seq": 4896,
               "snaps": [
                 4896
               ]
             },
             "head_exists": 1,
             "clones": [
            ]
           }
         },
         {
           "osd": 313,
           "primary": true,
           "errors": [
          ],
           "size": 4194304,
           "omap_digest": "0xffffffff",
           "data_digest": "0x0d99bd77",
           "object_info": {
             "oid": {
               "oid": "rb.0.2479b45.238e1f29.000000125cbb",
               "key": "",
               "snapid": -2,
               "hash": 2016338238,
               "max": 0,
               "pool": 2,
               "namespace": ""
             },
             "version": "943431'2032262",
             "prior_version": "942275'2030618",
             "last_reqid": "osd.36.0:48196",
             "user_version": 2024222,
             "size": 4194304,
             "mtime": "2018-05-13 08:58:21.359912",
             "local_mtime": "2018-05-13 08:58:21.537637",
             "lost": 0,
             "flags": [
               "dirty",
               "data_digest",
               "omap_digest"
             ],
             "legacy_snaps": [
            ],
             "truncate_seq": 0,
             "truncate_size": 0,
             "data_digest": "0x0d99bd77",
             "omap_digest": "0xffffffff",
             "expected_object_size": 4194304,
             "expected_write_size": 4194304,
             "alloc_hint_flags": 0,
             "manifest": {
               "type": 0,
               "redirect_target": {
                 "oid": "",
                 "key": "",
                 "snapid": 0,
                 "hash": 0,
                 "max": 0,
                 "pool": -9.2233720368548e+18,
                 "namespace": ""
               }
             },
             "watchers": {
            }
           },
           "snapset": {
             "snap_context": {
               "seq": 4896,
               "snaps": [
                 4896
               ]
             },
             "head_exists": 1,
             "clones": [
            ]
           }
         }
       ]
     }
   ]
}
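
The output above is long, so a per-shard summary makes it easier to see which replicas carry the phantom snapshot. This is a minimal sketch: the function name summarize_shards is made up for illustration, and SAMPLE is a hand-trimmed copy of the output above that keeps only the fields the function reads.

```python
import json


def summarize_shards(report):
    """Return (osd, primary, object version, snaps) for every shard."""
    rows = []
    for item in report["inconsistents"]:
        for shard in item["shards"]:
            rows.append((
                shard["osd"],
                shard["primary"],
                shard["object_info"]["version"],
                shard["snapset"]["snap_context"]["snaps"],
            ))
    return rows


# Trimmed copy of the rados list-inconsistent-obj output for PG 2.13e.
SAMPLE = json.loads("""
{
  "epoch": 1008264,
  "inconsistents": [
    {
      "object": {"name": "rb.0.2479b45.238e1f29.000000125cbb", "snap": "head"},
      "errors": ["object_info_inconsistency", "snapset_inconsistency"],
      "shards": [
        {"osd": 225, "primary": false,
         "object_info": {"version": "946857'2041225"},
         "snapset": {"snap_context": {"seq": 4896, "snaps": []}, "clones": []}},
        {"osd": 305, "primary": false,
         "object_info": {"version": "943431'2032262"},
         "snapset": {"snap_context": {"seq": 4896, "snaps": [4896]}, "clones": []}},
        {"osd": 313, "primary": true,
         "object_info": {"version": "943431'2032262"},
         "snapset": {"snap_context": {"seq": 4896, "snaps": [4896]}, "clones": []}}
      ]
    }
  ]
}
""")

for osd, primary, version, snaps in summarize_shards(SAMPLE):
    print(f"osd.{osd} primary={primary} version={version} snaps={snaps}")

# Shards that still list a snap even though no clones exist.
suspect = [osd for osd, _, _, snaps in summarize_shards(SAMPLE) if snaps]
print("shards listing a snap:", suspect)
```

Read this way, OSDs 305 and 313 (the primary) list snap 4896 and share the older object version, while OSD 225 lists no snaps and has the newer version.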


--
Steve Anthony
LTS HPC Senior Analyst
Lehigh University
sma...@lehigh.edu
