Yeap, osd.24 has experienced a read error. If you check the system log on the osd.24 host, you'll probably find relevant kernel messages about SATA errors; the LBA of the sector that triggered the read error will be in those messages.
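Something like this should surface them (the grep patterns are just a guess at the usual ATA/block-layer wording, adjust as needed):

dmesg -T | grep -iE 'ata[0-9]|medium error|unc'
journalctl -k | grep -iE 'blk_update_request|i/o error'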

If you run smartctl -x on osd.24's SATA disk device, you'll probably find that the disk has started accumulating "pending sectors", or that its "reallocated sector count" has increased significantly. It may also show a noticeably higher "multi zone error rate" than other disks of the same age.
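For example (replace /dev/sdX with the actual device backing osd.24; the attribute names below are the usual ATA SMART ones and may vary by vendor):

smartctl -x /dev/sdX | grep -iE 'Current_Pending_Sector|Reallocated_Sector_Ct|Multi_Zone_Error_Rate'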

The immediate action would be to decommission the disk.
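Roughly along these lines, assuming you want to drain it first (double-check against your own removal procedure before running anything destructive):

ceph osd out 24
# wait for backfill to finish and the cluster to return to HEALTH_OK, then on the osd.24 host:
systemctl stop ceph-osd@24
ceph osd purge 24 --yes-i-really-mean-it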

Once decoupled from Ceph, you can run the disk vendor's diagnostics suite on the drive; it may come back clean in the end, but even so, I'd only use it for non-critical stuff thereafter.
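If you don't have the vendor tool handy, a long SMART self-test is a reasonable stand-in (again, /dev/sdX is a placeholder):

smartctl -t long /dev/sdX
# come back after the estimated runtime and check the result:
smartctl -l selftest /dev/sdX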

Best regards.



On 17/07/2020 12.38, Abhimnyu Dhobale wrote:
Thanks for your reply,

Please see the output below and advise.

[root@vpsapohmcs01 ~]# rados -p vpsacephcl01 list-inconsistent-obj 1.3c9
--format=json-pretty
{
     "epoch": 845,
     "inconsistents": [
         {
             "object": {
                 "name": "rbd_data.515c96b8b4567.000000000000c377",
                 "nspace": "",
                 "locator": "",
                 "snap": "head",
                 "version": 21101
             },
             "errors": [],
             "union_shard_errors": [
                 "read_error"
             ],
             "selected_object_info": {
                 "oid": {
                     "oid": "rbd_data.515c96b8b4567.000000000000c377",
                     "key": "",
                     "snapid": -2,
                     "hash": 867656649,
                     "max": 0,
                     "pool": 1,
                     "namespace": ""
                 },
                 "version": "853'21101",
                 "prior_version": "853'21100",
                 "last_reqid": "client.2317742.0:24909022",
                 "user_version": 21101,
                 "size": 4194304,
                 "mtime": "2020-07-16 21:02:20.564245",
                 "local_mtime": "2020-07-16 21:02:20.572003",
                 "lost": 0,
                 "flags": [
                     "dirty",
                     "omap_digest"
                 ],
                 "truncate_seq": 0,
                 "truncate_size": 0,
                 "data_digest": "0xffffffff",
                 "omap_digest": "0xffffffff",
                 "expected_object_size": 4194304,
                 "expected_write_size": 4194304,
                 "alloc_hint_flags": 0,
                 "manifest": {
                     "type": 0
                 },
                 "watchers": {}
             },
             "shards": [
                 {
                     "osd": 5,
                     "primary": false,
                     "errors": [],
                     "size": 4194304,
                     "omap_digest": "0xffffffff",
                     "data_digest": "0x8ebd7de4"
                 },
                 {
                     "osd": 19,
                     "primary": true,
                     "errors": [],
                     "size": 4194304,
                     "omap_digest": "0xffffffff",
                     "data_digest": "0x8ebd7de4"
                 },
                 {
                     "osd": 24,
                     "primary": false,
                     "errors": [
                         "read_error"
                     ],
                     "size": 4194304
                 }
             ]
         }
     ]
}



On Tue, Jul 14, 2020 at 6:40 PM Eric Smith <eric.sm...@vecima.com> wrote:

If you run (substitute your pool name for <pool>):

rados -p <pool> list-inconsistent-obj 1.574 --format=json-pretty

You should get detailed information about which copy of the object actually has
the error, and you can determine what to do with it from there.
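
If you first need to find which PGs in the pool are inconsistent, this should list them:

rados list-inconsistent-pg <pool>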

-----Original Message-----
From: Abhimnyu Dhobale <adhoba...@gmail.com>
Sent: Tuesday, July 14, 2020 5:13 AM
To: ceph-users@ceph.io
Subject: [ceph-users] 1 pg inconsistent

Good Day,

Ceph is showing the below error frequently. Every time, it is resolved after a
pg repair.

[root@vpsapohmcs01 ~]# ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 1.574 is active+clean+inconsistent, acting [19,25,2]

[root@vpsapohmcs02 ~]# cat /var/log/ceph/ceph-osd.19.log | grep error
2020-07-12 11:42:11.824 7f864e0b2700 -1 log_channel(cluster) log [ERR] : 1.574 shard 25 soid 1:2ea0a7a3:::rbd_data.515c96b8b4567.0000000000007a7c:head : candidate had a read error
2020-07-12 11:42:15.035 7f86520ba700 -1 log_channel(cluster) log [ERR] : 1.574 deep-scrub 1 errors

[root@vpsapohmcs01 ~]# ceph --version
ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)

Please suggest how to proceed.

--
Thanks & Regards
Abhimnyu Dhobale
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



