Yeap, osd.24 has experienced a read error. If you check the system log on the osd.24 host, you'll probably find relevant kernel messages about SATA errors; the LBA of the sector that triggered the read error will be in those messages.
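Something like this should surface them (the grep patterns are just a guess at the usual ATA/block-layer wording, adjust as needed):

dmesg -T | grep -iE 'ata[0-9]|medium error|unc'
journalctl -k | grep -iE 'blk_update_request|i/o error'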

If you run smartctl -x on osd.24's SATA disk device, you'll probably find that the disk has started accumulating "pending sectors", or that its "reallocated sector count" has increased significantly. It may also show a noticeably higher "multi zone error rate" than other disks of the same age.
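For example (replace /dev/sdX with the actual device backing osd.24; the attribute names below are the usual ATA SMART ones and may vary by vendor):

smartctl -x /dev/sdX | grep -iE 'Current_Pending_Sector|Reallocated_Sector_Ct|Multi_Zone_Error_Rate'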

The immediate action would be to decommission the disk.
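Roughly along these lines, assuming you want to drain it first (double-check against your own removal procedure before running anything destructive):

ceph osd out 24
# wait for backfill to finish and the cluster to return to HEALTH_OK, then on the osd.24 host:
systemctl stop ceph-osd@24
ceph osd purge 24 --yes-i-really-mean-it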

Once decoupled from Ceph, you can run the disk vendor's diagnostics suite on the drive; it may come back clean in the end, but even so, I'd only use it for non-critical stuff thereafter.
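If you don't have the vendor tool handy, a long SMART self-test is a reasonable stand-in (again, /dev/sdX is a placeholder):

smartctl -t long /dev/sdX
# come back after the estimated runtime and check the result:
smartctl -l selftest /dev/sdX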

Best regards.



On 17/07/2020 12.38, Abhimnyu Dhobale wrote:
Thanks for your reply,

Please see the output below and advise.

[root@vpsapohmcs01 ~]# rados -p vpsacephcl01 list-inconsistent-obj 1.3c9
--format=json-pretty
{
     "epoch": 845,
     "inconsistents": [
         {
             "object": {
                 "name": "rbd_data.515c96b8b4567.000000000000c377",
                 "nspace": "",
                 "locator": "",
                 "snap": "head",
                 "version": 21101
             },
             "errors": [],
             "union_shard_errors": [
                 "read_error"
             ],
             "selected_object_info": {
                 "oid": {
                     "oid": "rbd_data.515c96b8b4567.000000000000c377",
                     "key": "",
                     "snapid": -2,
                     "hash": 867656649,
                     "max": 0,
                     "pool": 1,
                     "namespace": ""
                 },
                 "version": "853'21101",
                 "prior_version": "853'21100",
                 "last_reqid": "client.2317742.0:24909022",
                 "user_version": 21101,
                 "size": 4194304,
                 "mtime": "2020-07-16 21:02:20.564245",
                 "local_mtime": "2020-07-16 21:02:20.572003",
                 "lost": 0,
                 "flags": [
                     "dirty",
                     "omap_digest"
                 ],
                 "truncate_seq": 0,
                 "truncate_size": 0,
                 "data_digest": "0xffffffff",
                 "omap_digest": "0xffffffff",
                 "expected_object_size": 4194304,
                 "expected_write_size": 4194304,
                 "alloc_hint_flags": 0,
                 "manifest": {
                     "type": 0
                 },
                 "watchers": {}
             },
             "shards": [
                 {
                     "osd": 5,
                     "primary": false,
                     "errors": [],
                     "size": 4194304,
                     "omap_digest": "0xffffffff",
                     "data_digest": "0x8ebd7de4"
                 },
                 {
                     "osd": 19,
                     "primary": true,
                     "errors": [],
                     "size": 4194304,
                     "omap_digest": "0xffffffff",
                     "data_digest": "0x8ebd7de4"
                 },
                 {
                     "osd": 24,
                     "primary": false,
                     "errors": [
                         "read_error"
                     ],
                     "size": 4194304
                 }
             ]
         }
     ]
}



On Tue, Jul 14, 2020 at 6:40 PM Eric Smith <eric.sm...@vecima.com> wrote:

If you run (substitute your pool name for <pool>):

rados -p <pool> list-inconsistent-obj 1.574 --format=json-pretty

You should get detailed information about which copy of the object actually has
the error, and you can determine what to do with it from there.
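
If you first need to find which PGs in the pool are inconsistent, this should list them:

rados list-inconsistent-pg <pool>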

-----Original Message-----
From: Abhimnyu Dhobale <adhoba...@gmail.com>
Sent: Tuesday, July 14, 2020 5:13 AM
To: ceph-users@ceph.io
Subject: [ceph-users] 1 pg inconsistent

Good Day,

Ceph is showing the below error frequently. Every time, it is resolved after a
pg repair.

[root@vpsapohmcs01 ~]# ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 1.574 is active+clean+inconsistent, acting [19,25,2]

[root@vpsapohmcs02 ~]# cat /var/log/ceph/ceph-osd.19.log | grep error
2020-07-12 11:42:11.824 7f864e0b2700 -1 log_channel(cluster) log [ERR] : 1.574 shard 25 soid 1:2ea0a7a3:::rbd_data.515c96b8b4567.0000000000007a7c:head : candidate had a read error
2020-07-12 11:42:15.035 7f86520ba700 -1 log_channel(cluster) log [ERR] : 1.574 deep-scrub 1 errors

[root@vpsapohmcs01 ~]# ceph --version
ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)

Please suggest how to proceed.

--
Thanks & Regards
Abhimnyu Dhobale
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



