Hi,

I have a feeling that the pg repair didn't actually run yet. Sometimes if the OSDs are busy scrubbing, the repair doesn't start when you ask it to. You can force it through with something like:
ceph osd set noscrub
ceph osd set nodeep-scrub
ceph config set osd osd_max_scrubs 3
ceph pg repair <the pg>
ceph status   # and check that the repair really started
ceph config set osd osd_max_scrubs 1
ceph osd unset nodeep-scrub
ceph osd unset noscrub

Once the repair runs/completes, it will rewrite the inconsistent object replica (to a new place on the disk). Check your ceph.log to see when this happens.

From my experience, the PendingSectors counter will not be decremented until that sector is written again (which will happen at some random point in the future when bluestore allocates some new data there).
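If you want to double-check both of those, something along these lines should do it. This is just a rough sketch: the exact ceph.log wording varies a bit between releases, the pg id 1.35 and /dev/sdb are taken from your report below, and it assumes smartmontools is installed on the OSD host.

# on a mon host: watch for the repair in the cluster log
grep '1.35 repair' /var/log/ceph/ceph.log

# on the OSD host: see whether the drive still reports pending sectors
# (SMART attribute 197 Current_Pending_Sector)
smartctl -A /dev/sdb | grep -i pending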
Hope that helps,

Dan

On Mon, Mar 30, 2020 at 9:00 AM David Herselman <d...@syrex.co> wrote:
>
> Hi,
>
> We have a single inconsistent placement group where I subsequently triggered
> a deep scrub and tried doing a 'pg repair'. The placement group remains in an
> inconsistent state.
>
> How do I discard the objects for this placement group on the one OSD only and
> get Ceph to essentially write the data out anew? Drives will only mark a
> sector as remapped when asked to overwrite the problematic sector, or when
> repeated reads of the failed sector eventually succeed (this is my limited
> understanding).
>
> Nothing useful in the 'ceph pg 1.35 query' output that I could decipher. I
> then ran 'ceph pg deep-scrub 1.35', and 'rados list-inconsistent-obj 1.35'
> thereafter indicates a read error on one of the copies:
>
> {"epoch":25776,"inconsistents":[{"object":{"name":"rbd_data.746f3c94fb3a42.000000000001e48d","nspace":"","locator":"","snap":"head","version":34866184},"errors":[],"union_shard_errors":["read_error"],"selected_object_info":{"oid":{"oid":"rbd_data.746f3c94fb3a42.000000000001e48d","key":"","snapid":-2,"hash":3814100149,"max":0,"pool":1,"namespace":""},"version":"22845'1781037","prior_version":"22641'1771494","last_reqid":"client.136837683.0:124047","user_version":34866184,"size":4194304,"mtime":"2020-03-08 17:59:00.159846","local_mtime":"2020-03-08 17:59:00.159670","lost":0,"flags":["dirty","data_digest","omap_digest"],"truncate_seq":0,"truncate_size":0,"data_digest":"0x031cb17c","omap_digest":"0xffffffff","expected_object_size":4194304,"expected_write_size":4194304,"alloc_hint_flags":0,"manifest":{"type":0},"watchers":{}},"shards":[{"osd":51,"primary":false,"errors":["read_error"],"size":4194304},{"osd":60,"primary":false,"errors":[],"size":4194304,"omap_digest":"0xffffffff","data_digest":"0x031cb17c"},{"osd":82,"primary":true,"errors":[],"size":4194304,"omap_digest":"0xffffffff","data_digest":"0x031cb17c"}]}]}
>
> /var/log/syslog:
> Mar 30 08:40:40 kvm1e kernel: [74792.229021] ata2.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x0
> Mar 30 08:40:40 kvm1e kernel: [74792.230416] ata2.00: irq_stat 0x40000008
> Mar 30 08:40:40 kvm1e kernel: [74792.231715] ata2.00: failed command: READ FPDMA QUEUED
> Mar 30 08:40:40 kvm1e kernel: [74792.233071] ata2.00: cmd 60/00:08:00:7a:50/04:00:c9:00:00/40 tag 1 ncq dma 524288 in
> Mar 30 08:40:40 kvm1e kernel: [74792.233071]          res 43/40:00:10:7b:50/00:04:c9:00:00/00 Emask 0x409 (media error) <F>
> Mar 30 08:40:40 kvm1e kernel: [74792.235736] ata2.00: status: { DRDY SENSE ERR }
> Mar 30 08:40:40 kvm1e kernel: [74792.237045] ata2.00: error: { UNC }
> Mar 30 08:40:40 kvm1e ceph-osd[450777]: 2020-03-30 08:40:40.240 7f48a41f3700 -1 bluestore(/var/lib/ceph/osd/ceph-51) _do_read bdev-read failed: (5) Input/output error
> Mar 30 08:40:40 kvm1e kernel: [74792.244914] ata2.00: configured for UDMA/133
> Mar 30 08:40:40 kvm1e kernel: [74792.244938] sd 1:0:0:0: [sdb] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Mar 30 08:40:40 kvm1e kernel: [74792.244942] sd 1:0:0:0: [sdb] tag#1 Sense Key : Medium Error [current]
> Mar 30 08:40:40 kvm1e kernel: [74792.244945] sd 1:0:0:0: [sdb] tag#1 Add. Sense: Unrecovered read error - auto reallocate failed
> Mar 30 08:40:40 kvm1e kernel: [74792.244949] sd 1:0:0:0: [sdb] tag#1 CDB: Read(16) 88 00 00 00 00 00 c9 50 7a 00 00 00 04 00 00 00
> Mar 30 08:40:40 kvm1e kernel: [74792.244953] blk_update_request: I/O error, dev sdb, sector 3377494800 op 0x0:(READ) flags 0x0 phys_seg 94 prio class 0
> Mar 30 08:40:40 kvm1e kernel: [74792.246238] ata2: EH complete
>
> Regards
> David Herselman