Probably common sense, but I was bitten by this once in a similar situation.
If you run 3x replication and distribute the copies over 3 hosts (is that the
default now?), make sure the remaining disks on the host with the failed disk
have space for its data - those two disks will have to hold the contents of
the failed disk, and if they can't, your cluster will run full and halt.

Cheers,
Martin

On Wed, Nov 13, 2013 at 12:59 AM, David Zafman <david.zaf...@inktank.com> wrote:

> Since the disk is failing and you have 2 other copies, I would take osd.0
> down. This means that ceph will not attempt to read the bad disk, either
> for clients or to make another copy of the data:
>
> ***** Not sure about the syntax of this for the version of ceph you are
> running
> ceph osd down 0
>
> Mark it "out", which will immediately trigger recovery to create more
> copies of the data with the remaining OSDs.
> ceph osd out 0
>
> You can now finish the process of removing the osd by looking at these
> instructions:
>
> http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual
>
> David Zafman
> Senior Developer
> http://www.inktank.com
>
> On Nov 12, 2013, at 3:16 AM, Mihály Árva-Tóth <mihaly.arva-t...@virtual-call-center.eu> wrote:
>
> > Hello,
> >
> > I have 3 nodes, with 3 OSDs in each node. I'm using the .rgw.buckets pool
> > with 3 replicas. One of my HDDs (osd.0) has bad sectors; when I try to
> > read an object from the OSD directly, I get an Input/output error. dmesg:
> >
> > [1214525.670065] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
> > [1214525.670072] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
> > [1214525.670100] sd 0:0:2:0: [sdc] Unhandled sense code
> > [1214525.670104] sd 0:0:2:0: [sdc]
> > [1214525.670107] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > [1214525.670110] sd 0:0:2:0: [sdc]
> > [1214525.670112] Sense Key : Medium Error [current]
> > [1214525.670117] Info fld=0x60c8f21
> > [1214525.670120] sd 0:0:2:0: [sdc]
> > [1214525.670123] Add. Sense: Unrecovered read error
> > [1214525.670126] sd 0:0:2:0: [sdc] CDB:
> > [1214525.670128] Read(16): 88 00 00 00 00 00 06 0c 8f 20 00 00 00 08 00 00
> >
> > Okay, I know I need to replace the HDD.
> >
> > Fragment of ceph -s output:
> > pgmap v922039: 856 pgs: 855 active+clean, 1 active+clean+inconsistent;
> >
> > ceph pg dump | grep inconsistent
> >
> > 11.15d 25443 0 0 0 6185091790 3001 3001 active+clean+inconsistent 2013-11-06 02:30:45.23416.....
> >
> > ceph pg map 11.15d
> >
> > osdmap e1600 pg 11.15d (11.15d) -> up [0,8,3] acting [0,8,3]
> >
> > pg repair or deep-scrub cannot fix this issue. But if I understand
> > correctly, the OSD has to know it cannot retrieve the object from osd.0
> > and needs to replicate it to another OSD, because there are not 3 working
> > replicas now.
> >
> > Thank you,
> > Mihaly
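
For reference, the advice above roughly adds up to the sequence below. This is
only a sketch - command syntax can differ between Ceph releases, and the
init/service command depends on your distro, so double-check against the docs
page linked above for your version:

    # Check that the surviving OSDs have room for the re-replicated data
    # before marking the failed OSD out (the capacity point above).
    ceph df            # cluster-wide usage
    ceph osd tree      # which OSDs are up/in, and on which hosts

    # Stop Ceph from reading the failing disk, then trigger re-replication.
    ceph osd down 0    # mark osd.0 down; no more reads from the bad disk
    ceph osd out 0     # mark it out; recovery creates new copies elsewhere

    # Once recovery finishes (ceph -s shows all PGs active+clean), remove the
    # OSD permanently, per the "Removing OSDs (manual)" page linked above.
    sudo /etc/init.d/ceph stop osd.0   # on the OSD's host; init syntax varies
    ceph osd crush remove osd.0        # remove it from the CRUSH map
    ceph auth del osd.0                # delete its authentication key
    ceph osd rm 0                      # remove it from the OSD map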
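
And for locating and re-checking the inconsistent PG, something along these
lines (again just a sketch; 11.15d is the placement group from Mihaly's
output, used here only as an example):

    ceph health detail                 # lists inconsistent PGs, if any
    ceph pg dump | grep inconsistent   # same information from the full dump
    ceph pg map 11.15d                 # which OSDs hold this PG (up/acting)

    # Once all replicas of the PG live on healthy disks, re-verify and repair:
    ceph pg deep-scrub 11.15d
    ceph pg repair 11.15d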
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com