Hi Daniel,

When I encounter an OSD that I can start, but that then stops on its
own after running for some time, the root cause has generally been
sectors pending reallocation on the hard drive the OSD is using. The
OSD runs fine until it attempts to read from the bad sectors, at which
point it hits a read error and drops offline.

You can check the disk with smartctl (from the smartmontools package).
If there are sectors pending reallocation, remove the OSD from the
cluster, use dd to write zeros over the drive (this causes the drive to
remap the bad sectors to spare ones), then re-add the OSD to the
cluster.
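
As a rough sketch of the check, the helper below pulls the raw
Current_Pending_Sector count (SMART attribute 197) out of "smartctl -A"
output. The function name pending_sectors and the device /dev/sdX are
placeholders of mine, and it assumes the usual ten-column attribute
table that smartctl prints for ATA drives, so adjust it to your setup.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print the raw value of
# Current_Pending_Sector (column 10 of the attribute table).
# Prints 0 when the attribute is not listed at all.
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { n = $10; found = 1 }
         END { print (found ? n : 0) }'
}

# Usage (run as root; /dev/sdX is a placeholder for the OSD's disk):
#   smartctl -A /dev/sdX | pending_sectors
#
# If the count is non-zero, the steps above would look roughly like
# this (N is the OSD id; dd DESTROYS all data on the drive):
#   ceph osd out N
#   /etc/init.d/ceph stop osd.N
#   ceph osd crush remove osd.N
#   ceph auth del osd.N
#   ceph osd rm N
#   dd if=/dev/zero of=/dev/sdX bs=1M
```

A non-zero result means the drive has sectors it could not read and is
waiting to remap, which matches the failure pattern described above.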

-Steve

On 02/05/2015 08:19 AM, Daniel Takatori Ohara wrote:
> Hello Alex,
>
> Thanks for the answer.
>
> On the servers I use CentOS 6.6 with kernel 2.6.32, and on the
> clients I use Ubuntu 14 with kernel 3.16.
>
> The Ceph version is 0.87.
>
> Thanks,
>
> Att.
>
> ---
> Daniel Takatori Ohara.
> System Administrator - Lab. of Bioinformatics
> Molecular Oncology Center 
> Instituto Sírio-Libanês de Ensino e Pesquisa
> Hospital Sírio-Libanês
> Phone: +55 11 3155-0200 (extension 1927)
> R: Cel. Nicolau dos Santos, 69
> São Paulo-SP. 01308-060
> http://www.bioinfo.mochsl.org.br
>
>
> On Thu, Feb 5, 2015 at 10:43 AM, Alexis KOALLA
> <alexis.koa...@orange.com> wrote:
>
>     Hi Daniel
>     Could you be more precise on your issue please?
>     What is the OS under which your ceph is running and what is the
>     ceph version you are currently running?
>
>     Anyway, I have experienced an issue that looks like yours.
>     I have installed and configured a small cluster, "microceph", on my
>     PC for a quick demo. This cluster has 4 OSDs and 1 MON.
>     There is no MDS.
>     I have written a script that starts the cluster.
>     In this script I start the monitor:
>     ceph-mon -c /path/to/yourceph/confile -i <mon_id>
>     I also start the 4 OSDs manually, like this:
>     ceph-osd -c /path/to/yourceph/confile -i <osd_id>
>
>     I also forced the OSDs to be "in" after the start.
>     Right now it works fine, but I don't think it's the right way to
>     proceed (starting the OSDs manually and marking them "in").
>     Maybe it can give you an idea of where to start investigating.
>
>     Regards
>     Alex
>
>
>     On 05/02/2015 11:29, Daniel Takatori Ohara wrote:
>>     Hi, can anyone help me, please?
>>
>>     I have a cluster with 4 OSDs, 1 MDS and 1 MON.
>>
>>     osd.3 was down, and I need to restart it on the host with the
>>     command /etc/init.d/ceph restart osd.3.
>>
>>     osd.0 is sometimes marked down, but it is marked up again
>>     automatically.
>>
>>     [ceph@ceph-admin my-cluster]$ ceph osd tree
>>     # id    weight  type name       up/down reweight
>>     -1      50.63   root default
>>     -2      13.84           host ceph-osd1
>>     0       13.84                   osd.0   up      1
>>     -3      14.76           host ceph-osd2
>>     1       14.76                   osd.1   up      1
>>     -4      22.03           host ceph-osd3
>>     2       10.09                   osd.2   up      0.8
>>     3       11.94                   osd.3   down    0
>>
>>     Can anyone help me, please?
>>
>>     Thanks,
>>
>>     Att.
>>
>>     ---
>>     Daniel Takatori Ohara.
>>     System Administrator - Lab. of Bioinformatics
>>     Molecular Oncology Center 
>>     Instituto Sírio-Libanês de Ensino e Pesquisa
>>     Hospital Sírio-Libanês
>>     Phone: +55 11 3155-0200 (extension 1927)
>>     R: Cel. Nicolau dos Santos, 69
>>     São Paulo-SP. 01308-060
>>     http://www.bioinfo.mochsl.org.br
>>
>>
>>
>>     _______________________________________________
>>     ceph-users mailing list
>>     ceph-users@lists.ceph.com
>>     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>     -- 
>
>     *Alexis KOALLA*
>
>     Orange/IMT/OLPS/ASE/DAPI/CSE
>
>     Spécialiste en Technologies/Cloud Storage Services & Plateformes
>
>     Specialist  in Technologies/Cloud Storage Services & Platforms
>
>     Tel :+33(0) 299 124 939 / +33 670 698 929
>     alexis.koa...@orange.com
>

-- 
Steve Anthony
LTS HPC Support Specialist
Lehigh University
sma...@lehigh.edu
