Hi,

what is "min_size" on that pool? How many OSD nodes do you have in the
cluster, and do you use a custom CRUSH map?
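
If you are not sure, something like the following should show the relevant
settings (here "mypool" is just a placeholder; substitute your actual pool
name):

ceph osd pool get mypool size
ceph osd pool get mypool min_size
ceph osd tree
ceph osd crush rule dump

The output should make it clearer whether the pool can still keep its PGs
active after losing a single OSD (or a whole host, depending on the CRUSH
failure domain).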

On Wed, Aug 1, 2018 at 1:57 PM, shrey chauhan <shrey.chau...@enclouden.com>
wrote:

> Hi,
>
> I am trying to understand what happens when an OSD fails.
>
> A few days back I wanted to check what happens when an OSD goes down. To do
> that, I went to the node and stopped one of the OSD services. When the OSD
> went into the down and out state, the PGs started recovering, and after
> some time everything seemed fine: everything was recovered while the OSD
> stayed DOWN and OUT, so I thought, great, I don't really have to worry
> about losing data when an OSD goes down.
> But recently an OSD went down on its own, and this time the PGs were not
> able to recover; they went into the down state and everything was stuck, so
> I had to run this command:
>
> ceph osd lost osd_number
>
> which is not really safe, and I might lose data here.
> I am not able to understand why this did not happen when I stopped the
> service the first time, and why it did happen now. Since with RF2/EC21 all
> OSD data is replicated/erasure coded to other OSDs, the cluster should
> ideally have come back to a normal state on its own.
>
> Can someone please explain what I am missing here?
>
> Should I worry about putting my production data in this cluster?
>
>
> Thanks
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
