Hi, what is "min_size" on that pool? How many OSD nodes do you have in the cluster, and do you use a custom CRUSH map?
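If you want to check, something like the following should show the relevant settings (just a sketch; <pool-name> is a placeholder, and "crush_rule" assumes a Luminous or later release):

    # replicated pool: size = number of copies, min_size = copies required to serve I/O
    ceph osd pool get <pool-name> size
    ceph osd pool get <pool-name> min_size

    # how OSDs are spread across hosts, and which CRUSH rule the pool uses
    ceph osd tree
    ceph osd pool get <pool-name> crush_rule
    ceph osd crush rule dump

With size=2/min_size=2 (or EC 2+1, where min_size defaults to k+1=3), a single OSD failure can leave PGs unable to serve I/O, which could be related to what you saw.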
On Wed, Aug 1, 2018 at 1:57 PM, shrey chauhan <shrey.chau...@enclouden.com> wrote:
> Hi,
>
> I am trying to understand what happens when an OSD fails.
>
> A few days back I wanted to see what happens when an OSD goes down, so I
> went to the node and stopped one of the OSD services. The OSD went into
> the down and out state, PGs started recovering, and after some time
> everything seemed fine: everything had recovered while the OSD stayed down
> and out. I thought, great, I don't really have to worry about losing data
> when an OSD goes down.
> But recently an OSD went down on its own, and this time the PGs were not
> able to recover; they went into the down state and everything was stuck,
> so I had to run this command:
>
> ceph osd lost osd_number
>
> which is not really safe, and I might lose data here.
> I am not able to understand why this did not happen when I stopped the
> service the first time, and why it happened now. Since with RF2/EC21 all
> OSD data is replicated/erasure coded to other OSDs, the cluster should
> ideally have come back to a healthy state on its own.
>
> Can someone please explain what I am missing here?
>
> Should I worry about putting my production data in this cluster?
>
> Thanks
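For what it's worth, before resorting to "ceph osd lost" it is usually worth finding out why the PGs are down. A rough sketch of the checks I'd start with (the PG id is a placeholder):

    ceph health detail      # lists the down/incomplete PGs and the reason
    ceph pg <pgid> query    # the recovery_state section shows which OSDs the PG is waiting for
    ceph osd tree           # shows which OSDs are currently down or out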
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com