Hi Karan,

Thanks for your reply. I have spent some time on this and finally found the problems behind the issue:

1) If I reboot any of the nodes, the OSD service does not start when the node comes back up, because /var/lib/ceph/osd/ceph-0 is not mounted. I manually edited /etc/fstab and added a mount entry for the OSD storage, e.g.

    UUID=142136cd-8325-44a7-ad67-80fe19ed3873 /var/lib/ceph/osd/ceph-0 xfs defaults,noatime

The above fixed the issue. Now the questions: is this a valid approach, and why does Ceph not activate the OSD drive by itself on reboot?
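For reference, a rough way to sanity-check the new fstab entry without another full reboot would be something like this (the device name /dev/sdb1 is only a placeholder for the actual OSD data partition):

    # Confirm the filesystem UUID of the OSD data partition
    # (/dev/sdb1 is a placeholder -- use the real device here)
    sudo blkid /dev/sdb1

    # Mount everything listed in /etc/fstab and check the OSD path
    sudo mount -a
    mount | grep /var/lib/ceph/osd/ceph-0
    ls /var/lib/ceph/osd/ceph-0   # should show the OSD data files (keyring, whoami, current/, ...)

    # Then start the OSD daemon again
    sudo /etc/init.d/ceph start osd.0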
2) After fixing the above issue I rebooted all of the nodes again, and this time there is another warning:

    health HEALTH_WARN clock skew detected on mon.vms2

Here is the output:

    health HEALTH_WARN clock skew detected on mon.vms2
    monmap e1: 2 mons at {vms1=192.168.1.128:6789/0,vms2=192.168.1.129:6789/0}, election epoch 14, quorum 0,1 vms1,vms2
    mdsmap e11: 1/1/1 up {0=vms1=up:active}
    osdmap e36: 3 osds: 3 up, 3 in

My current setup is 3 OSDs, 2 mons and 1 MDS.
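As a first step for the clock skew warning, a rough NTP check on both monitor nodes could look like the following (the service name ntp vs. ntpd depends on the distribution, and pool.ntp.org is only an example time source):

    # Check whether ntpd actually has a synced peer
    # (look for a peer marked with '*' in the first column)
    ntpq -p

    # If the clocks are far apart, force a one-off resync
    # (stop ntpd first so ntpdate can use the NTP port)
    sudo service ntp stop
    sudo ntpdate pool.ntp.org
    sudo service ntp start

    # Re-check the cluster health afterwards
    ceph health detail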
Br.

Umar

On Tue, Dec 17, 2013 at 2:54 PM, Karan Singh <ksi...@csc.fi> wrote:

> Umar
>
> *Ceph is stable for production*; there are a large number of Ceph
> clusters deployed and running smoothly in production, and countless more
> in testing / pre-production.
>
> Since you are facing problems with your Ceph testing, it does not mean
> Ceph is unstable.
>
> I would suggest putting some time into troubleshooting your problem.
>
> What I see from your logs:
>
> 1) You have 2 mons; that's a problem (either have 1, or have 3 to form a
> quorum). Add 1 more monitor node.
> 2) Out of 2 OSDs, only 1 is IN; check where the other one is and try
> bringing both of them UP. Add a few more OSDs to clear the health
> warning; 2 is a very small number of OSDs.
>
> Many Thanks
> Karan Singh
>
> ------------------------------
> From: "Umar Draz" <unix...@gmail.com>
> To: ceph-us...@ceph.com
> Sent: Tuesday, 17 December, 2013 8:51:27 AM
> Subject: [ceph-users] After reboot nothing worked
>
> Hello,
>
> I have a 2-node Ceph cluster. I rebooted both hosts just to test whether
> the cluster keeps working after a reboot, and the result was that the
> cluster was unable to start.
>
> Here is the ceph -s output:
>
>     health HEALTH_WARN 704 pgs stale; 704 pgs stuck stale; mds cluster is degraded; 1/1 in osds are down; clock skew detected on mon.kvm2
>     monmap e2: 2 mons at {kvm1=192.168.214.10:6789/0,kvm2=192.168.214.11:6789/0}, election epoch 16, quorum 0,1 kvm1,kvm2
>     mdsmap e13: 1/1/1 up {0=kvm1=up:replay}
>     osdmap e29: 2 osds: 0 up, 1 in
>     pgmap v68: 704 pgs, 4 pools, 9603 bytes data, 23 objects
>           1062 MB used, 80816 MB / 81879 MB avail
>                704 stale+active+clean
>
> According to this useless documentation:
>
> http://ceph.com/docs/master/rados/operations/monitoring-osd-pg/
>
> I tried ceph osd tree, and the output was:
>
>     # id    weight    type name      up/down  reweight
>     -1      0.16      root default
>     -2      0.07999       host kvm1
>     0       0.07999           osd.0  down     1
>     -3      0.07999       host kvm2
>     1       0.07999           osd.1  down     0
>
> Then I tried
>
>     sudo /etc/init.d/ceph -a start osd.0
>     sudo /etc/init.d/ceph -a start osd.1
>
> to start the OSDs on both hosts; the result was
>
>     /etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )
>     /etc/init.d/ceph: osd.1 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )
>
> Now the question is: what is this? Is Ceph really stable? Can we use it
> in a production environment?
>
> Both of my hosts have NTP running and the time is up to date.
>
> Br.
>
> Umar
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Umar Draz
Network Architect
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com