I'd have to look for details, but I don't think the auth monitor ever removes those keys, so if there are some missing, it sounds like some data got lost out from underneath it. That could have happened if the filesystem dropped a file, which we have seen on some kernels.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
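To make the question in the quoted message below concrete, here is a minimal sketch of checking which versioned keys are present in a given range. It assumes the monitor's key names have already been dumped into a Python set, and that versioned keys are named 'auth' followed by the version number, as written in the quoted message. The function name missing_versions and the sample data are hypothetical and only mirror the numbers reported there; this is not how the monitor itself reads its store.

```python
def missing_versions(keys, first, last, prefix="auth"):
    """Return the versions in [first, last] that have no '<prefix><version>' key."""
    present = set(keys)
    return [v for v in range(first, last + 1) if f"{prefix}{v}" not in present]


# Hypothetical key dump mirroring the thread: versions 251..329 are present,
# plus the two bookkeeping keys spelled as they appear in the message.
store_keys = {f"auth{v}" for v in range(251, 330)}
store_keys |= {"authfirst_commited", "authlast_commited"}

# Gaps inside the reported committed range.
print(missing_versions(store_keys, 251, 329))

# How many versions below the reported first commit have no key.
print(len(missing_versions(store_keys, 1, 250)))
```

A gap inside the committed range and a missing key below it are different situations, so the sketch reports them separately rather than deciding which one is a problem.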
On Tue, Jun 10, 2014 at 3:31 AM, Mohammad Salehe <sal...@gmail.com> wrote:
> Hi Greg,
>
> Thanks for your suggestion and information. I've installed the cluster over
> again.
>
> I just wanted to investigate a little more based on your information. I can
> see that the auth/paxos values in the monitor K/V store are these:
> 'authfirst_commited': 251
> 'authlast_commited': 329
>
> and I have all the keys 'auth251'...'auth329' in there. However, there is no
> 'auth1' or 'auth250', but it seems the monitor failed while reading 'auth1'.
> Is this normal?
> As a side note, I did not use cephx in this cluster.
>
> Thanks,
>
>
> 2014-06-09 22:11 GMT+04:30 Gregory Farnum <g...@inktank.com>:
>>
>> Barring a newly-introduced bug (doubtful), that assert basically means
>> that your computer lied to the ceph monitor about the durability or
>> ordering of data going to disk, and the store is now inconsistent. If
>> you don't have data you care about on the cluster, by far your best
>> option is:
>> 1) Figure out what part of the system is lying about data durability
>> (probably your filesystem or controller is ignoring barriers),
>> 2) start the Ceph install over
>> It's possible that the ceph-monstore-tool will let you edit the store
>> back into a consistent state, but it looks like the system can't find
>> the *initial* commit, which means you'll need to manufacture a new one
>> wholesale with the right keys from the other system components.
>>
>> (I am assuming that the system didn't crash right while you were
>> turning on the monitor for the first time; if it did that makes it
>> slightly more likely to be a bug on our end, but again it'll be
>> easiest to just start over since you don't have any data in it yet.)
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Sun, Jun 8, 2014 at 10:26 PM, Mohammad Salehe <sal...@gmail.com> wrote:
>> > Hi,
>> >
>> > I'm receiving a failed assertion in AuthMonitor::update_from_paxos(bool*)
>> > after a system crash. I've saved a complete monitor log with 10/20 for
>> > 'mon' and 'paxos' here.
>> > There is only one monitor and two OSDs in the cluster as I was just at
>> > the beginning of deployment.
>> >
>> > I will be thankful if someone could help.
>> >
>> > --
>> > Mohammad Salehe
>> > sal...@gmail.com
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
>
>
>
> --
> Mohammad Salehe
> sal...@gmail.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com