Hi guys ... Thanks a lot for your support. I discovered what happened.
I had 2 monitors, osnode01 and osnode02. I tried to add a 3rd using ceph-deploy. ceph-deploy was using a key different from the one used by my monitor cluster. So, I added osnode08 to the monitor cluster and it did not become part of the quorum. I removed it, and removed osnode02. The monitor count should be an odd number. When I did that, my ceph stopped. I re-added osnode02 to the monitor cluster. The thing is that I added it using the wrong key. I don't know why ceph-deploy started using a different key.

As suggested by Joao Eduardo, by removing auth I could bring part of ceph up. After that I troubleshot the key problem, solved it, and now my whole cluster is recovered and running fine ...

Thanks a lot.
Jose Tavares

On Tue, Nov 17, 2015 at 11:13 AM, Jose Tavares <j...@terra.com.br> wrote:

> Now I tried to inject the latest map I had.
> Also, I created a second monitor on osnode02, like I had before, using the
> same map.
> I started both monitors ...
>
> Logs from osnode01 show my content ... and then it started to show lines
> like
>
> 2015-11-17 10:56:26.515069 7fc73af67700 0
> mon.osnode01@0(probing).data_health(1)
> update_stats avail 19% total 220 GB, used 178 GB, avail 43178 MB
>
> What does that mean?
> Attached are the logs.
>
> Thanks a lot.
> Jose Tavares
>
> On Tue, Nov 17, 2015 at 10:33 AM, Jose Tavares <j...@terra.com.br> wrote:
>
>> On Tue, Nov 17, 2015 at 7:27 AM, Joao Eduardo Luis <j...@suse.de> wrote:
>>
>>> On 11/17/2015 03:56 AM, Jose Tavares wrote:
>>> > The problem is that I think I don't have any good monitor anymore.
>>> > How do I know if the map I am trying is ok?
>>> >
>>> > I also saw in the logs that the primary mon was trying to contact a
>>> > removed mon at IP .112 .. So, I added .112 again ... and it didn't
>>> > help.
>>> >
>>> > Attached are the logs of what is going on and some monmaps that I
>>> > captured from minutes before the cluster became inaccessible ..
>>> >
>>> > Should I try to inject these monmaps into my primary mon to see if it
>>> > can recover the cluster?
>>> > Is it possible to see if these monmaps match my content?
>>>
>>> Without access to the actual store.db there's no way to ascertain if the
>>> store has any problems, and even then figuring out a potential
>>> corruption from just one monitor store.db would either be impossible or
>>> impractical.
>>
>> I posted my store.db in my previous answer ..
>>
>>> That said, from the log you attached it seems you only have issues with
>>> authentication: you have pgmaps from epoch 91923 through to 92589, you
>>> have an mds map (epoch 38), osdmaps at least through epoch 307, and 40
>>> versions for the auth keys.
>>>
>>> Somehow, though, your monitors are unable to authenticate each other. No
>>> way to tell if that was corruption or user error.
>>>
>>> You should be able to get your monitors back to speaking terms again
>>> simply by disabling cephx temporarily. Then you can figure out whatever
>>> you need to figure out in terms of monitor keys.
>>>
>>> Just update your ceph.conf with 'auth supported = none' and restart the
>>> monitors. See how it goes from there.
>>
>> I tried your suggestion and it didn't make any change to the results .. :(
>>
>> Thanks a lot.
>> Jose Tavares
>>
>>> HTH
>>>
>>> -Joao
>>>
>>> > Thanks a lot.
>>> > Jose Tavares
>>> >
>>> > On Mon, Nov 16, 2015 at 10:48 PM, Nathan Harper
>>> > <nathan.har...@cfms.org.uk> wrote:
>>> >
>>> > I had to go through a similar process when we had a disaster which
>>> > destroyed one of our monitors. I followed the process here:
>>> > REMOVING MONITORS FROM AN UNHEALTHY CLUSTER
>>> > <http://docs.ceph.com/docs/hammer/rados/operations/add-or-rm-mons/> to
>>> > remove all but one monitor, which let me bring the cluster back up.
>>> >
>>> > As you are running an older version of Ceph than hammer, some of the
>>> > commands might differ (perhaps this might help:
>>> > http://docs.ceph.com/docs/v0.80/rados/operations/add-or-rm-mons/)
>>> >
>>> > --
>>> > Nathan Harper // IT Systems Architect
>>> >
>>> > e: nathan.har...@cfms.org.uk // t: 0117 906 1104 // m: 07875 510891
>>> > // w: www.cfms.org.uk
>>> > CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent //
>>> > Emersons Green // Bristol // BS16 7FR
>>> >
>>> > CFMS Services Ltd is registered in England and Wales No 05742022 - a
>>> > subsidiary of CFMS Ltd
>>> > CFMS Services Ltd registered office // Victoria House // 51 Victoria
>>> > Street // Bristol // BS1 6AD
>>> >
>>> > On 16 November 2015 at 16:50, Jose Tavares <j...@terra.com.br> wrote:
>>> >
>>> > Hi guys ...
>>> > I need some help as my cluster seems to be corrupted.
>>> >
>>> > I saw here ..
>>> > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg01919.html
>>> > .. a msg from 2013 where Peter had a problem with his monitors.
>>> >
>>> > I had the same problem today when trying to add a new monitor,
>>> > and then playing with monmap as the monitors were not entering
>>> > the quorum. I'm using version 0.80.8.
>>> >
>>> > Right now my cluster won't start because of a corrupted monitor.
>>> > Is it possible to remove all monitors and create just a new one
>>> > without losing data? I have ~260GB of data with work from 2 weeks.
>>> >
>>> > What should I do? Do you recommend any specific procedure?
>>> >
>>> > Thanks a lot.
>>> > Jose Tavares
>>> >
>>> > _______________________________________________
>>> > ceph-users mailing list
>>> > ceph-users@lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
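
P.S. On the odd monitor count mentioned at the top of this message: Ceph monitors only form a quorum when a strict majority of the monitors listed in the monmap can reach each other, which is why a 2-monitor cluster cannot survive losing either one. Below is a minimal Python sketch of that arithmetic only. The function names are made up for illustration and are not part of Ceph or ceph-deploy.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority among `monitors` monmap members."""
    if monitors < 1:
        raise ValueError("need at least one monitor")
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """Monitors that can be down while a majority is still reachable."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    # Hypothetical cluster sizes, just to show the pattern.
    for n in range(1, 6):
        print(f"{n} monitors: quorum needs {quorum_size(n)}, "
              f"can lose {tolerated_failures(n)}")
```

Going from 1 to 2 monitors, or from 3 to 4, does not increase the number of failures the cluster tolerates, which is the usual argument for keeping the count odd.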
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com