The crmd process looks to have stalled. Can you re-run with debug turned on in openais.conf?
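
For reference, a minimal sketch of what that change might look like. It assumes the usual `logging` block of an openais 0.80.x config file and the /etc/ais directory that appears in the shell prompt in the quoted thread; only the `debug` line matters here, and the other keys are illustrative and may differ in your file:

```
# /etc/ais/openais.conf (path assumed from the "alpha:/etc/ais #" prompt)
logging {
        # switch from "off" to "on" to get debug output
        debug: on
        # illustrative settings, keep whatever your file already has
        to_syslog: yes
        timestamp: on
}
```

After editing the file on both nodes, stop and start openais again (the thread uses /etc/init.d/openais start) and send the resulting /var/log/messages.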
On Mon, Oct 12, 2009 at 6:09 PM, Stratos Zolotas <str...@gmail.com> wrote:
>
> On Mon, Oct 12, 2009 at 5:57 PM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
>>
>> Hi,
>>
>> On Mon, Oct 12, 2009 at 03:32:15PM +0300, Stratos Zolotas wrote:
>> > On Mon, Oct 12, 2009 at 3:10 PM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
>> >
>> > > On Mon, Oct 12, 2009 at 02:57:29PM +0300, Stratos Zolotas wrote:
>> > > > On Mon, Oct 12, 2009 at 2:51 PM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > On Mon, Oct 12, 2009 at 02:42:25PM +0300, Stratos Zolotas wrote:
>> > > > > > Hello to the list!!!
>> > > > > >
>> > > > > > This is my first question to the list and my first attempt to
>> > > > > > build a two node cluster on opensuse 11.1 with pacemaker 1.0.5
>> > > > > > and openais 0.80.5, so please forgive my lack of knowledge.
>> > > > > >
>> > > > > > I'm trying to build an Active/Passive scenario but I have the
>> > > > > > following on both nodes:
>> > > > > >
>> > > > > > Oct 12 14:05:57 alpha kernel: crmd[30704]: segfault at 18 ip 00007f7770526eee sp 00007fffc7379810 error 4 in libplumb.so.2.0.0[7f777050a000+30000]
>> > > > >
>> > > > > It'd be excellent to see the backtrace, providing that there are
>> > > > > core files. Please enable core file generation if there are none.
>> > > > > If you don't know about backtraces, just use hb_report to capture
>> > > > > it.
>> > > > >
>> > > > > > As a result I'm getting the following:
>> > > > >
>> > > > > That's not the consequence of the previous problem.
>> > > > >
>> > > > > > alpha:/etc/ais # crm_mon --one-shot -V
>> > > > > > crm_mon[30911]: 2009/10/12_14:39:00 ERROR: unpack_resources: No STONITH resources have been defined
>> > > > > > crm_mon[30911]: 2009/10/12_14:39:00 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
>> > > > > > crm_mon[30911]: 2009/10/12_14:39:00 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Dejan
>> > > > >
>> > > > > >
>> > > > > > ============
>> > > > > > Last updated: Mon Oct 12 14:39:00 2009
>> > > > > > Current DC: NONE
>> > > > > > 0 Nodes configured, unknown expected votes
>> > > > > > 0 Resources configured.
>> > > > > > ============
>> > > > > >
>> > > > > > The errors are regarding the configuration (I have searched
>> > > > > > about them) that I am unable to do at the moment because
>> > > > > > "crm configure" cannot connect to the cluster.
>> > > > > >
>> > > > > > Both nodes are running opensuse 11.1 x86_64 with the latest
>> > > > > > updates and the versions that I mentioned above.
>> > > > > >
>> > > > > > Any help is appreciated and please again forgive my lack of
>> > > > > > knowledge.
>> > > > > >
>> > > > > > Thank you in advance.
>> > > > > >
>> > > > > > Stratos.
>> > > > >
>> > > > > > _______________________________________________
>> > > > > > Pacemaker mailing list
>> > > > > > Pacemaker@oss.clusterlabs.org
>> > > > > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> > > >
>> > > > Thank you for the immediate response. I know about the errors (I
>> > > > have to disable stonith in the config) but I cannot configure
>> > > > anything with crm. After commit I get something like "node did not
>> > > > respond".
>> > > >
>> > > > The problem is that there are no nodes, as you can see after the
>> > > > errors.
>> > > >
>> > > > I want to help to eliminate the problem, but I'm not a programmer.
>> > > > So if you can, please guide me so I can execute hb_report and
>> > > > provide the necessary logs. When do I have to execute hb_report,
>> > > > and with what parameters?
>> > >
>> > > First check if you have core dumps:
>> > >
>> > > # ls -lR /var/lib/heartbeat/cores
>> > >
>> > > Then run
>> > >
>> > > # hb_report -f <time> -A -n "<nodes>" /tmp/problem-1
>> > >
>> > > Replace <time> with whichever time you started cluster at (say
>> > > 13:00). <nodes> with a space separated list of nodes.
>> > >
>> > > Thanks,
>> > >
>> > > Dejan
>> > >
>> > > > Again please forgive my lack of knowledge (it is my first time
>> > > > with clusters).
>> > > >
>> > > > Thanks again.
>> > > >
>> > > > Stratos.
>> >
>> > I don't think that there are any core dumps. The three folders
>> > returned from the command are empty.
>> >
>> > alpha:~ # ls -IR /var/lib/heartbeat/cores/
>> > hacluster nobody root
>> > alpha:~ #
>> >
>> > hb_report -f 15:27 -A -n "alpha bravo" -u root /root/problem-3
>> >
>> > returns
>>
>> The magic is:
>>
>> # ulimit -c unlimited
>>
>> You should put it somewhere so that it is run on boot. For now,
>> just run it before /etc/init.d/openais start.
>>
>> > Password:
>> > alpha: WARN: could not find the log file on alpha
>> > Password: /etc/ha.d/shellfuncs: line 211: maketempdir: command not found
>> > alpha: WARN: sorry, can't create temoary file for find_files
>> > /etc/ha.d/shellfuncs: line 211: maketempdir: command not found
>> > alpha: WARN: sorry, can't create temoary file for find_files
>> > /etc/ha.d/shellfuncs: line 211: maketempdir: command not found
>> > /etc/ha.d/shellfuncs: line 211: maketempdir: command not found
>> > alpha: ERROR: cannot create temporary files
>>
>> This looks funny. Can you please show the package versions? And
>> where did the packages come from?
>>
>> Thanks,
>>
>> Dejan
>>
>> > I have attached the generated folder as a zip file, but from a quick
>> > look, I don't think it has anything useful. Maybe it's better to
>> > guide me on how to produce core dump files.
>> >
>> > I have also tried without the -u option
>> >
>> > Thanks
>> >
>> > Stratos
>> >
>> > --
>> > Kernel IT Solutions Ltd
>> > http://www.kernelit.gr
>> >
>> > Cyclades Wireless Network
>> > http://www.cywn.gr
>
> After I reinstalled all the packages, it has been running for about
> half an hour without a segfault.
>
> crm_mon still reports:
> ============
> Last updated: Mon Oct 12 19:02:43 2009
> Current DC: NONE
> 0 Nodes configured, unknown expected votes
> 0 Resources configured.
> ============
>
> and when I try to "commit" a configuration (through crm configure) I get
> a "Remote node did not respond"
>
> What do I have to do to make the nodes appear? (at least until a segfault
> occurs and we have a core dump)
>
> I'm attaching my /var/log/messages from the first node after the last
> run of openais.