Re: [Pacemaker] Multiple thread after rebooting server: the node doesn't go online

Giovanni Di Milia Tue, 17 Nov 2009 13:33:08 -0800

Another problem has appeared:

After rebooting one server I often get a cluster partition, and both servers elect themselves DC.

Even if the partition doesn't appear right after the reboot of one server (e.g. serverA), it appears as soon as I restart corosync on the other server (serverB).
If I then also restart corosync on the first server (serverA), everything works fine again.
But if I restart corosync on the second server (serverB), nothing changes and the partition appears again.

It seems to me that something is still wrong with the first run of corosync right after the server reboots.

I didn't configure any fencing method, because I think my configuration is really simple and I don't need it.


Thanks again for your patience,
Giovanni


On Nov 17, 2009, at 12:07 PM, Giovanni Di Milia wrote:

With syslog disabled, the problem disappears.

Thank you very much,
Giovanni



On Nov 16, 2009, at 4:51 PM, hj lee wrote:
Hi,
Please disable syslog in openais.conf and try again. It seems this issue is related to the fork() call and syslog().
hj
On Fri, Nov 13, 2009 at 1:08 PM, Giovanni Di Milia <gdimi...@cfa.harvard.edu> wrote:
Thank you very much for your response.
The only thing I really don't understand is: why doesn't this problem appear in any of my simulations?
I configured at least 7 pairs of virtual servers with VMware 2 and CentOS 5.3 and 5.4 (32 and 64 bit) and I never had this kind of problem!
The only difference in the configuration is that I used private IPs for the simulations and public IPs for the real servers, but I don't think that matters.
Thanks for your patience,
Giovanni



On Nov 13, 2009, at 1:36 PM, hj lee wrote:
Hi,
I have the same problem on CentOS 5.3 with pacemaker-1.0.5 and openais-0.80.5. This is an openais bug! There are two problems:
1. Starting the openais service sometimes segfaults. This is more likely to happen if the openais service is started before syslog.
2. The segfault handler of openais calls syslog(). syslog() is an UNSAFE function that must not be called from a signal handler, because it is non-reentrant.

To fix this issue: get the openais source, find the sigsegv_handler function in exec/main.c and comment out log_flush(), as shown below. Then recompile and install it (make and make install). log_flush() should be removed from all signal handlers in the openais code base. I am still not sure where the segfault occurs, but commenting out log_flush() prevents it.
-------------------------------------------------------------------------
static void sigsegv_handler (int num)
{
        signal (SIGSEGV, SIG_DFL);
//      log_flush ();
        raise (SIGSEGV);
}
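
For illustration, here is a minimal sketch of a SIGSEGV handler that still reports the crash but calls only async-signal-safe functions (signal(), write() and raise() are on the POSIX async-signal-safe list; syslog() is not). This is not the openais code: the names crash_msg and install_sigsegv_handler are hypothetical, and writing the message to stderr instead of the log is an assumption made to keep the example short.

```c
#include <signal.h>
#include <unistd.h>

/* Hypothetical message; a static buffer means no allocation or
 * formatting has to happen inside the handler. */
static const char crash_msg[] = "fatal: caught SIGSEGV, re-raising\n";

/* Handler restricted to async-signal-safe calls. */
static void sigsegv_handler(int num)
{
    ssize_t ignored;

    (void)num;
    signal(SIGSEGV, SIG_DFL);   /* restore the default action first */
    /* write(2) is async-signal-safe; syslog() and stdio are not. */
    ignored = write(STDERR_FILENO, crash_msg, sizeof(crash_msg) - 1);
    (void)ignored;
    raise(SIGSEGV);             /* terminate with the default action */
}

/* Hypothetical helper: installs the handler, returns 0 on success
 * and -1 on failure. */
int install_sigsegv_handler(void)
{
    return signal(SIGSEGV, sigsegv_handler) == SIG_ERR ? -1 : 0;
}
```

Calling install_sigsegv_handler() once at startup would be enough. After the message is written the process still terminates through the default SIGSEGV action, so it exits the same way it would without the handler.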

Thanks
hj
On Thu, Nov 12, 2009 at 4:21 PM, Giovanni Di Milia <gdimi...@cfa.harvard.edu> wrote:
I set up a cluster of two CentOS 5.4 x86_64 servers with pacemaker 1.06 and corosync 1.1.2.
I only installed the x86_64 packages (yum install pacemaker tries to install the 32-bit ones as well).
I configured a shared cluster IP (it's a public IP) and a cluster website.
Everything works fine if I stop corosync on one of the two servers (the services move from one machine to the other without problems), but if I reboot one server, it cannot go online in the cluster when it comes back up.
I also noticed that there are several corosync threads, and if I kill all of them and then start corosync again, everything works fine again.
I don't know what is happening, and I'm not able to reproduce the same situation on virtual servers!
Thanks,
Giovanni



the configuration of corosync is the following:

##############################################
# Please read the corosync.conf.5 manual page
compatibility: whitetank

aisexec {
       # Run as root - this is necessary to be able to manage resources with Pacemaker
       user:   root
       group:  root
}

service {
       # Load the Pacemaker Cluster Resource Manager
       ver:       0
       name:      pacemaker
       use_mgmtd: yes
       use_logd:  yes
}

totem {
       version: 2

       # How long before declaring a token lost (ms)
       token:          5000
       # How many token retransmits before forming a new configuration
       token_retransmits_before_loss_const: 10
       # How long to wait for join messages in the membership protocol (ms)
       join:           1000
       # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
       consensus:      2500

       # Turn off the virtual synchrony filter
       vsftype:        none
       # Number of messages that may be sent by one processor on receipt of the token
       max_messages:   20

       # Stagger sending the node join messages by 1..send_join ms
       send_join: 45
       # Limit generated nodeids to 31-bits (positive signed integers)
       clear_node_high_bit: yes

       # Disable encryption
       secauth:        off

       # How many threads to use for encryption/decryption
       threads:        0

       # Optionally assign a fixed node id (integer)
       # nodeid:         1234

       interface {
               ringnumber: 0
               # The following values need to be set based on your environment
               bindnetaddr: XXX.XXX.XXX.0 # here I put the right IP for my configuration
               mcastaddr: 226.94.1.1
               mcastport: 4000
       }
}

logging {
       fileline: off
       to_stderr: yes
       to_logfile: yes
       to_syslog: yes
       logfile: /tmp/corosync.log
       debug: off
       timestamp: on
       logger_subsys {
               subsys: AMF
               debug: off
       }
}

amf {
       mode: disabled
}

##################################################



_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker



--
Dream with longterm vision!
kerdosa
