Another problem has appeared:
after the reboot of one server I often have a cluster partition and both servers elect themselves DC.

Even if the partition doesn't appear just after the reboot of one server (e.g. serverA), it appears as soon as I restart corosync on the other server (e.g. serverB). If I then also restart corosync on the first server (serverA), everything works fine again. But if I restart corosync on the second server (serverB), nothing changes and the partition appears again.

It seems to me that there is still something wrong with the first run of corosync just after the server reboot.

I didn't configure any fencing method, because I think that my configuration is really simple and I don't need it.

Thanks again for your patience,
Giovanni


On Nov 17, 2009, at 12:07 PM, Giovanni Di Milia wrote:

With syslog disabled, the problem disappears.

Thank you very much,
Giovanni



On Nov 16, 2009, at 4:51 PM, hj lee wrote:

Hi,

Please disable syslog in openais.conf and try again. It seems this issue is related to the fork() call and syslog().
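
As a rough illustration of that suggestion: the fragment below assumes the logging section of openais.conf accepts the same to_syslog directive that appears in the corosync.conf quoted later in this thread. Only the syslog line is changed; the other values are copied from that quoted configuration and are not a recommendation.

```
logging {
       to_stderr: yes
       to_logfile: yes
       # Assumed change: stop sending log messages to syslog
       to_syslog: no
       logfile: /tmp/corosync.log
}
```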

hj

On Fri, Nov 13, 2009 at 1:08 PM, Giovanni Di Milia <gdimi...@cfa.harvard.edu > wrote:
Thank you very much for your response.

The only thing I really don't understand is: why doesn't this problem appear in all my simulations? I configured at least 7 pairs of virtual servers with vmware 2 and CentOS 5.3 and 5.4 (32 and 64 bits) and I never had this kind of problem!

The only difference in the configuration is that I used private IPs for the simulations and public IPs for the real servers, but I don't think it is important.

Thanks for your patience,
Giovanni



On Nov 13, 2009, at 1:36 PM, hj lee wrote:

Hi,

I have the same problem on CentOS 5.3 with pacemaker-1.0.5 and openais-0.80.5. This is an openais bug! There are two problems:
1. Starting the openais service sometimes gets a seg fault. It is more likely to happen if the openais service is started before syslog.
2. The seg fault handler of openais calls syslog(). syslog() is one of the UNSAFE functions that must not be called from a signal handler, because it is a non-reentrant function.

To fix this issue: get the openais source, find the sigsegv_handler function in exec/main.c, and just comment out log_flush(), as shown below. Then recompile and install it (make and make install). log_flush should be removed from all signal handlers in the openais code base. I am still not sure where the seg fault occurs, but commenting out log_flush prevents it.


-------------------------------------------------------------------------
static void sigsegv_handler (int num)
{
        signal (SIGSEGV, SIG_DFL);
//      log_flush ();
        raise (SIGSEGV);
}

Thanks
hj

On Thu, Nov 12, 2009 at 4:21 PM, Giovanni Di Milia <gdimi...@cfa.harvard.edu > wrote:
I set up a cluster of two CentOS 5.4 x86_64 servers with pacemaker 1.06 and corosync 1.1.2.

I only installed the x86_64 packages (yum install pacemaker also tries to install the 32-bit ones).

I configured a shared cluster IP (it's a public IP) and a cluster website.

Everything works fine if I stop corosync on one of the two servers (the services pass from one machine to the other without problems), but if I reboot one server, it cannot go online in the cluster when it comes back up. I also noticed that there are several corosync threads, and if I kill all of them and then start corosync again, everything works fine again.

I don't know what is happening and I'm not able to reproduce the same situation on some virtual servers!

Thanks,
Giovanni



The corosync configuration is the following:

##############################################
# Please read the corosync.conf.5 manual page
compatibility: whitetank

aisexec {
       # Run as root - this is necessary to be able to manage resources with Pacemaker
       user:   root
       group:  root
}

service {
       # Load the Pacemaker Cluster Resource Manager
       ver:       0
       name:      pacemaker
       use_mgmtd: yes
       use_logd:  yes
}

totem {
       version: 2

       # How long before declaring a token lost (ms)
       token:          5000

       # How many token retransmits before forming a new configuration
       token_retransmits_before_loss_const: 10

       # How long to wait for join messages in the membership protocol (ms)
       join:           1000

       # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
       consensus:      2500

       # Turn off the virtual synchrony filter
       vsftype:        none

       # Number of messages that may be sent by one processor on receipt of the token
       max_messages:   20

       # Stagger sending the node join messages by 1..send_join ms
       send_join: 45

       # Limit generated nodeids to 31-bits (positive signed integers)
       clear_node_high_bit: yes

       # Disable encryption
       secauth:        off

       # How many threads to use for encryption/decryption
       threads:        0

       # Optionally assign a fixed node id (integer)
       # nodeid:         1234

       interface {
               ringnumber: 0

               # The following values need to be set based on your environment
               bindnetaddr: XXX.XXX.XXX.0 #here I put the right ip for my configuration
               mcastaddr: 226.94.1.1
               mcastport: 4000
       }
}

logging {
       fileline: off
       to_stderr: yes
       to_logfile: yes
       to_syslog: yes
       logfile: /tmp/corosync.log
       debug: off
       timestamp: on
       logger_subsys {
               subsys: AMF
               debug: off
       }
}

amf {
       mode: disabled
}

##################################################



_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker



--
Dream with longterm vision!
kerdosa