Another problem has appeared:
after rebooting one server I often get a cluster partition, and
both servers elect themselves DC.
Even if the partition doesn't appear right after the reboot of one
server (e.g. serverA), if I restart corosync on the other
server (e.g. serverB), the partition appears.
Then if I also restart corosync on the first server (serverA),
everything works fine again.
But if I restart corosync on the second server (serverB), nothing
changes and the partition appears again.
It seems to me that there is still something wrong with the first
run of corosync just after the server reboot.
I didn't configure any fencing method, because I think that my
configuration is really simple and I don't need it.
Thanks again for your patience,
Giovanni
On Nov 17, 2009, at 12:07 PM, Giovanni Di Milia wrote:
After disabling syslog, the problem disappears.
Thank you very much,
Giovanni
On Nov 16, 2009, at 4:51 PM, hj lee wrote:
Hi,
Please disable syslog in openais.conf and try again. It seems
this issue is related to the fork() call and syslog().
hj
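For reference, a minimal sketch of what disabling syslog could look like in the logging section. It assumes the same directives as the corosync.conf posted at the bottom of this thread; the openais.conf shipped with openais-0.80.x may name some options differently, so check the manual page for the installed version.

```
logging {
        to_stderr: yes
        to_logfile: yes
        # Assumption: only this line changes (yes -> no)
        to_syslog: no
        logfile: /tmp/corosync.log
}
```

The service has to be restarted on both nodes for the change to take effect.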
On Fri, Nov 13, 2009 at 1:08 PM, Giovanni Di Milia <gdimi...@cfa.harvard.edu> wrote:
Thank you very much for your response.
The only thing I really don't understand is: why does this problem
not appear in any of my simulations?
I configured at least 7 pairs of virtual servers with vmware 2 and
CentOS 5.3 and 5.4 (32 and 64 bit) and I never had this kind of
problem!
The only difference in the configuration is that I used private IPs
for the simulations and public IPs for the real servers, but I
don't think that matters.
Thanks for your patience,
Giovanni
On Nov 13, 2009, at 1:36 PM, hj lee wrote:
Hi,
I have the same problem on CentOS 5.3 with pacemaker-1.0.5 and
openais-0.80.5. This is an openais bug! There are two problems.
1. Starting the openais service sometimes segfaults. It is more
likely to happen if the openais service is started before syslog.
2. The segfault handler of openais calls syslog(). syslog() is
one of the unsafe functions that must not be called from a signal
handler because it is non-reentrant.
To fix this issue: get the openais source, find the sigsegv_handler
function in exec/main.c and just comment out log_flush(), as shown
below. Then recompile and install it (make and make install).
log_flush() should be removed from all signal handlers in the openais
code base. I am still not sure where the segfault occurs, but
commenting out log_flush() prevents it.
-------------------------------------------------------------------------
static void sigsegv_handler (int num)
{
        signal (SIGSEGV, SIG_DFL);
        // log_flush ();
        raise (SIGSEGV);
}
Thanks
hj
On Thu, Nov 12, 2009 at 4:21 PM, Giovanni Di Milia <gdimi...@cfa.harvard.edu> wrote:
I set up a cluster of two CentOS 5.4 x86_64 servers with pacemaker
1.0.6 and corosync 1.1.2.
I only installed the x86_64 packages (yum install pacemaker also
tries to install the 32-bit ones).
I configured a shared cluster IP (it's a public IP) and a cluster
website.
Everything works fine if I stop corosync on one of the two
servers (the services move from one machine to the other without
problems), but if I reboot one server, when it comes back up it
cannot go online in the cluster.
I also noticed that there are several corosync threads, and if I
kill all of them and then start corosync again, everything works
fine again.
I don't know what is happening and I'm not able to reproduce the
same situation on virtual servers!
Thanks,
Giovanni
The corosync configuration is the following:
##############################################
# Please read the corosync.conf.5 manual page
compatibility: whitetank

aisexec {
        # Run as root - this is necessary to be able to manage resources with Pacemaker
        user: root
        group: root
}

service {
        # Load the Pacemaker Cluster Resource Manager
        ver: 0
        name: pacemaker
        use_mgmtd: yes
        use_logd: yes
}

totem {
        version: 2
        # How long before declaring a token lost (ms)
        token: 5000
        # How many token retransmits before forming a new configuration
        token_retransmits_before_loss_const: 10
        # How long to wait for join messages in the membership protocol (ms)
        join: 1000
        # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
        consensus: 2500
        # Turn off the virtual synchrony filter
        vsftype: none
        # Number of messages that may be sent by one processor on receipt of the token
        max_messages: 20
        # Stagger sending the node join messages by 1..send_join ms
        send_join: 45
        # Limit generated nodeids to 31-bits (positive signed integers)
        clear_node_high_bit: yes
        # Disable encryption
        secauth: off
        # How many threads to use for encryption/decryption
        threads: 0
        # Optionally assign a fixed node id (integer)
        # nodeid: 1234
        interface {
                ringnumber: 0
                # The following values need to be set based on your environment
                bindnetaddr: XXX.XXX.XXX.0 # here I put the right IP for my configuration
                mcastaddr: 226.94.1.1
                mcastport: 4000
        }
}

logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        to_syslog: yes
        logfile: /tmp/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

amf {
        mode: disabled
}
##################################################
_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
--
Dream with longterm vision!
kerdosa