Hello everyone,

My goal is to build a round-robin balanced, highly available (HA) Apache web
server cluster. The main purpose is to balance HTTP requests evenly between the
nodes and to have one machine pick up all requests if, and ONLY if, the others
are unavailable. The cluster will be accessible only from the internal network.
Any advice on this will be highly appreciated (resources to use, services to
install and configure, etc.). After reading through the ClusterLabs
documentation, I think the proper deployment is an active/active,
Pacemaker-managed cluster.
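
The behaviour described above can be illustrated with a minimal Python sketch of hash-based request spreading with failover. It assumes that the active/active setup in Cluster from Scratch clones the IPaddr2 address so that the iptables CLUSTERIP target assigns each client to one node by hashing its source address. The SHA-1 hash, the one-bucket-per-node layout and the helper names (`bucket_for`, `owner_of`, `route`) are hypothetical and are not what CLUSTERIP actually computes. Note that a given client always lands on the same node while that node is up, so this is hash-based spreading rather than strict round robin.

```python
import hashlib

# Illustration only: this is NOT the hash used by the iptables CLUSTERIP
# target. It shows the idea of source-IP hashing with failover.
NODES = ["node-0", "node-1"]  # node names from the setup described above
BUCKETS = len(NODES)          # assumption: one hash bucket per node


def bucket_for(client_ip: str) -> int:
    """Deterministically map a client source IP to a bucket."""
    digest = hashlib.sha1(client_ip.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % BUCKETS


def owner_of(bucket: int, online: list) -> str:
    """Node serving a bucket: its preferred node if online, else a survivor."""
    if not online:
        raise RuntimeError("no node available")
    preferred = NODES[bucket]
    if preferred in online:
        return preferred
    # The preferred node is down: a surviving node takes over its bucket.
    return online[bucket % len(online)]


def route(client_ip: str, online: list) -> str:
    """Which node answers this client, given the nodes currently online."""
    return owner_of(bucket_for(client_ip), online)


if __name__ == "__main__":
    clients = [f"192.168.0.{i}" for i in range(1, 255)]
    for online in (["node-0", "node-1"], ["node-0"]):
        counts = {n: 0 for n in NODES}
        for ip in clients:
            counts[route(ip, online)] += 1
        print(online, counts)
```

With both nodes online the clients are split between them; with only node-0 online every client is routed to node-0, which is the "if and ONLY if" failover behaviour described above.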

I'm trying to follow the "Cluster from Scratch" article in order to build a
2-node cluster on an experimental setup:

2 GNU/Linux Debian Unstable (sid) virtual machines (kernel 3.0.0-1-686-pae,
Apache/2.2.21 (Debian)) on the same LAN.

node-0 IP: 192.168.0.101
node-1 IP: 192.168.0.102
Desired Cluster Virtual IP: 192.168.0.100

The two nodes are set up to communicate with proper SSH keys, and this works
flawlessly. They can also reach each other by short hostname:

root@node-0:~# ssh node-1 -- hostname
node-1

root@node-1:~# ssh node-0 -- hostname
node-0

My problem is that, although I've reached the part where the ClusterIP
resource is set up properly, the Apache resource does not start on either
node. The logs contain no message explaining the failure in detail, even with
debug messages enabled. All related messages report unknown errors while
probing and starting the service, and after a while the cluster manager gives
up. From the messages, it seems that the manager is getting unexpected exit
codes from the Apache resource agent. The server-status URL is accessible from
127.0.0.1 on both nodes.
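
For comparison with what the cluster sees, the monitor action of the apache resource agent (presumably ocf:heartbeat:apache here) fetches a status URL with an HTTP client such as wget or curl and searches the result for a regular expression. The Python sketch below repeats that kind of check by hand. The URL `http://localhost:80/server-status` and the pattern `</ *html *>` are assumptions, not values read from the agent; the real defaults can be checked with `crm ra info ocf:heartbeat:apache`.

```python
import re
import urllib.request

# Assumptions (not taken from the agent itself): default status URL and the
# regex that must appear in the page. Check the agent's meta-data for the
# real defaults on your system.
STATUS_URL = "http://localhost:80/server-status"
TEST_REGEX = r"</ *html *>"


def page_matches(body: str, regex: str = TEST_REGEX) -> bool:
    """Line-by-line, case-insensitive search, similar to `grep -Ei`."""
    pattern = re.compile(regex, re.IGNORECASE)
    return any(pattern.search(line) for line in body.splitlines())


def check_status(url: str = STATUS_URL, timeout: float = 5.0) -> bool:
    """Fetch the status page and report whether it contains the expected text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print(f"fetch failed: {exc}")
        return False
    return page_matches(body)
```

Running `check_status()` on each node should return True when the page the agent would fetch is reachable and contains the expected closing tag; a False result points at the URL or the pattern rather than at Pacemaker.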

root@node-0:~# crm_mon -1
============
Last updated: Fri Sep 30 14:04:55 2011
Stack: openais
Current DC: node-1 - partition with quorum
Version: 1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ node-1 node-0 ]

 ClusterIP    (ocf::heartbeat:IPaddr2):    Started node-1

Failed actions:
    Apache2_monitor_0 (node=node-0, call=3, rc=1, status=complete): unknown error
    Apache2_start_0 (node=node-0, call=5, rc=1, status=complete): unknown error
    Apache2_monitor_0 (node=node-1, call=8, rc=1, status=complete): unknown error
    Apache2_start_0 (node=node-1, call=10, rc=1, status=complete): unknown error
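
The `rc` values in these failed actions are OCF resource agent exit codes. A small Python sketch for decoding them follows; the code table is the standard OCF one, while the regular expression is only an assumption about the crm_mon line format shown above. For a probe (`monitor_0`), Pacemaker expects 7 (not running) on a node where it has not started the resource, so rc=1 there counts as a failure, which would be consistent with the "active on 2 nodes" error in the logs.

```python
import re

# Standard OCF resource agent exit codes.
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Assumed shape of a crm_mon "Failed actions" entry, e.g.
# Apache2_start_0 (node=node-0, call=5, rc=1, status=complete): unknown error
FAILED_ACTION = re.compile(
    r"(?P<rsc>\S+)_(?P<op>start|stop|monitor|promote|demote|notify|migrate_to|migrate_from)"
    r"_(?P<interval>\d+) \(node=(?P<node>[^,]+), call=(?P<call>\d+), rc=(?P<rc>\d+)"
)


def parse_failed_actions(text: str) -> list:
    """Return one dict per failed action, with the OCF rc decoded."""
    actions = []
    for m in FAILED_ACTION.finditer(text):
        rc = int(m["rc"])
        actions.append({
            "resource": m["rsc"],
            "op": m["op"],
            "interval": int(m["interval"]),
            "node": m["node"],
            "rc": rc,
            "meaning": OCF_RC.get(rc, "unknown"),
        })
    return actions
```

Passing the "Failed actions" lines above to `parse_failed_actions` reports four actions, each with `meaning` set to `OCF_ERR_GENERIC`.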

Let's check the logs for this resource:

root@node-0:~# grep 'ERROR.*Apache2' /var/log/corosync/corosync.log
(Nothing)

root@node-0:~# grep 'WARN.*Apache2' /var/log/corosync/corosync.log
Sep 30 14:04:23 node-0 lrmd: [2555]: WARN: Managed Apache2:monitor process 2802 exited with return code 1.
Sep 30 14:04:30 node-0 lrmd: [2555]: WARN: Managed Apache2:start process 2942 exited with return code 1.

root@node-1:~# grep 'ERROR.*Apache2' /var/log/corosync/corosync.log
Sep 30 14:04:23 node-1 pengine: [1676]: ERROR: native_create_actions: Resource Apache2 (ocf::apache) is active on 2 nodes attempting recovery

root@node-1:~# grep 'WARN.*Apache2' /var/log/corosync/corosync.log
Sep 30 14:04:23 node-1 lrmd: [1674]: WARN: Managed Apache2:monitor process 3006 exited with return code 1.
Sep 30 14:04:23 node-1 crmd: [1677]: WARN: status_from_rc: Action 5 (Apache2_monitor_0) on node-1 failed (target: 7 vs. rc: 1): Error
Sep 30 14:04:23 node-1 crmd: [1677]: WARN: status_from_rc: Action 7 (Apache2_monitor_0) on node-0 failed (target: 7 vs. rc: 1): Error
Sep 30 14:04:23 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
Sep 30 14:04:23 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
Sep 30 14:04:30 node-1 crmd: [1677]: WARN: status_from_rc: Action 10 (Apache2_start_0) on node-0 failed (target: 0 vs. rc: 1): Error
Sep 30 14:04:30 node-1 crmd: [1677]: WARN: update_failcount: Updating failcount for Apache2 on node-0 after failed start: rc=1 (update=INFINITY, time=1317380670)
Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
Sep 30 14:04:31 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
Sep 30 14:04:31 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
Sep 30 14:04:36 node-1 lrmd: [1674]: WARN: Managed Apache2:start process 3146 exited with return code 1.
Sep 30 14:04:36 node-1 crmd: [1677]: WARN: status_from_rc: Action 9 (Apache2_start_0) on node-1 failed (target: 0 vs. rc: 1): Error
Sep 30 14:04:36 node-1 crmd: [1677]: WARN: update_failcount: Updating failcount for Apache2 on node-1 after failed start: rc=1 (update=INFINITY, time=1317380676)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
Sep 30 14:13:38 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
Sep 30 14:13:38 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
Sep 30 14:13:52 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
Sep 30 14:13:52 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
Sep 30 14:13:52 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
Sep 30 14:13:52 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
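
Because the policy engine re-logs every failed operation on each transition, most of these warnings are repeats. A minimal Python sketch that strips the timestamp and PID prefix and counts distinct messages makes the underlying failures easier to see. The prefix regex is an assumption based only on the lines shown above and may not match other corosync logging settings (for example with `fileline: on`).

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpts above:
# "Sep 30 14:04:31 node-1 pengine: [1676]: WARN: <message>"
PREFIX = re.compile(
    r"^(?P<month>[A-Z][a-z]{2}) +(?P<day>\d+) (?P<time>\d\d:\d\d:\d\d) "
    r"(?P<host>\S+) (?P<daemon>\w+): \[(?P<pid>\d+)\]: (?P<level>[A-Z]+): "
)


def distinct_messages(lines) -> Counter:
    """Count how often each message appears, ignoring timestamp and PID."""
    counts = Counter()
    for line in lines:
        m = PREFIX.match(line)
        if m is None:
            continue  # not a log line in the expected format
        message = line[m.end():].strip()
        counts[(m["host"], m["daemon"], m["level"], message)] += 1
    return counts
```

Feeding it the Apache2 lines from /var/log/corosync/corosync.log and printing `most_common()` lists each distinct failure once, together with how many times it was logged.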

Any suggestions?

File /etc/corosync/corosync.conf (only the changes are shown here; see the
attachment for the full file):

# Please read the openais.conf.5 manual page

totem {

... (Default)

        interface {
                # The following values need to be set based on your environment
                ringnumber: 0
                bindnetaddr: 192.168.0.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

... (Default)

service {
        # Load the Pacemaker Cluster Resource Manager
        ver:       1
        name:      pacemaker
}

... (Default)

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        to_syslog: no
        syslog_facility: daemon
        debug: on
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}
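
In the totem interface block, `bindnetaddr` is the network address of the interface corosync binds to, derived from the node's IP address and netmask. The Python sketch below checks the values used here with the standard `ipaddress` module, assuming a /24 netmask (the netmask is not stated in the setup description): both node addresses map to 192.168.0.0, the cluster virtual IP is in the same subnet, and 226.94.1.1 is a multicast address.

```python
import ipaddress

PREFIX_LEN = 24  # assumption: the LAN uses a 255.255.255.0 netmask


def bindnetaddr(node_ip: str, prefix_len: int = PREFIX_LEN) -> str:
    """Network address to use for corosync's bindnetaddr on this node."""
    iface = ipaddress.ip_interface(f"{node_ip}/{prefix_len}")
    return str(iface.network.network_address)


def same_subnet(addresses, prefix_len: int = PREFIX_LEN) -> bool:
    """True if every address lies in the same network."""
    networks = {ipaddress.ip_interface(f"{a}/{prefix_len}").network for a in addresses}
    return len(networks) == 1


if __name__ == "__main__":
    for ip in ("192.168.0.101", "192.168.0.102"):
        print(ip, "->", bindnetaddr(ip))
    print("VIP in same subnet:", same_subnet(["192.168.0.101", "192.168.0.102", "192.168.0.100"]))
    print("mcastaddr is multicast:", ipaddress.ip_address("226.94.1.1").is_multicast)
```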

--
Koutsokeras Miltiadis M.Sc.
Software Engineer
Biovista Inc.

# Please read the openais.conf.5 manual page

totem {
        version: 2

        # How long before declaring a token lost (ms)
        token: 3000

        # How many token retransmits before forming a new configuration
        token_retransmits_before_loss_const: 10

        # How long to wait for join messages in the membership protocol (ms)
        join: 60

        # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
        consensus: 3600

        # Turn off the virtual synchrony filter
        vsftype: none

        # Number of messages that may be sent by one processor on receipt of the token
        max_messages: 20

        # Limit generated nodeids to 31-bits (positive signed integers)
        clear_node_high_bit: yes

        # Disable encryption
        secauth: off

        # How many threads to use for encryption/decryption
        threads: 0

        # Optionally assign a fixed node id (integer)
        # nodeid: 1234

        # This specifies the mode of redundant ring, which may be none, active, or passive.
        rrp_mode: none

        interface {
                # The following values need to be set based on your environment 
                ringnumber: 0
                bindnetaddr: 192.168.0.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

amf {
        mode: disabled
}

service {
        # Load the Pacemaker Cluster Resource Manager
        ver:       1
        name:      pacemaker
}

aisexec {
        user:   root
        group:  root
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        to_syslog: no
        syslog_facility: daemon
        debug: on
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}
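
As described in corosync.conf(5), `consensus` must be at least 1.2 * `token`, and corosync derives it from `token` when it is not set; the values in this file (token: 3000, consensus: 3600) sit exactly at that limit. The Python sketch below reads the top-level `key: value` settings of the totem block and checks the rule. The parser is deliberately naive (assumption: one setting per line, `#` comments, braces on their own lines) and is not a full corosync.conf parser.

```python
import re


def totem_settings(conf_text: str) -> dict:
    """Naively collect `key: value` lines at the top level of the totem { } block."""
    settings = {}
    depth = 0
    in_totem = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        if depth == 0 and re.match(r"^totem\s*\{$", line):
            in_totem, depth = True, 1
            continue
        if line.endswith("{"):
            depth += 1  # nested block such as interface { }
            continue
        if line == "}":
            depth -= 1
            if depth == 0:
                in_totem = False
            continue
        if in_totem and depth == 1 and ":" in line:
            key, value = (part.strip() for part in line.split(":", 1))
            settings[key] = value
    return settings


def consensus_ok(settings: dict) -> bool:
    """consensus must be >= 1.2 * token (integer math avoids float rounding)."""
    if "consensus" not in settings:
        return True  # corosync derives consensus from token when unset
    token = int(settings["token"])
    consensus = int(settings["consensus"])
    return consensus * 10 >= token * 12
```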
Apache Server Status for localhost

Server Version: Apache/2.2.21 (Debian)
Server Built: Sep 26 2011 16:32:28
Current Time: Friday, 30-Sep-2011 13:59:51 EEST
Restart Time: Friday, 30-Sep-2011 12:41:35 EEST
Parent Server Generation: 0
Server uptime: 1 hour 18 minutes 16 seconds
Total accesses: 0 - Total Traffic: 0 kB
CPU Usage: u0 s0 cu0 cs0
0 requests/sec - 0 B/second -
1 requests currently being processed, 49 idle workers

W________________________.......................................
_________________________.......................................
(14 further rows, all "." open slots)

Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process

Srv  PID  Acc    M  CPU   SS  Req         Conn  Child  Slot  Client     VHost                Request
0-0  945  0/0/0  W  0.00  0   1174568539  0.0   0.00   0.00  127.0.0.1  node-0.biovista.com  GET /server-status HTTP/1.1

Srv    Child Server number - generation
PID    OS process ID
Acc    Number of accesses this connection / this child / this slot
M      Mode of operation
CPU    CPU usage, number of seconds
SS     Seconds since beginning of most recent request
Req    Milliseconds required to process most recent request
Conn   Kilobytes transferred this connection
Child  Megabytes transferred this child
Slot   Total megabytes transferred this slot

Apache/2.2.21 (Debian) Server at localhost Port 80

Apache Server Status for localhost

Server Version: Apache/2.2.21 (Debian)
Server Built: Sep 26 2011 16:32:28
Current Time: Friday, 30-Sep-2011 13:59:00 EEST
Restart Time: Friday, 30-Sep-2011 12:41:31 EEST
Parent Server Generation: 0
Server uptime: 1 hour 17 minutes 28 seconds
Total accesses: 1 - Total Traffic: 1 kB
CPU Usage: u0 s.04 cu0 cs0 - .000861% CPU load
.000215 requests/sec - 0 B/second - 1024 B/request
1 requests currently being processed, 49 idle workers

_________________________.......................................
W________________________.......................................
(14 further rows, all "." open slots)

Srv  PID  Acc    M  CPU   SS    Req         Conn  Child  Slot  Client     VHost                Request
0-0  944  0/1/1  _  0.04  3776  33          0.0   0.00   0.00  127.0.0.1  node-1.biovista.com  GET /server-status HTTP/1.0
1-0  945  0/0/0  W  0.00  0     1174619200  0.0   0.00   0.00  127.0.0.1  node-1.biovista.com  GET /server-status HTTP/1.1

Apache/2.2.21 (Debian) Server at localhost Port 80
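
mod_status can also return this information in a machine-readable form when `?auto` is appended to the status URL (for example http://localhost/server-status?auto), which is easier to check from a script. The Python sketch below parses that `Key: value` output and tallies the scoreboard characters using the scoreboard key from the status pages. The field names `BusyWorkers`, `IdleWorkers` and `Scoreboard` are assumed from the usual mod_status output, and the sample in the checks is constructed for illustration, not captured from these nodes.

```python
from collections import Counter

# Scoreboard characters as listed in the mod_status "Scoreboard Key".
SCOREBOARD_KEY = {
    "_": "Waiting for Connection",
    "S": "Starting up",
    "R": "Reading Request",
    "W": "Sending Reply",
    "K": "Keepalive (read)",
    "D": "DNS Lookup",
    "C": "Closing connection",
    "L": "Logging",
    "G": "Gracefully finishing",
    "I": "Idle cleanup of worker",
    ".": "Open slot with no current process",
}


def parse_auto(text: str) -> dict:
    """Parse `Key: value` lines from server-status?auto output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def scoreboard_summary(scoreboard: str) -> dict:
    """Count worker slots per state, keyed by the state's description."""
    counts = Counter(scoreboard)
    return {SCOREBOARD_KEY.get(ch, f"unknown {ch!r}"): n for ch, n in counts.items()}
```

Applied to the `?auto` output of each node, `BusyWorkers` and `IdleWorkers` should agree with the "1 requests currently being processed, 49 idle workers" line in the status pages.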
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
