On Wed, Jan 25, 2012 at 04:35:39PM +0100, Michal Vyoral wrote:
> Hi Dejan,
>
> On Tue, Jan 24, 2012 at 11:52:20PM +0100, Dejan Muhamedagic wrote:
> > Hi,
> >
> > On Tue, Jan 24, 2012 at 06:31:54PM +0100, Michal Vyoral wrote:
> > > Hello,
> > > we had a cluster of two nodes both running Debian 5.0, each with two
> > > resources
> > > IPaddr2 and apache managed by pacemaker 1.0.9.1. After an upgrading of
> > > one node from Debian 5.0 to 6.0 we have a problem to start the
> > > apache resource on the upgraded node. Here are the details:
> > >
> > > Versions of heartbeat and pacemaker before the upgrade:
> > > pr-iso1:~# dpkg -l pacemaker heartbeat
> > > Desired=Unknown/Install/Remove/Purge/Hold
> > > | Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
> > > |/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
> > > ||/ Name           Version        Description
> > > +++-==============-==============-============================================
> > > ii  heartbeat      1:3.0.3-2~bpo5 Subsystem for High-Availability Linux
> > > ii  pacemaker      1.0.9.1+hg1562 HA cluster resource manager
> > >
> > > Versions of heartbeat and pacemaker after the upgrade:
> > > pr-iso2:~# dpkg -l pacemaker heartbeat
> > > Desired=Unknown/Install/Remove/Purge/Hold
> > > | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> > > |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> > > ||/ Name           Version        Description
> > > +++-==============-==============-============================================
> > > ii  heartbeat      1:3.0.3-2      Subsystem for High-Availability Linux
> > > ii  pacemaker      1.0.9.1+hg1562 HA cluster resource manager
> > >
> > > Status of the resources on the upgraded node:
> > > pr-iso2:~# crm_mon
> > > ============
> > > Last updated: Tue Jan 24 10:14:12 2012
> > > Stack: Heartbeat
> > > Current DC: pr-iso2 (511079a9-0f71-4537-bdf9-07714b454441) - partition with quorum
> > > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> > > 2 Nodes configured, unknown expected votes
> > > 2 Resources configured.
> > > ============
> > >
> > > Online: [ pr-iso2 ]
> > > OFFLINE: [ pr-iso1 ]
> > >
> > > ClusterIP (ocf::heartbeat:IPaddr2): Started pr-iso2
> > >
> > > Failed actions:
> > > RTWeb_start_0 (node=pr-iso2, call=7, rc=1, status=complete): unknown error
> > >
> > > Status of the resources on the non upgraded node:
> > > pr-iso1:~# crm_mon
> > > ============
> > > Last updated: Tue Jan 24 17:08:22 2012
> > > Stack: Heartbeat
> > > Current DC: pr-iso1 (014268aa-f234-4789-b4a1-0053cf4e61b9) - partition with quorum
> > > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> > > 2 Nodes configured, unknown expected votes
> > > 2 Resources configured.
> > > ============
> > >
> > > Online: [ pr-iso1 pr-iso2 ]
> > >
> > > ClusterIP (ocf::heartbeat:IPaddr2): Started pr-iso1
> > > RTWeb (ocf::heartbeat:apache): Started pr-iso1
> > >
> > > Configuration of the resources:
> > > pr-iso1:~# crm configure show
> > > node $id="014268aa-f234-4789-b4a1-0053cf4e61b9" pr-iso1
> > > node $id="511079a9-0f71-4537-bdf9-07714b454441" pr-iso2
> > > primitive ClusterIP ocf:heartbeat:IPaddr2 \
> > >     params ip="10.5.75.83" cidr_netmask="24" \
> > >     op monitor interval="30s"
> > > primitive RTWeb ocf:heartbeat:apache \
> > >     params configfile="/etc/apache2/apache2.conf" \
> > >     op monitor interval="1min" \
> > >     meta target-role="Started" is-managed="true"
> > > colocation website-with-ip inf: RTWeb ClusterIP
> > > order rtweb_after_clustrip inf: ClusterIP RTWeb
> > > property $id="cib-bootstrap-options" \
> > >     dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
> > >     cluster-infrastructure="Heartbeat" \
> > >     stonith-enabled="false" \
> > >     last-lrm-refresh="1327399494"
> > > rsc_defaults $id="rsc-options" \
> > >     resource-stickiness="100"
> > >
> > > Records in the /var/log/ha-log related to RTWeb resource:
> > > pr-iso2:~# grep RTWeb /var/log/ha-log
> > > Jan 24 10:04:56 pr-iso2 crmd: [6130]: info: do_lrm_rsc_op: Performing key=7:76:7:41cbad9d-9090-4aba-bd6a-bf171077c74b op=RTWeb_monitor_0 )
> > > Jan 24 10:04:56 pr-iso2 lrmd: [6127]: info: rsc:RTWeb:4: probe
> > > Jan 24 10:04:56 pr-iso2 crmd: [6130]: info: process_lrm_event: LRM operation RTWeb_monitor_0 (call=4, rc=7, cib-update=13, confirmed=true) not running
> > > Jan 24 10:12:48 pr-iso2 crmd: [6130]: info: do_lrm_rsc_op: Performing key=11:77:0:41cbad9d-9090-4aba-bd6a-bf171077c74b op=RTWeb_start_0 )
> > > Jan 24 10:12:48 pr-iso2 lrmd: [6127]: info: rsc:RTWeb:7: start
> >
> > After this message there should be a bit more (look for "apache"
> > or "lrmd"). Next resource agents are going to log the resource
> > name too (RTWeb in this case). If you cannot find anything here,
> > then the answer must be in the apache logs.
> >
> > Thanks,
> >
> > Dejan
>
> Yes, you are right: here are two more lines after the previous line:
>
> apache[9454]: 2012/01/24_10:12:49 INFO: apache not running
> apache[9454]: 2012/01/24_10:12:49 INFO: waiting for apache /etc/apache2/apache2.conf to come up

That's all?

> There are no records in /var/log/apache2/error.log that give any clue, see:
>
> pr-iso2:/var/log/apache2# cat error.log
> [Tue Jan 24 11:12:50 2012] [notice] Apache/2.2.16 (Debian) PHP/5.3.3-7+squeeze3 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
> [Tue Jan 24 11:13:08 2012] [notice] caught SIGTERM, shutting down
> [Wed Jan 25 13:09:02 2012] [notice] Apache/2.2.16 (Debian) PHP/5.3.3-7+squeeze3 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
> [Wed Jan 25 13:09:21 2012] [notice] caught SIGTERM, shutting down
>
> One interesting thing: our nodes should use UTC, but after the upgrade
> we noticed that the time on the upgraded node is our local time (= UTC + 1).
> I have set the system time back to UTC, but Apache still uses the local
> time in the log.
>
> We have tried to start Apache on the upgraded node on its own:
>
> 1. we modified the file /etc/apache2/ports2.conf so that
>    Apache listens on the physical address
> 2. we ran the command '/etc/init.d/apache2 start'
> 3. we downloaded an index.html page
>
> Here is the record in the error log:
>
> [Wed Jan 25 13:28:11 2012] [notice] Apache/2.2.16 (Debian) PHP/5.3.3-7+squeeze3 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
> [Wed Jan 25 13:28:27 2012] [warn] [client 10.5.77.29] incomplete redirection target of '/rt/' for URI '/' modified to 'http://10.5.75.82/rt/'
>
> So Apache can run on its own.
>
> Before the upgrade we made some minor changes to apache2.conf
> on the active node, but not on the passive node. We have reverted
> the changes, but the resource still fails; see the tail of the ha-log
> on the upgraded node:

[...]

> Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: get_failcount: RTWeb has failed INFINITY times on pr-iso2

You need to clean up the resource:

    crm resource cleanup RTWeb

Otherwise, I really cannot say what's wrong with your apache, but
it's definitely resource specific. You can leave out the cluster
and try to resolve the issue using ocf-tester. Also, the apache
status module must be enabled.

Thanks,

Dejan

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
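
The three suggestions in the reply above (clean up the failcount, test the agent outside the cluster with ocf-tester, and check the status module) could be tried roughly as in the sketch below. It is not taken from the thread. The agent path, the status URL, and the function names `ocf_rc_name` and `diagnose` are assumptions for illustration, so adjust them to the local installation. The small helper only translates the `rc=` values seen in crm_mon and ha-log into their standard OCF names.

```shell
#!/bin/sh
# Sketch only. The agent path and the status URL are assumptions.
RA=/usr/lib/ocf/resource.d/heartbeat/apache   # assumed location of the apache agent
CONF=/etc/apache2/apache2.conf                # configfile parameter shown in the thread
STATUS_URL=http://localhost/server-status     # assumed URL served by mod_status

# Map an OCF exit code (the rc= value in crm_mon and ha-log) to its name.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo "unknown rc $1" ;;
    esac
}

# Run the checks. Needs root on the cluster node.
diagnose() {
    # 1. Clear the INFINITY failcount so the cluster will try the start again.
    crm resource cleanup RTWeb

    # 2. Exercise the agent without the cluster. ocf-tester starts and stops
    #    the service itself, so the cluster should not be managing it meanwhile.
    ocf-tester -n RTWeb -o configfile="$CONF" "$RA"

    # 3. Check that the status module is loaded and, with Apache running,
    #    that its page can be fetched locally.
    apache2ctl -M | grep -i status
    wget -q -O - "$STATUS_URL" | head -n 5
}

# Only touch the system when explicitly asked to ("run" as first argument).
if [ "${1:-}" = "run" ]; then
    diagnose
fi
```

Read this way, the probe result rc=7 in the log is simply "not running", which is expected before a start, while the failed start with rc=1 is a generic error, so the real cause has to come from the agent's own output or from Apache.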