On Wed, Apr 13, 2011 at 12:23 AM, Stallmann, Andreas <[email protected]> wrote:
> Hi!
>
> We've got a pretty straightforward and easy configuration:
>
> Corosync 1.2.1 / Pacemaker 1.1.2 on OpenSuSE 11.3 running DRBD (M/S),
> ping (clone), and a resource group containing a shared IP, tomcat and
> mysql (where the data files of mysql reside on the DRBD). The cluster
> consists of two virtual machines running on VMware ESXi 4.
>
> Since we moved the cluster to another VMware ESXi host, strange things
> happen:
>
> While DRBD and the ping resource come up on both nodes, the resource
> group "appl_grp" (see below) doesn't. No failures are shown in crm_mon
> and the failcount is zero.

Is host_list="191.224.111.1 191.224.111.78 194.25.2.129" still valid?
If no value is being set for pingd then I can imagine this would be the
result.
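It is easy to verify whether the attribute is being set at all. A couple
of quick checks (exact option names vary a little between Pacemaker
versions, so treat these as a sketch):

    # transient attributes live in the status section of the CIB;
    # a pingd nvpair should exist for each node if the clone is working
    cibadmin -Q | grep pingd

    # newer crm_mon builds can show node attributes directly
    crm_mon -A -1

Also note that crm_mon hides stopped resources by default, which is why
appl_grp disappeared from the display entirely; "crm_mon -r" lists
inactive resources too.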
> Output of crm_mon:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ============
> Last updated: Tue Apr 12 23:39:39 2011
> Stack: openais
> Current DC: cms-appl02 - partition with quorum
> Version: 1.1.2-8b9ec9ccc5060457ac761dce1de719af86895b10
> 2 Nodes configured, 2 expected votes
> 3 Resources configured.
> ============
>
> Online: [ cms-appl01 cms-appl02 ]
>
> Master/Slave Set: ms_drbd_r0
>     Masters: [ cms-appl01 ]
>     Slaves: [ cms-appl02 ]
> Clone Set: pingy_clone
>     Started: [ cms-appl01 cms-appl02 ]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Normally, I'd at least see the resource group as stopped, but now it
> doesn't even turn up in the crm_mon display!
>
> The crm tool at least shows that the resources still exist:
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> crm(live)# resource
> crm(live)resource# show
>  Resource Group: appl_grp
>      fs_r0        (ocf::heartbeat:Filesystem) Stopped
>      sharedIP     (ocf::heartbeat:IPaddr2) Stopped
>      tomcat_res   (ocf::heartbeat:tomcat) Stopped
>      database_res (ocf::heartbeat:mysql) Stopped
>  Master/Slave Set: ms_drbd_r0
>      Masters: [ cms-appl01 ]
>      Slaves: [ cms-appl02 ]
>  Clone Set: pingy_clone
>      Started: [ cms-appl01 cms-appl02 ]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> And finally, here's our configuration:
>
> ~~~~~~~~~~~~~~ output of "crm configure show" ~~~~~~~~
> node cms-appl01
> node cms-appl02
> primitive database_res ocf:heartbeat:mysql \
>         params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" \
>                 datadir="/drbd/mysql" user="mysql" \
>                 log="/var/log/mysql/mysqld.log" pid="/var/run/mysql/mysqld.pid" \
>                 socket="/drbd/run/mysql/mysql.sock" \
>         op start interval="0" timeout="120s" \
>         op stop interval="0" timeout="120s" \
>         op monitor interval="10s" timeout="30s" \
>         op notify interval="0" timeout="90s"
> primitive drbd_r0 ocf:linbit:drbd \
>         params drbd_resource="r0" \
>         op monitor interval="15s" \
>         op start interval="0" timeout="240s" \
>         op stop interval="0" timeout="100s"
> primitive fs_r0 ocf:heartbeat:Filesystem \
>         params device="/dev/drbd0" directory="/drbd" fstype="ext4" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="60s"
> primitive pingy_res ocf:pacemaker:ping \
>         params dampen="5s" multiplier="1000" \
>                 host_list="191.224.111.1 191.224.111.78 194.25.2.129" \
>         op monitor interval="60s" timeout="60s" \
>         op start interval="0" timeout="60s"
> primitive sharedIP ocf:heartbeat:IPaddr2 \
>         params ip="191.224.111.50" cidr_netmask="255.255.255.0" nic="eth0:0"
> primitive tomcat_res ocf:heartbeat:tomcat \
>         params java_home="/etc/alternatives/jre" \
>         params catalina_home="/usr/share/tomcat6" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="120s" \
>         op monitor interval="10s" timeout="30s"
> group appl_grp fs_r0 sharedIP tomcat_res database_res \
>         meta target-role="Started"
> ms ms_drbd_r0 drbd_r0 \
>         meta master-max="1" master-node-max="1" clone-max="2" \
>                 clone-node-max="1" notify="true"
> clone pingy_clone pingy_res
> location appl_loc appl_grp 100: cms-appl01
> location only-if-connected appl_grp \
>         rule $id="only-if-connected-rule" -inf: not_defined pingd or pingd lte 2000
> colocation appl_grp-only-on-master inf: appl_grp ms_drbd_r0:Master
> order appl_grp-after-drbd inf: ms_drbd_r0:promote appl_grp:start
> order mysql-after-fs inf: fs_r0 database_res
> property $id="cib-bootstrap-options" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         stonith-action="poweroff" \
>         default-resource-stickiness="100" \
>         dc-version="1.1.2-8b9ec9ccc5060457ac761dce1de719af86895b10" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         last-lrm-refresh="1302643565"
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
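A side note on that constraint: with multiplier="1000" and three
addresses in host_list, pingd can only ever be 0, 1000, 2000 or 3000, so
"pingd lte 2000" bans the group from any node that cannot reach all
three targets. If one of those addresses stopped answering from the new
ESXi host, the group has nowhere left to run, with no failure and no
failcount, exactly as you describe. A less strict variant (just a
sketch; pick the threshold you actually want) would require only one
reachable target per node:

    location only-if-connected appl_grp \
            rule $id="only-if-connected-rule" -inf: not_defined pingd or pingd lte 0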
$id="cib-bootstrap-options" \ > stonith-enabled="false" \ > no-quorum-policy="ignore" \ > stonith-action="poweroff" \ > default-resource-stickiness="100" \ > dc-version="1.1.2-8b9ec9ccc5060457ac761dce1de719af86895b10" \ > cluster-infrastructure="openais" \ > expected-quorum-votes="2" \ > last-lrm-refresh="1302643565" > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > When I (re)activate the appl_grp, literarily nothing happens: > > crm(live)resource# start nag_grp > > No new entries in /var/log/messages, no visible changes in crm_mon. It is as > if the resource didn't exist. > > Any ideas? You'll find the logs below. > > Cheers and good night, > > Andreas > > I found only one error message in /var/log/messages: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > Apr 12 23:56:11 cms-appl01 cib: [3888]: info: retrieveCib: Reading cluster > configuration from: /var/lib/heartbeat/crm/cib.dtzl4N (digest: > /var/lib/heartbeat/crm/cib.QPtzfE) > Apr 12 23:56:11 cms-appl01 pengine: [2662]: info: process_pe_message: > Transition 0: PEngine Input stored in: /var/lib/pengine/pe-input-2971.bz2 > Apr 12 23:56:11 cms-appl01 crmd: [2663]: info: match_graph_event: Action > tomcat_res_monitor_0 (13) confirmed on cms-appl02 (rc=0) > Apr 12 23:56:11 cms-appl01 crmd: [2663]: info: match_graph_event: Action > database_res_monitor_0 (14) confirmed on cms-appl02 (rc=0) > Apr 12 23:56:11 cms-appl01 crmd: [2663]: info: process_lrm_event: LRM > operation tomcat_res_monitor_0 (call=4, rc=7, cib-update=31, confirmed=true) > not running > Apr 12 23:56:11 cms-appl01 crmd: [2663]: info: match_graph_event: Action > tomcat_res_monitor_0 (6) confirmed on cms-appl01 (rc=0) > Apr 12 23:56:11 cms-appl01 crmd: [2663]: info: match_graph_event: Action > fs_r0_monitor_0 (11) confirmed on cms-appl02 (rc=0) > Apr 12 23:56:11 cms-appl01 crmd: [2663]: info: match_graph_event: Action > sharedIP_monitor_0 (12) confirmed on cms-appl02 (rc=0) > Apr 12 23:56:11 cms-appl01 Filesystem[3889]: [3917]: WARNING: Couldn't find > device [/dev/drbd0]. Expected /dev/??? to exist > Apr 12 23:56:11 cms-appl01 mysql[3892]: [3932]: ERROR: Datadir /drbd/mysql > doesn't exist > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > There are quite a lot of warnings: > > Apr 12 23:54:32 cms-appl01 cib: [2651]: WARN: send_ipc_message: IPC Channel > to 2655 is not connected > Apr 12 23:54:32 cms-appl01 cib: [2651]: WARN: send_via_callback_channel: > Delivery of reply to client 2655/d4c6501f-32cb-49a4-a800-17d5385d71cb failed > Apr 12 23:54:32 cms-appl01 cib: [2651]: WARN: do_local_notify: A-Sync reply > to crmd failed: reply failed > Apr 12 23:55:04 cms-appl01 rchal: boot with 'CPUFREQ=no' in to avoid this > warning. > Apr 12 23:55:15 cms-appl01 logd: [2326]: WARN: Core dumps could be lost if > multiple dumps occur. 
>
> There are quite a lot of warnings:
>
> Apr 12 23:54:32 cms-appl01 cib: [2651]: WARN: send_ipc_message: IPC Channel
> to 2655 is not connected
> Apr 12 23:54:32 cms-appl01 cib: [2651]: WARN: send_via_callback_channel:
> Delivery of reply to client 2655/d4c6501f-32cb-49a4-a800-17d5385d71cb failed
> Apr 12 23:54:32 cms-appl01 cib: [2651]: WARN: do_local_notify: A-Sync reply
> to crmd failed: reply failed
> Apr 12 23:55:04 cms-appl01 rchal: boot with 'CPUFREQ=no' in to avoid this
> warning.
> Apr 12 23:55:15 cms-appl01 logd: [2326]: WARN: Core dumps could be lost if
> multiple dumps occur.
> Apr 12 23:55:15 cms-appl01 logd: [2326]: WARN: Consider setting non-default
> value in /proc/sys/kernel/core_pattern (or equivalent) for maximum
> supportability
> Apr 12 23:55:15 cms-appl01 logd: [2326]: WARN: Consider setting
> /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
> supportability
> Apr 12 23:55:18 cms-appl01 corosync[2650]: [pcmk ] WARN: route_ais_message:
> Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Apr 12 23:55:18 cms-appl01 corosync[2650]: [pcmk ] WARN: route_ais_message:
> Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Apr 12 23:55:18 cms-appl01 mgmtd: [2664]: WARN: Core dumps could be lost if
> multiple dumps occur.
> Apr 12 23:55:18 cms-appl01 mgmtd: [2664]: WARN: Consider setting non-default
> value in /proc/sys/kernel/core_pattern (or equivalent) for maximum
> supportability
> Apr 12 23:55:18 cms-appl01 mgmtd: [2664]: WARN: Consider setting
> /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
> supportability
> Apr 12 23:55:18 cms-appl01 mgmtd: [2664]: WARN: lrm_signon: can not initiate
> connection
> Apr 12 23:55:18 cms-appl01 lrmd: [2660]: WARN: Core dumps could be lost if
> multiple dumps occur.
> Apr 12 23:55:18 cms-appl01 lrmd: [2660]: WARN: Consider setting non-default
> value in /proc/sys/kernel/core_pattern (or equivalent) for maximum
> supportability
> Apr 12 23:55:18 cms-appl01 lrmd: [2660]: WARN: Consider setting
> /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
> supportability
> Apr 12 23:56:11 cms-appl01 crmd: [2663]: WARN:
> cib_client_add_notify_callback: Callback already present
> Apr 12 23:56:11 cms-appl01 Filesystem[3889]: [3917]: WARNING: Couldn't find
> device [/dev/drbd0]. Expected /dev/??? to exist
> Apr 12 23:56:11 cms-appl01 lrmd: [2660]: info: RA output:
> (sharedIP:probe:stderr) Converted dotted-quad netmask to CIDR as:
> 24#012eth0:0: warning: name may be invalid
>
> --
> CONET Solutions GmbH
> Andreas Stallmann
> Theodor-Heuss-Allee 19, 53773 Hennef
> Tel.: +49 2242 939-677, Fax: +49 2242 939-393
> Mobil: +49 172 2455051
> Internet: http://www.conet.de, mailto:[email protected]
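If the pingd theory doesn't pan out, the policy engine can tell you
directly why it places nothing. Something like this (ptest ships with
Pacemaker of that vintage; newer builds call the equivalent tool
crm_simulate):

    # show the allocation scores computed for the live cluster;
    # look for -INFINITY against fs_r0/appl_grp on both nodes
    ptest -sL

A -INFINITY score on every node for fs_r0 would confirm that the
only-if-connected rule (or the colocation with the DRBD master) is what
is pinning the group down.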
