Hi,

On Thu, May 28, 2009 at 05:19:34PM +0000, Alain St-Denis wrote:
>
> Andrew Beekhof wrote:
> > You might want to check out Martin's packages.
> > If I understood correctly, he's built the version of clvm used by SUSE
> > (which we know works) against 0.80.5
> >
> > Look for his email with the subject "lvm2-clvm RPMs in opensuse.org
> > package repo?"
>
> Thanks!
>
> I installed Martin's packages. Here's what I have:
>
> pacemaker-openais      1.0.3+svn20090522-2~bpo50+1
> clvm-openais           2.02.44-4~bpo50+1
> libopenais-legacy-2    0.80.5+svn20090522-2~bpo50+1
> openais-legacy         0.80.5+svn20090522-2~bpo50+1
> heartbeat-common       2.99.2+sles11r9-3~bpo50+1
> libheartbeat2          2.99.2+sles11r9-3~bpo50+1
>
> Now, soon after I start clvmd, aisexec dies with a segv (in
> openais_conn_private_data_get). On my 3-node test cluster, I start openais
> on all nodes, then I start clvmd on one of the nodes. Not long after, aisexec
> dies on the other nodes. Here are the last messages logged by aisexec:
>
> May 28 16:19:04.924914 [TOTEM] entering GATHER state from 11.
> May 28 16:19:05.079052 [TOTEM] Saving state aru 20 high seq received 20
> May 28 16:19:05.079094 [TOTEM] Storing new sequence id for ring 298
> May 28 16:19:05.079155 [TOTEM] entering COMMIT state.
> May 28 16:19:05.079500 [TOTEM] entering RECOVERY state.
> May 28 16:19:05.079558 [TOTEM] position [0] member 142.135.16.107:
> May 28 16:19:05.079571 [TOTEM] previous ring seq 660 rep 142.135.16.107
> May 28 16:19:05.079578 [TOTEM] aru a high delivered a received flag 1
> May 28 16:19:05.079587 [TOTEM] position [1] member 142.135.16.109:
> May 28 16:19:05.079594 [TOTEM] previous ring seq 660 rep 142.135.16.109
> May 28 16:19:05.079612 [TOTEM] aru 20 high delivered 20 received flag 1
> May 28 16:19:05.079627 [TOTEM] Did not need to originate any messages in recovery.
> May 28 16:19:05.080669 [CLM ] CLM CONFIGURATION CHANGE
> May 28 16:19:05.080711 [CLM ] New Configuration:
> May 28 16:19:05.080724 [CLM ] r(0) ip(142.135.16.109)
> May 28 16:19:05.080733 [CLM ] Members Left:
> May 28 16:19:05.080774 [CLM ] Members Joined:
> May 28 16:19:05.080790 [crm ] notice: pcmk_peer_update: Transitional membership event on ring 664: memb=1, new=0, lost=0
> May 28 16:19:05.080805 [crm ] info: pcmk_peer_update: memb: lab09 1829799822
> May 28 16:19:05.080843 [CLM ] CLM CONFIGURATION CHANGE
> May 28 16:19:05.080855 [CLM ] New Configuration:
> May 28 16:19:05.080865 [CLM ] r(0) ip(142.135.16.107)
> May 28 16:19:05.080901 [CLM ] r(0) ip(142.135.16.109)
> May 28 16:19:05.080914 [CLM ] Members Left:
> May 28 16:19:05.080923 [CLM ] Members Joined:
> May 28 16:19:05.080938 [CLM ] r(0) ip(142.135.16.107)
> May 28 16:19:05.080972 [crm ] notice: pcmk_peer_update: Stable membership event on ring 664: memb=2, new=1, lost=0
> May 28 16:19:05.080985 [MAIN ] info: update_member: Node 1796245390/lab07 is now: member
> May 28 16:19:05.081001 [crm ] info: pcmk_peer_update: NEW: lab07 1796245390
> May 28 16:19:05.081036 [crm ] info: pcmk_peer_update: MEMB: lab07 1796245390
> May 28 16:19:05.081044 [crm ] info: pcmk_peer_update: MEMB: lab09 1829799822
> May 28 16:19:05.081063 [crm ] info: send_member_notification: Sending membership update 664 to 2 children
> May 28 16:19:05.081118 [SYNC ] This node is within the primary component and will provide service.
> May 28 16:19:05.081144 [TOTEM] entering OPERATIONAL state.
> May 28 16:19:05.082382 [MAIN ] info: update_member: 0x7f1188002510 Node 1796245390 (lab07) born on: 664
> May 28 16:19:05.082416 [crm ] info: send_member_notification: Sending membership update 664 to 2 children
> May 28 16:19:05.082757 [CLM ] got nodejoin message 142.135.16.107
> May 28 16:19:05.082832 [CLM ] got nodejoin message 142.135.16.109
> May 28 16:19:05.087292 [CPG ] got joinlist message from node 1829799822
>
> Then it crashes. Martin (or anybody), have you seen this? I attached my
> openais.conf file. Maybe I'm doing something stupid in there?

You should file a bugzilla for openais. Please use hb_report; it will
collect all the relevant information, including the stack traces (I hope
a core was dumped).

Thanks,

Dejan

> Alain
>
> --
> Alain St-Denis
> Supercomputing, Systems and Storage / Superinformatique, systèmes et stockage,
> High Performance Computing Support / Soutien aux calculs en haute performance
> Chief Information Officer Branch / Direction Générale du dirigeant principal de l'information
> Environment Canada / Environnement Canada
> Tel: +1 514 421 4697
>
> # Please read the openais.conf.5 manual page
>
> aisexec {
>     # Run as root - this is necessary to be able to manage resources with Pacemaker
>     user: root
>     group: root
> }
>
> service {
>     # Load the Pacemaker Cluster Resource Manager
>     name: pacemaker
>     ver: 0
> }
>
> totem {
>     version: 2
>
>     # How long before declaring a token lost (ms)
>     token: 10000
>
>     # How many token retransmits before forming a new configuration
>     token_retransmits_before_loss_const: 20
>
>     # How long to wait for join messages in the membership protocol (ms)
>     join: 60
>
>     # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
>     consensus: 4800
>
>     # Turn off the virtual synchrony filter
>     vsftype: none
>
>     # Number of messages that may be sent by one processor on receipt of the token
>     max_messages: 20
>
>     # Limit generated nodeids to 31-bits (positive signed integers)
>     clear_node_high_bit: yes
>
>     # Disable encryption
>     secauth: off
>
>     # How many threads to use for encryption/decryption
>     threads: 0
>
>     # Optionally assign a fixed node id (integer)
>     # nodeid: 1234
>
>     interface {
>         ringnumber: 0
>
>         # The following values need to be set based on your environment
>         bindnetaddr: 142.135.16.0
>         mcastaddr: 226.94.1.1
>         mcastport: 5405
>     }
> }
>
> logging {
>     debug: on
>     fileline: off
>     to_syslog: yes
>     to_stderr: yes
>     syslog_facility: daemon
>     timestamp: on
> }
>
> amf {
>     mode: disabled
> }
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker