What about /dev/shm? Libqb tries to create some of its shared memory in that location by default, so it has to exist and be writable there on the PXE-booted nodes as well.
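Something like the following, run as hacluster the same way as the /var/run/crm check further down, should show whether /dev/shm is the problem. It is only a rough sketch: check_ipc_dir is a made-up helper name (not part of pacemaker, corosync or libqb), and it assumes GNU mktemp (for -p) and, if installed, mountpoint(1).

```shell
#!/bin/sh
# Rough check that an IPC directory (default /dev/shm) is usable by the
# current user. check_ipc_dir is a hypothetical helper, not part of
# pacemaker, corosync or libqb.
check_ipc_dir() {
    dir=${1:-/dev/shm}

    # The directory has to exist at all.
    if [ ! -d "$dir" ]; then
        echo "MISSING: $dir does not exist"
        return 1
    fi

    # Assumption: /dev/shm is normally its own tmpfs mount. mountpoint(1)
    # is only used if it is installed, and a failure is just a warning.
    if command -v mountpoint >/dev/null 2>&1 && ! mountpoint -q "$dir"; then
        echo "WARNING: $dir is not a mount point"
    fi

    # Create and delete a uniquely named file to prove write access.
    # "mktemp -p DIR TEMPLATE" is the GNU coreutils form (assumption).
    if ! probe=$(mktemp -p "$dir" ipc-probe.XXXXXX 2>/dev/null); then
        echo "NOT WRITABLE: $(id -un) cannot create files in $dir"
        return 1
    fi
    rm -f "$probe"
    echo "OK: $dir is writable by $(id -un)"
}
```

For example, from the same hacluster shell, source the file (its name is up to you) and run check_ipc_dir /dev/shm. A MISSING or NOT WRITABLE result would be worth fixing in the PXE image before looking further at the cib/attrd errors.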
On Thu, Mar 28, 2013 at 8:50 AM, John White <jwh...@lbl.gov> wrote:
> Yup:
> -bash-4.1$ cd /var/run/crm/
> -bash-4.1$ ls
> lost+found pcmk pengine st_callback st_command
> -bash-4.1$ touch blah
> -bash-4.1$ ls -l
> total 16
> -rw-r--r-- 1 hacluster haclient 0 Mar 27 14:50 blah
> drwx------ 2 root root 16384 Mar 14 15:00 lost+found
> srwxrwxrwx 1 root root 0 Mar 22 11:25 pcmk
> srwxrwxrwx 1 hacluster root 0 Mar 22 11:25 pengine
> srwxrwxrwx 1 root root 0 Mar 22 11:25 st_callback
> srwxrwxrwx 1 root root 0 Mar 22 11:25 st_command
> -bash-4.1$ ls -l /var/run/| grep crm
> drwxr-xr-x 3 hacluster haclient 4096 Mar 27 14:50 crm
> -bash-4.1$ whoami
> hacluster
> -bash-4.1$
> ----------------
> John White
> HPC Systems Engineer
> (510) 486-7307
> One Cyclotron Rd, MS: 50C-3209C
> Lawrence Berkeley National Lab
> Berkeley, CA 94720
>
> On Mar 25, 2013, at 4:21 PM, Andreas Kurz <andr...@hastexo.com> wrote:
>
>> On 2013-03-22 19:31, John White wrote:
>>> Hello Folks,
>>> We're trying to get a corosync/pacemaker instance going on a 4 node
>>> cluster that boots via pxe. There have been a number of state/file system
>>> issues, but those appear to be *mostly* taken care of thus far. We're
>>> running into an issue now where cib just isn't staying up with errors akin
>>> to the following (sorry for the lengthy dump, note the attrd and cib
>>> connection errors).
>>> Any ideas would be greatly appreciated:
>>>
>>> Mar 22 11:25:18 n0014 cib: [25839]: info: validate_with_relaxng: Creating
>>> RNG parser context
>>> Mar 22 11:25:18 n0014 attrd: [25841]: info: Invoked:
>>> /usr/lib64/heartbeat/attrd
>>> Mar 22 11:25:18 n0014 attrd: [25841]: info: crm_log_init_worker: Changed
>>> active directory to /var/lib/heartbeat/cores/hacluster
>>> Mar 22 11:25:18 n0014 attrd: [25841]: info: main: Starting up
>>> Mar 22 11:25:18 n0014 attrd: [25841]: info: get_cluster_type: Cluster type
>>> is: 'corosync'
>>> Mar 22 11:25:18 n0014 attrd: [25841]: notice: crm_cluster_connect:
>>> Connecting to cluster infrastructure: corosync
>>> Mar 22 11:25:18 n0014 attrd: [25841]: ERROR: init_cpg_connection: Could not
>>> connect to the Cluster Process Group API: 2
>>> Mar 22 11:25:18 n0014 attrd: [25841]: ERROR: main: HA Signon failed
>>> Mar 22 11:25:18 n0014 attrd: [25841]: info: main: Cluster connection active
>>> Mar 22 11:25:18 n0014 attrd: [25841]: info: main: Accepting attribute
>>> updates
>>> Mar 22 11:25:18 n0014 attrd: [25841]: ERROR: main: Aborting startup
>>> Mar 22 11:25:18 n0014 pengine: [25842]: info: Invoked:
>>> /usr/lib64/heartbeat/pengine
>>> Mar 22 11:25:18 n0014 pengine: [25842]: info: crm_log_init_worker: Changed
>>> active directory to /var/lib/heartbeat/cores/hacluster
>>> Mar 22 11:25:18 n0014 pengine: [25842]: debug: main: Checking for old
>>> instances of pengine
>>> Mar 22 11:25:18 n0014 pengine: [25842]: debug:
>>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>>> /var/run/crm/pengine
>>
>> That "/var/run/crm" directory is available and owned by
>> hacluster.haclient ... and writable by at least the hacluster user?
>>
>> Regards,
>> Andreas
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>>> Mar 22 11:25:18 n0014 pacemakerd: [25834]: ERROR: pcmk_child_exit: Child
>>> process attrd exited (pid=25841, rc=100)
>>> Mar 22 11:25:18 n0014 pacemakerd: [25834]: notice: pcmk_child_exit: Child
>>> process attrd no longer wishes to be respawned
>>> Mar 22 11:25:18 n0014 pacemakerd: [25834]: info: update_node_processes:
>>> Node n0014.lustre now has process list: 00000000000000000000000000110312
>>> (was 00000000000000000000000000111312)
>>> Mar 22 11:25:18 n0014 pengine: [25842]: debug:
>>> init_client_ipc_comms_nodispatch: Could not init comms on:
>>> /var/run/crm/pengine
>>> Mar 22 11:25:18 n0014 pengine: [25842]: debug: main: Init server comms
>>> Mar 22 11:25:18 n0014 pengine: [25842]: info: main: Starting pengine
>>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: debug: init_cpg_connection:
>>> Adding fd=4 to mainloop
>>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: init_ais_connection_once:
>>> Connection to 'corosync': established
>>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: debug: crm_new_peer: Creating
>>> entry for node n0014.lustre/247988234
>>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: crm_new_peer: Node
>>> n0014.lustre now has id: 247988234
>>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: crm_new_peer: Node
>>> 247988234 is now known as n0014.lustre
>>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: debug:
>>> init_client_ipc_comms_nodispatch: Attempting to talk on: /var/run/crm/pcmk
>>> Mar 22 11:25:18 n0014 crmd: [25843]: info: Invoked:
>>> /usr/lib64/heartbeat/crmd
>>> Mar 22 11:25:18 n0014 pacemakerd: [25834]: debug: pcmk_client_connect:
>>> Channel 0x995530 connected: 1 children
>>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: main: Starting stonith-ng
>>> mainloop
>>> Mar 22 11:25:18 n0014 crmd: [25843]: info: crm_log_init_worker: Changed
>>> active directory to /var/lib/heartbeat/cores/hacluster
>>> Mar 22 11:25:18 n0014 crmd: [25843]: info: main: CRM Hg Version:
>>> a02c0f19a00c1eb2527ad38f146ebc0834814558
>>> Mar 22 11:25:18 n0014 crmd: [25843]: info: crmd_init: Starting crmd
>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: s_crmd_fsa: Processing
>>> I_STARTUP: [ state=S_STARTING cause=C_STARTUP origin=crmd_init ]
>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_fsa_action: actions:trace:
>>> #011// A_LOG
>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_fsa_action: actions:trace:
>>> #011// A_STARTUP
>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_startup: Registering Signal
>>> Handlers
>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_startup: Creating CIB and
>>> LRM objects
>>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: crm_update_peer: Node
>>> n0014.lustre: id=247988234 state=unknown addr=(null) votes=0 born=0 seen=0
>>> proc=00000000000000000000000000110312 (new)
>>> Mar 22 11:25:18 n0014 crmd: [25843]: info: G_main_add_SignalHandler: Added
>>> signal handler for signal 17
>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_fsa_action: actions:trace:
>>> #011// A_CIB_START
>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug:
>>> init_client_ipc_comms_nodispatch: Attempting to talk on: /var/run/crm/cib_rw
>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug:
>>> init_client_ipc_comms_nodispatch: Could not init comms on:
>>> /var/run/crm/cib_rw
>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: cib_native_signon_raw:
>>> Connection to command channel failed
>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug:
>>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>>> /var/run/crm/cib_callback
>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug:
>>> init_client_ipc_comms_nodispatch: Could not init comms on:
>>> /var/run/crm/cib_callback
>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: cib_native_signon_raw:
>>> Connection to callback channel failed
>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: cib_native_signon_raw:
>>> Connection to CIB failed: connection failed
>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug:
>>> cib_native_signoff: Signing out
>>> of the CIB Service
>>> Mar 22 11:25:18 n0014 cib: [25839]: ERROR: Element cib failed to validate
>>> content
>>> Mar 22 11:25:18 n0014 cib: [25839]: ERROR: readCibXmlFile: CIB does not
>>> validate with <null>
>>> Mar 22 11:25:18 n0014 cib: [25839]: info: startCib: CIB Initialization
>>> completed successfully
>>> Mar 22 11:25:18 n0014 cib: [25839]: info: get_cluster_type: Cluster type
>>> is: 'corosync'
>>> Mar 22 11:25:18 n0014 cib: [25839]: notice: crm_cluster_connect: Connecting
>>> to cluster infrastructure: corosync
>>> Mar 22 11:25:18 n0014 cib: [25839]: ERROR: init_cpg_connection: Could not
>>> connect to the Cluster Process Group API: 2
>>> Mar 22 11:25:18 n0014 cib: [25839]: CRIT: cib_init: Cannot sign in to the
>>> cluster... terminating
>>>
>>> ----------------
>>> John White
>>> HPC Systems Engineer
>>> (510) 486-7307
>>> One Cyclotron Rd, MS: 50C-3209C
>>> Lawrence Berkeley National Lab
>>> Berkeley, CA 94720
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org