Yup:

-bash-4.1$ cd /var/run/crm/
-bash-4.1$ ls
lost+found  pcmk  pengine  st_callback  st_command
-bash-4.1$ touch blah
-bash-4.1$ ls -l
total 16
-rw-r--r-- 1 hacluster haclient     0 Mar 27 14:50 blah
drwx------ 2 root      root     16384 Mar 14 15:00 lost+found
srwxrwxrwx 1 root      root         0 Mar 22 11:25 pcmk
srwxrwxrwx 1 hacluster root         0 Mar 22 11:25 pengine
srwxrwxrwx 1 root      root         0 Mar 22 11:25 st_callback
srwxrwxrwx 1 root      root         0 Mar 22 11:25 st_command
-bash-4.1$ ls -l /var/run/| grep crm
drwxr-xr-x 3 hacluster haclient 4096 Mar 27 14:50 crm
-bash-4.1$ whoami
hacluster
-bash-4.1$

----------------
John White
HPC Systems Engineer
(510) 486-7307
One Cyclotron Rd, MS: 50C-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720
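For repeating the same check on the other nodes, the manual steps (directory exists, owned by hacluster.haclient, writable by the current user) could be wrapped in a small function. This is only a sketch: `check_dir` is a made-up helper name, not anything shipped with Pacemaker, and it assumes GNU coreutils `stat` (the `-c` option) as found on Linux.

```shell
#!/bin/sh
# Sketch only: check_dir is a hypothetical helper, not part of Pacemaker.
# Assumes GNU coreutils stat (-c format option).
# Verifies that a directory exists, has the expected owner and group,
# and is writable by the user running the check.
check_dir() {
    dir=$1; want_owner=$2; want_group=$3

    if [ ! -d "$dir" ]; then
        echo "$dir: not a directory" >&2
        return 1
    fi

    # %U = owner name, %G = group name
    actual=$(stat -c '%U:%G' "$dir") || return 1
    if [ "$actual" != "$want_owner:$want_group" ]; then
        echo "$dir: owned by $actual, expected $want_owner:$want_group" >&2
        return 1
    fi

    if [ ! -w "$dir" ]; then
        echo "$dir: not writable by $(id -un)" >&2
        return 1
    fi

    echo "$dir: ok ($actual, writable by $(id -un))"
}
```

Run as the hacluster user, the call would be `check_dir /var/run/crm hacluster haclient`. It returns non-zero and prints the reason on stderr if any of the three conditions fails.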
On Mar 25, 2013, at 4:21 PM, Andreas Kurz <andr...@hastexo.com> wrote:

> On 2013-03-22 19:31, John White wrote:
>> Hello Folks,
>> We're trying to get a corosync/pacemaker instance going on a 4 node
>> cluster that boots via pxe. There have been a number of state/file system
>> issues, but those appear to be *mostly* taken care of thus far. We're
>> running into an issue now where cib just isn't staying up with errors akin
>> to the following (sorry for the lengthy dump, note the attrd and cib
>> connection errors). Any ideas would be greatly appreciated:
>>
>> Mar 22 11:25:18 n0014 cib: [25839]: info: validate_with_relaxng: Creating
>> RNG parser context
>> Mar 22 11:25:18 n0014 attrd: [25841]: info: Invoked:
>> /usr/lib64/heartbeat/attrd
>> Mar 22 11:25:18 n0014 attrd: [25841]: info: crm_log_init_worker: Changed
>> active directory to /var/lib/heartbeat/cores/hacluster
>> Mar 22 11:25:18 n0014 attrd: [25841]: info: main: Starting up
>> Mar 22 11:25:18 n0014 attrd: [25841]: info: get_cluster_type: Cluster type
>> is: 'corosync'
>> Mar 22 11:25:18 n0014 attrd: [25841]: notice: crm_cluster_connect:
>> Connecting to cluster infrastructure: corosync
>> Mar 22 11:25:18 n0014 attrd: [25841]: ERROR: init_cpg_connection: Could not
>> connect to the Cluster Process Group API: 2
>> Mar 22 11:25:18 n0014 attrd: [25841]: ERROR: main: HA Signon failed
>> Mar 22 11:25:18 n0014 attrd: [25841]: info: main: Cluster connection active
>> Mar 22 11:25:18 n0014 attrd: [25841]: info: main: Accepting attribute updates
>> Mar 22 11:25:18 n0014 attrd: [25841]: ERROR: main: Aborting startup
>> Mar 22 11:25:18 n0014 pengine: [25842]: info: Invoked:
>> /usr/lib64/heartbeat/pengine
>> Mar 22 11:25:18 n0014 pengine: [25842]: info: crm_log_init_worker: Changed
>> active directory to /var/lib/heartbeat/cores/hacluster
>> Mar 22 11:25:18 n0014 pengine: [25842]: debug: main: Checking for old
>> instances of pengine
>> Mar 22 11:25:18 n0014 pengine: [25842]: debug:
>> init_client_ipc_comms_nodispatch: Attempting to talk on: /var/run/crm/pengine
>
> That "/var/run/crm" directory is available and owned by
> hacluster.haclient ... and writable by at least the hacluster user?
>
> Regards,
> Andreas
>
> --
> Need help with Pacemaker?
> http://www.hastexo.com/now
>
>> Mar 22 11:25:18 n0014 pacemakerd: [25834]: ERROR: pcmk_child_exit: Child
>> process attrd exited (pid=25841, rc=100)
>> Mar 22 11:25:18 n0014 pacemakerd: [25834]: notice: pcmk_child_exit: Child
>> process attrd no longer wishes to be respawned
>> Mar 22 11:25:18 n0014 pacemakerd: [25834]: info: update_node_processes: Node
>> n0014.lustre now has process list: 00000000000000000000000000110312 (was
>> 00000000000000000000000000111312)
>> Mar 22 11:25:18 n0014 pengine: [25842]: debug:
>> init_client_ipc_comms_nodispatch: Could not init comms on:
>> /var/run/crm/pengine
>> Mar 22 11:25:18 n0014 pengine: [25842]: debug: main: Init server comms
>> Mar 22 11:25:18 n0014 pengine: [25842]: info: main: Starting pengine
>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: debug: init_cpg_connection:
>> Adding fd=4 to mainloop
>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: init_ais_connection_once:
>> Connection to 'corosync': established
>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: debug: crm_new_peer: Creating
>> entry for node n0014.lustre/247988234
>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: crm_new_peer: Node
>> n0014.lustre now has id: 247988234
>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: crm_new_peer: Node
>> 247988234 is now known as n0014.lustre
>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: debug:
>> init_client_ipc_comms_nodispatch: Attempting to talk on: /var/run/crm/pcmk
>> Mar 22 11:25:18 n0014 crmd: [25843]: info: Invoked:
>> /usr/lib64/heartbeat/crmd
>> Mar 22 11:25:18 n0014 pacemakerd: [25834]: debug: pcmk_client_connect:
>> Channel 0x995530 connected: 1 children
>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: main: Starting stonith-ng
>> mainloop
>> Mar 22 11:25:18 n0014 crmd: [25843]: info: crm_log_init_worker: Changed
>> active directory to /var/lib/heartbeat/cores/hacluster
>> Mar 22 11:25:18 n0014 crmd: [25843]: info: main: CRM Hg Version:
>> a02c0f19a00c1eb2527ad38f146ebc0834814558
>> Mar 22 11:25:18 n0014 crmd: [25843]: info: crmd_init: Starting crmd
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: s_crmd_fsa: Processing
>> I_STARTUP: [ state=S_STARTING cause=C_STARTUP origin=crmd_init ]
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_fsa_action: actions:trace:
>> #011// A_LOG
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_fsa_action: actions:trace:
>> #011// A_STARTUP
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_startup: Registering Signal
>> Handlers
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_startup: Creating CIB and LRM
>> objects
>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: crm_update_peer: Node
>> n0014.lustre: id=247988234 state=unknown addr=(null) votes=0 born=0 seen=0
>> proc=00000000000000000000000000110312 (new)
>> Mar 22 11:25:18 n0014 crmd: [25843]: info: G_main_add_SignalHandler: Added
>> signal handler for signal 17
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_fsa_action: actions:trace:
>> #011// A_CIB_START
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug:
>> init_client_ipc_comms_nodispatch: Attempting to talk on: /var/run/crm/cib_rw
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug:
>> init_client_ipc_comms_nodispatch: Could not init comms on:
>> /var/run/crm/cib_rw
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: cib_native_signon_raw:
>> Connection to command channel failed
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug:
>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>> /var/run/crm/cib_callback
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug:
>> init_client_ipc_comms_nodispatch: Could not init comms on:
>> /var/run/crm/cib_callback
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: cib_native_signon_raw:
>> Connection to callback channel failed
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: cib_native_signon_raw:
>> Connection to CIB failed: connection failed
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: cib_native_signoff: Signing out
>> of the CIB Service
>> Mar 22 11:25:18 n0014 cib: [25839]: ERROR: Element cib failed to validate
>> content
>> Mar 22 11:25:18 n0014 cib: [25839]: ERROR: readCibXmlFile: CIB does not
>> validate with <null>
>> Mar 22 11:25:18 n0014 cib: [25839]: info: startCib: CIB Initialization
>> completed successfully
>> Mar 22 11:25:18 n0014 cib: [25839]: info: get_cluster_type: Cluster type is:
>> 'corosync'
>> Mar 22 11:25:18 n0014 cib: [25839]: notice: crm_cluster_connect: Connecting
>> to cluster infrastructure: corosync
>> Mar 22 11:25:18 n0014 cib: [25839]: ERROR: init_cpg_connection: Could not
>> connect to the Cluster Process Group API: 2
>> Mar 22 11:25:18 n0014 cib: [25839]: CRIT: cib_init: Cannot sign in to the
>> cluster... terminating
>>
>>
>> ----------------
>> John White
>> HPC Systems Engineer
>> (510) 486-7307
>> One Cyclotron Rd, MS: 50C-3209C
>> Lawrence Berkeley National Lab
>> Berkeley, CA 94720
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org