Yup, it exists, is owned by hacluster:haclient, and hacluster can write to it:
-bash-4.1$ cd /var/run/crm/
-bash-4.1$ ls
lost+found  pcmk  pengine  st_callback  st_command
-bash-4.1$ touch blah
-bash-4.1$ ls -l
total 16
-rw-r--r-- 1 hacluster haclient     0 Mar 27 14:50 blah
drwx------ 2 root      root     16384 Mar 14 15:00 lost+found
srwxrwxrwx 1 root      root         0 Mar 22 11:25 pcmk
srwxrwxrwx 1 hacluster root         0 Mar 22 11:25 pengine
srwxrwxrwx 1 root      root         0 Mar 22 11:25 st_callback
srwxrwxrwx 1 root      root         0 Mar 22 11:25 st_command
-bash-4.1$ ls -l /var/run/| grep crm
drwxr-xr-x 3 hacluster haclient 4096 Mar 27 14:50 crm
-bash-4.1$ whoami
hacluster
-bash-4.1$ 
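The same check can be wrapped in a small shell function. This is a minimal sketch, assuming GNU coreutils `stat` (the `-c` format strings, as on this bash-4.1 era box); `check_run_dir` is a hypothetical name, not a pacemaker tool. It should be run as hacluster, as in the session above, because `test -w` checks access for the user running the shell, and as root it will almost always pass.

```shell
#!/usr/bin/env bash
# Sketch: verify a runtime directory's owner/group and that the *current*
# user can write to it. Assumes GNU coreutils stat(1) (-c format strings).
# check_run_dir is a hypothetical helper name, not part of pacemaker.
check_run_dir() {
    local dir="$1" want_user="$2" want_group="$3"
    local owner group

    if [ ! -d "$dir" ]; then
        echo "MISSING: $dir is not a directory"
        return 1
    fi

    # %U = owning user name, %G = owning group name
    owner=$(stat -c '%U' "$dir")
    group=$(stat -c '%G' "$dir")
    if [ "$owner" != "$want_user" ] || [ "$group" != "$want_group" ]; then
        echo "OWNER: $dir is $owner:$group, expected $want_user:$want_group"
        return 1
    fi

    # test -w checks access for the user running this shell, so run it as
    # hacluster; as root it will almost always succeed.
    if [ ! -w "$dir" ]; then
        echo "PERMS: $dir is not writable by $(id -un)"
        return 1
    fi

    echo "OK: $dir is $owner:$group and writable by $(id -un)"
}
```

Usage, as hacluster: `check_run_dir /var/run/crm hacluster haclient`. Given the listing above, it would be expected to report OK.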
----------------
John White
HPC Systems Engineer
(510) 486-7307
One Cyclotron Rd, MS: 50C-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720

On Mar 25, 2013, at 4:21 PM, Andreas Kurz <andr...@hastexo.com> wrote:

> On 2013-03-22 19:31, John White wrote:
>> Hello Folks,
>>      We're trying to get a corosync/pacemaker instance going on a 4-node
>> cluster that boots via PXE. There have been a number of state/filesystem
>> issues, but those appear to be *mostly* taken care of so far. We're now
>> running into an issue where the cib daemon won't stay up, with errors like
>> the following (sorry for the lengthy dump; note the attrd and cib
>> connection errors). Any ideas would be greatly appreciated:
>> 
>> Mar 22 11:25:18 n0014 cib: [25839]: info: validate_with_relaxng: Creating RNG parser context
>> Mar 22 11:25:18 n0014 attrd: [25841]: info: Invoked: /usr/lib64/heartbeat/attrd
>> Mar 22 11:25:18 n0014 attrd: [25841]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/hacluster
>> Mar 22 11:25:18 n0014 attrd: [25841]: info: main: Starting up
>> Mar 22 11:25:18 n0014 attrd: [25841]: info: get_cluster_type: Cluster type is: 'corosync'
>> Mar 22 11:25:18 n0014 attrd: [25841]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
>> Mar 22 11:25:18 n0014 attrd: [25841]: ERROR: init_cpg_connection: Could not connect to the Cluster Process Group API: 2
>> Mar 22 11:25:18 n0014 attrd: [25841]: ERROR: main: HA Signon failed
>> Mar 22 11:25:18 n0014 attrd: [25841]: info: main: Cluster connection active
>> Mar 22 11:25:18 n0014 attrd: [25841]: info: main: Accepting attribute updates
>> Mar 22 11:25:18 n0014 attrd: [25841]: ERROR: main: Aborting startup
>> Mar 22 11:25:18 n0014 pengine: [25842]: info: Invoked: /usr/lib64/heartbeat/pengine
>> Mar 22 11:25:18 n0014 pengine: [25842]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/hacluster
>> Mar 22 11:25:18 n0014 pengine: [25842]: debug: main: Checking for old instances of pengine
>> Mar 22 11:25:18 n0014 pengine: [25842]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /var/run/crm/pengine
> 
> Is that "/var/run/crm" directory available, owned by
> hacluster.haclient, and writable by at least the hacluster user?
> 
> Regards,
> Andreas
> 
> -- 
> Need help with Pacemaker?
> http://www.hastexo.com/now
> 
>> Mar 22 11:25:18 n0014 pacemakerd: [25834]: ERROR: pcmk_child_exit: Child process attrd exited (pid=25841, rc=100)
>> Mar 22 11:25:18 n0014 pacemakerd: [25834]: notice: pcmk_child_exit: Child process attrd no longer wishes to be respawned
>> Mar 22 11:25:18 n0014 pacemakerd: [25834]: info: update_node_processes: Node n0014.lustre now has process list: 00000000000000000000000000110312 (was 00000000000000000000000000111312)
>> Mar 22 11:25:18 n0014 pengine: [25842]: debug: init_client_ipc_comms_nodispatch: Could not init comms on: /var/run/crm/pengine
>> Mar 22 11:25:18 n0014 pengine: [25842]: debug: main: Init server comms
>> Mar 22 11:25:18 n0014 pengine: [25842]: info: main: Starting pengine
>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: debug: init_cpg_connection: Adding fd=4 to mainloop
>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: init_ais_connection_once: Connection to 'corosync': established
>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: debug: crm_new_peer: Creating entry for node n0014.lustre/247988234
>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: crm_new_peer: Node n0014.lustre now has id: 247988234
>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: crm_new_peer: Node 247988234 is now known as n0014.lustre
>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /var/run/crm/pcmk
>> Mar 22 11:25:18 n0014 crmd: [25843]: info: Invoked: /usr/lib64/heartbeat/crmd
>> Mar 22 11:25:18 n0014 pacemakerd: [25834]: debug: pcmk_client_connect: Channel 0x995530 connected: 1 children
>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: main: Starting stonith-ng mainloop
>> Mar 22 11:25:18 n0014 crmd: [25843]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/hacluster
>> Mar 22 11:25:18 n0014 crmd: [25843]: info: main: CRM Hg Version: a02c0f19a00c1eb2527ad38f146ebc0834814558
>> Mar 22 11:25:18 n0014 crmd: [25843]: info: crmd_init: Starting crmd
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: s_crmd_fsa: Processing I_STARTUP: [ state=S_STARTING cause=C_STARTUP origin=crmd_init ]
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_fsa_action: actions:trace: #011// A_LOG
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_fsa_action: actions:trace: #011// A_STARTUP
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_startup: Registering Signal Handlers
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_startup: Creating CIB and LRM objects
>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: crm_update_peer: Node n0014.lustre: id=247988234 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000110312 (new)
>> Mar 22 11:25:18 n0014 crmd: [25843]: info: G_main_add_SignalHandler: Added signal handler for signal 17
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_fsa_action: actions:trace: #011// A_CIB_START
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /var/run/crm/cib_rw
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: init_client_ipc_comms_nodispatch: Could not init comms on: /var/run/crm/cib_rw
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: cib_native_signon_raw: Connection to command channel failed
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /var/run/crm/cib_callback
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: init_client_ipc_comms_nodispatch: Could not init comms on: /var/run/crm/cib_callback
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: cib_native_signon_raw: Connection to callback channel failed
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: cib_native_signon_raw: Connection to CIB failed: connection failed
>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: cib_native_signoff: Signing out of the CIB Service
>> Mar 22 11:25:18 n0014 cib: [25839]: ERROR: Element cib failed to validate content
>> Mar 22 11:25:18 n0014 cib: [25839]: ERROR: readCibXmlFile: CIB does not validate with <null>
>> Mar 22 11:25:18 n0014 cib: [25839]: info: startCib: CIB Initialization completed successfully
>> Mar 22 11:25:18 n0014 cib: [25839]: info: get_cluster_type: Cluster type is: 'corosync'
>> Mar 22 11:25:18 n0014 cib: [25839]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
>> Mar 22 11:25:18 n0014 cib: [25839]: ERROR: init_cpg_connection: Could not connect to the Cluster Process Group API: 2
>> Mar 22 11:25:18 n0014 cib: [25839]: CRIT: cib_init: Cannot sign in to the cluster... terminating
>> 
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>> 

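In the log, attrd and cib both fail with "Could not connect to the Cluster Process Group API: 2" while stonith-ng connects to corosync fine. A minimal sketch for decoding that number, assuming it is the corosync `cs_error_t` value returned by `cpg_initialize()`; the names below follow the enum in corosync's `corotypes.h` as I recall it (2 would be `CS_ERR_LIBRARY`), so check the header installed on the node (typically `/usr/include/corosync/corotypes.h`). `cs_error_name` and `decode_cpg_failures` are hypothetical helpers, and the scanner expects unwrapped, one-entry-per-line syslog.

```shell
#!/usr/bin/env bash
# Sketch: decode the number in "Could not connect to the Cluster Process
# Group API: N", ASSUMING it is corosync's cs_error_t from cpg_initialize().
# Values below are from corosync's corotypes.h as recalled; verify locally.
cs_error_name() {
    case "$1" in
        1)  echo "CS_OK" ;;
        2)  echo "CS_ERR_LIBRARY" ;;
        3)  echo "CS_ERR_VERSION" ;;
        4)  echo "CS_ERR_INIT" ;;
        5)  echo "CS_ERR_TIMEOUT" ;;
        6)  echo "CS_ERR_TRY_AGAIN" ;;
        7)  echo "CS_ERR_INVALID_PARAM" ;;
        8)  echo "CS_ERR_NO_MEMORY" ;;
        9)  echo "CS_ERR_BAD_HANDLE" ;;
        10) echo "CS_ERR_BUSY" ;;
        11) echo "CS_ERR_ACCESS" ;;
        12) echo "CS_ERR_NOT_EXIST" ;;
        *)  echo "UNKNOWN($1)" ;;
    esac
}

# Hypothetical helper: scan an (unwrapped) syslog file and print
# "daemon[pid]: ERRNAME" for every CPG connection failure found.
decode_cpg_failures() {
    local log="$1"
    sed -n 's/.* \([a-z-]*\): \[\([0-9]*\)]: ERROR: init_cpg_connection: Could not connect to the Cluster Process Group API: \([0-9]*\).*/\1 \2 \3/p' "$log" |
    while read -r daemon pid code; do
        printf '%s[%s]: %s\n' "$daemon" "$pid" "$(cs_error_name "$code")"
    done
}
```

Usage: `decode_cpg_failures /var/log/messages` (the log path is an assumption; use wherever syslog writes the pacemaker messages). For the dump above it would list attrd and cib, both with the same code.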

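Right after attrd exits, pacemakerd's `update_node_processes` line shows the node's process-list mask going from `...111312` to `...110312`. A small sketch that compares the two hex masks and reports which bits were cleared and which were newly set; with the values from the log, the only change is bit `0x1000` being cleared, which presumably is attrd's flag given the preceding "Child process attrd exited" line (I have not checked the flag values against pacemaker's headers). `proc_mask_diff` is a hypothetical helper; it relies on bash's 64-bit `base#number` arithmetic, and the leading zeros in the 32-digit masks do not change the value.

```shell
#!/usr/bin/env bash
# Sketch: compare two pacemakerd process-list masks (hex strings as printed
# by update_node_processes) and report cleared and newly set bits.
# Usage: proc_mask_diff NEW_MASK OLD_MASK   (hypothetical helper)
proc_mask_diff() {
    # 16#... parses hex; leading zeros are harmless in bash arithmetic
    local now=$((16#$1)) was=$((16#$2))
    printf 'cleared: 0x%x\n' $(( was & ~now ))   # bits present before, gone now
    printf 'set:     0x%x\n' $(( now & ~was ))   # bits that newly appeared
}
```

With the masks from the log, `proc_mask_diff 00000000000000000000000000110312 00000000000000000000000000111312` prints `cleared: 0x1000` and `set:     0x0`.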