On 26/04/2013, at 9:12 PM, James Masson <james.mas...@opencredo.com> wrote:
>
> On 26/04/13 01:29, Andrew Beekhof wrote:
>>
>> On 26/04/2013, at 12:12 AM, James Masson <james.mas...@opencredo.com> wrote:
>>
>>>
>>> Hi list,
>>>
>>> I'm trying to build and run pacemaker from a custom location.
>>>
>>> Corosync starts up fine.
>>>
>>> Pacemakerd does not - the result is.
>>
>> Try turning up the debug to see why the cib isn't happy:
>>
>>> Apr 25 13:54:10 [10482] fcde02a2-cc41-4c58-b6d2-b7bb0bada436 pacemakerd: error: pcmk_child_exit: Child process cib exited (pid=10484, rc=100)
>>> Apr 25 13:54:10 [10482] fcde02a2-cc41-4c58-b6d2-b7bb0bada436 pacemakerd: warning: pcmk_child_exit: Pacemaker child process cib no longer
>>
>
> Hi Andrew,
>
> debug log + strace are attached. The strace has something interesting...
>
> 5195 open("/dev/shm/qb-cpg-request-5173-5195-19-header", O_RDWR) = -1 EACCES (Permission denied)
>
> I know pacemaker uses shm to communicate. permissions on /dev/shm are (I think) correct.

Looks reasonable (now that I understand vcap :-)

>
> root@5627a5e1-9e30-4fe2-9178-6445e26a8ccc:~# ls -al /dev/shm/
> total 8224
> drwxrwx--- 2 root vcap 80 2013-04-26 10:30 .
> drwxr-xr-x 12 root root 3900 2013-04-26 08:23 ..
> -rw------- 1 root root 8388608 2013-04-26 10:30 qb-corosync-blackbox-data
> -rw------- 1 root root 8248 2013-04-26 10:28 qb-corosync-blackbox-header
>
> When I changed permissions on /dev/shm to 777 - things get a little further - CIB stays up, crmd respawns, and I get this over and over again in the logs.
>
> ##################################
> Apr 26 10:55:52 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_destroy: Destroying 0 events
> Apr 26 10:55:54 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_new: Connecting 0x1a498e0 for uid=1000 gid=0 pid=5775 id=95b6eca5-a34e-49e5-b0f8-74b84857d690
> Apr 26 10:55:54 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_destroy: Destroying 0 events
> Apr 26 10:55:56 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_new: Connecting 0x1a498e0 for uid=1000 gid=0 pid=5775 id=117e515b-da4d-4842-9414-7b7d004e5c92
> Apr 26 10:55:56 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_destroy: Destroying 0 events
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_new: Connecting 0x1a498e0 for uid=1000 gid=0 pid=5775 id=cf7c10b1-14a1-47d1-9e2e-30707254256f
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_destroy: Destroying 0 events
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd: error: pcmk_child_exit: Child process crmd exited (pid=5775, rc=2)

No logs from the crmd?

> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd: trace: update_node_processes: Empty uname for node 839122954
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd: debug: update_node_processes: Node 5627a5e1-9e30-4fe2-9178-6445e26a8ccc now has process list: 00000000000000000000000000111112 (was 00000000000000000000000000111312)
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd: trace: update_process_clients: Sending process list to 0 children
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd: trace: update_process_peers: Sending <node uname="5627a5e1-9e30-4fe2-9178-6445e26a8ccc" proclist="1118482"/>
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd: notice: pcmk_process_exit: Respawning failed child process: crmd
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd: info: start_child: Forked child 5789 for process crmd
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd: trace: update_node_processes: Empty uname for node 839122954
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd: debug: update_node_processes: Node 5627a5e1-9e30-4fe2-9178-6445e26a8ccc now has process list: 00000000000000000000000000111312 (was 00000000000000000000000000111112)
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd: trace: update_process_clients: Sending process list to 0 children
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd: trace: update_process_peers: Sending <node uname="5627a5e1-9e30-4fe2-9178-6445e26a8ccc" proclist="1118994"/>
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd: trace: crm_user_lookup: Cluster user vcap has uid=1000 gid=1000
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd: trace: mainloop_gio_callback: New message from corosync-cpg[0x21b1c60]
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_new: Connecting 0x1a498e0 for uid=1000 gid=0 pid=5789 id=5dfb6f5a-8b53-42f6-b5f5-61e49efa93dd
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_new: Connecting 0x1a636f0 for uid=1000 gid=0 pid=5789 id=3198d49f-8ff9-4799-9496-1b9aed0de807
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_destroy: Destroying 0 events
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_new: Connecting 0x1a56cb0 for uid=1000 gid=0 pid=5789 id=2713f990-2533-4fb8-82e0-31e40b1ef577
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_destroy: Destroying 0 events
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_new: Connecting 0x1a571f0 for uid=1000 gid=0 pid=5789 id=2bf401a2-3bd5-43af-9328-0a53bb61d9f7
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_destroy: Destroying 0 events
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_destroy: Destroying 0 events
> Apr 26 10:56:00 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_new: Connecting 0x1a498e0 for uid=1000 gid=0 pid=5789 id=7233fbec-3633-4a48-8fe7-3028bfa58029
> Apr 26 10:56:00 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_destroy: Destroying 0 events
> Apr 26 10:56:02 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_new: Connecting 0x1a498e0 for uid=1000 gid=0 pid=5789 id=a7b76888-7137-4eb1-888d-d7a3ea273a4f
> Apr 26 10:56:02 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_destroy: Destroying 0 events
> Apr 26 10:56:04 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_new: Connecting 0x1a498e0 for uid=1000 gid=0 pid=5789 id=4fbd695d-902b-4a29-957f-8d36fd072178
> Apr 26 10:56:04 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_destroy: Destroying 0 events
> Apr 26 10:56:06 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc lrmd: info: crm_client_new: Connecting 0x1a498e0 for uid=1000 gid=0 pid=5789 id=a3e00689-d842-456d-957a-22e2e4e7eedf
> ##################
>
> SHM while running...
>
> #####################
> root@5627a5e1-9e30-4fe2-9178-6445e26a8ccc:~# ls -al /dev/shm/
> total 34936
> drwxrwxrwx 2 root vcap 1280 2013-04-26 10:57 .
> drwxr-xr-x 12 root root 3900 2013-04-26 08:23 ..
> -rw------- 1 root root 1048576 2013-04-26 10:54 qb-cfg-event-5598-5754-16-data
> -rw------- 1 root root 8248 2013-04-26 10:54 qb-cfg-event-5598-5754-16-header
> -rw------- 1 root root 1048576 2013-04-26 10:54 qb-cfg-request-5598-5754-16-data
> -rw------- 1 root root 8252 2013-04-26 10:54 qb-cfg-request-5598-5754-16-header
> -rw------- 1 root root 1048576 2013-04-26 10:54 qb-cfg-response-5598-5754-16-data
> -rw------- 1 root root 8248 2013-04-26 10:54 qb-cfg-response-5598-5754-16-header
> -rw-rw---- 1 vcap root 524288 2013-04-26 10:54 qb-cib_rw-event-5756-5757-9-data
> -rw-rw---- 1 vcap root 8248 2013-04-26 10:54 qb-cib_rw-event-5756-5757-9-header
> -rw-rw---- 1 vcap root 524288 2013-04-26 10:54 qb-cib_rw-event-5756-5759-10-data
> -rw-rw---- 1 vcap root 8248 2013-04-26 10:54 qb-cib_rw-event-5756-5759-10-header
> -rw-rw---- 1 vcap root 524288 2013-04-26 10:54 qb-cib_rw-request-5756-5757-9-data
> -rw-rw---- 1 vcap root 8252 2013-04-26 10:54 qb-cib_rw-request-5756-5757-9-header
> -rw-rw---- 1 vcap root 524288 2013-04-26 10:54 qb-cib_rw-request-5756-5759-10-data
> -rw-rw---- 1 vcap root 8252 2013-04-26 10:54 qb-cib_rw-request-5756-5759-10-header
> -rw-rw---- 1 vcap root 524288 2013-04-26 10:54 qb-cib_rw-response-5756-5757-9-data
> -rw-rw---- 1 vcap root 8248 2013-04-26 10:54 qb-cib_rw-response-5756-5757-9-header
> -rw-rw---- 1 vcap root 524288 2013-04-26 10:54 qb-cib_rw-response-5756-5759-10-data
> -rw-rw---- 1 vcap root 8248 2013-04-26 10:54 qb-cib_rw-response-5756-5759-10-header
> -rw-rw---- 1 vcap root 524288 2013-04-26 10:56 qb-cib_shm-event-5756-5808-7-data
> -rw-rw---- 1 vcap root 8248 2013-04-26 10:56 qb-cib_shm-event-5756-5808-7-header
> -rw-rw---- 1 vcap root 524288 2013-04-26 10:56 qb-cib_shm-request-5756-5808-7-data
> -rw-rw---- 1 vcap root 8252 2013-04-26 10:56 qb-cib_shm-request-5756-5808-7-header
> -rw-rw---- 1 vcap root 524288 2013-04-26 10:56 qb-cib_shm-response-5756-5808-7-data
> -rw-rw---- 1 vcap root 8248 2013-04-26 10:56 qb-cib_shm-response-5756-5808-7-header
> -rw------- 1 root root 8388608 2013-04-26 10:56 qb-corosync-blackbox-data
> -rw------- 1 root root 8248 2013-04-26 10:47 qb-corosync-blackbox-header
> -rw------- 1 root root 1048576 2013-04-26 10:54 qb-cpg-event-5598-5754-17-data
> -rw------- 1 root root 8248 2013-04-26 10:54 qb-cpg-event-5598-5754-17-header
> -rw------- 1 vcap root 1048576 2013-04-26 10:54 qb-cpg-event-5598-5756-19-data
> -rw------- 1 vcap root 8248 2013-04-26 10:54 qb-cpg-event-5598-5756-19-header
> -rw------- 1 root root 1048576 2013-04-26 10:54 qb-cpg-event-5598-5757-18-data
> -rw------- 1 root root 8248 2013-04-26 10:54 qb-cpg-event-5598-5757-18-header
> -rw------- 1 vcap root 1048576 2013-04-26 10:54 qb-cpg-event-5598-5759-20-data
> -rw------- 1 vcap root 8248 2013-04-26 10:54 qb-cpg-event-5598-5759-20-header
> -rw------- 1 vcap root 1048576 2013-04-26 10:56 qb-cpg-event-5598-5808-21-data
> -rw------- 1 vcap root 8248 2013-04-26 10:56 qb-cpg-event-5598-5808-21-header
> -rw------- 1 root root 1048576 2013-04-26 10:54 qb-cpg-request-5598-5754-17-data
> -rw------- 1 root root 8252 2013-04-26 10:54 qb-cpg-request-5598-5754-17-header
> -rw------- 1 vcap root 1048576 2013-04-26 10:54 qb-cpg-request-5598-5756-19-data
> -rw------- 1 vcap root 8252 2013-04-26 10:54 qb-cpg-request-5598-5756-19-header
> -rw------- 1 root root 1048576 2013-04-26 10:54 qb-cpg-request-5598-5757-18-data
> -rw------- 1 root root 8252 2013-04-26 10:54 qb-cpg-request-5598-5757-18-header
> -rw------- 1 vcap root 1048576 2013-04-26 10:54 qb-cpg-request-5598-5759-20-data
> -rw------- 1 vcap root 8252 2013-04-26 10:54 qb-cpg-request-5598-5759-20-header
> -rw------- 1 vcap root 1048576 2013-04-26 10:56 qb-cpg-request-5598-5808-21-data
> -rw------- 1 vcap root 8252 2013-04-26 10:56 qb-cpg-request-5598-5808-21-header
> -rw------- 1 root root 1048576 2013-04-26 10:54 qb-cpg-response-5598-5754-17-data
> -rw------- 1 root root 8248 2013-04-26 10:54 qb-cpg-response-5598-5754-17-header
> -rw------- 1 vcap root 1048576 2013-04-26 10:54 qb-cpg-response-5598-5756-19-data
> -rw------- 1 vcap root 8248 2013-04-26 10:54 qb-cpg-response-5598-5756-19-header
> -rw------- 1 root root 1048576 2013-04-26 10:54 qb-cpg-response-5598-5757-18-data
> -rw------- 1 root root 8248 2013-04-26 10:54 qb-cpg-response-5598-5757-18-header
> -rw------- 1 vcap root 1048576 2013-04-26 10:54 qb-cpg-response-5598-5759-20-data
> -rw------- 1 vcap root 8248 2013-04-26 10:54 qb-cpg-response-5598-5759-20-header
> -rw------- 1 vcap root 1048576 2013-04-26 10:56 qb-cpg-response-5598-5808-21-data
> -rw------- 1 vcap root 8248 2013-04-26 10:56 qb-cpg-response-5598-5808-21-header
> -rw------- 1 vcap root 1048576 2013-04-26 10:56 qb-quorum-event-5598-5808-22-data
> -rw------- 1 vcap root 8248 2013-04-26 10:56 qb-quorum-event-5598-5808-22-header
> -rw------- 1 vcap root 1048576 2013-04-26 10:56 qb-quorum-request-5598-5808-22-data
> -rw------- 1 vcap root 8252 2013-04-26 10:56 qb-quorum-request-5598-5808-22-header
> -rw------- 1 vcap root 1048576 2013-04-26 10:56 qb-quorum-response-5598-5808-22-data
> -rw------- 1 vcap root 8248 2013-04-26 10:56 qb-quorum-response-5598-5808-22-header
> #####################################
>
> snippets from pacemaker-strace after chmod 777 /dev/shm
>
> ###################
> CIB
> 5833 chown("/dev/shm/qb-cib_shm-event-5833-5858-7-data", 4294967295, 1000) = -1 EPERM (Operation not permitted)
> 5833 chown("/dev/shm/qb-cib_shm-event-5833-5858-7-header", 4294967295, 1000) = -1 EPERM (Operation not permitted)
> 5833 chmod("/dev/shm/qb-cib_shm-event-5833-5858-7-data", 0660) = 0
> 5833 chmod("/dev/shm/qb-cib_shm-event-5833-5858-7-header", 0660) = 0
> ####################
> CRMD
> 5838 connect(3, {sa_family=AF_FILE, path=@"cib_shm"}, 110) = -1 ECONNREFUSED (Connection refused)
> 5838 close(3) = 0
> 5838 shutdown(4294967295, 2 /* send and receive */) = -1 EBADF (Bad file descriptor)
> 5838 close(4294967295) = -1 EBADF (Bad file descriptor)
> 5838 write(2, "Could not establish cib_shm conn"..., 65) = 65
> 5838 clock_gettime(CLOCK_REALTIME, {1366973927, 255600506}) = 0
> 5838 munmap(0x7f6c1bcc3000, 528384) = 0
> #########################
>
> this is looking more and more like a permissions problem on files read/written on SHM.
>
> I read - http://www.ultrabug.fr/pacemaker-vulnerability-and-v1-1-9-release/ - and added root to group vcap, and vcap to group root. ( vcap is my equivalent for haclient user/group) - no change to behavior. I did add "--with-acls" at compile time - but I'm not planning on using them.

Which exact version (git hash) of pacemaker and libqb are you using?

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
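
For anyone retracing the /dev/shm listings in this thread, here is a rough Python sketch (not part of pacemaker or libqb) that applies the classic owner/group/other permission test to each qb-* file for a given uid and set of gids. The helper names may_open_rw and report are made up for illustration, and uid=1000 with gid 0 is only an assumption taken from the "uid=1000 gid=0" values lrmd logged for the crmd client. It deliberately ignores root's override, ACLs and capabilities, so treat the output as a hint rather than a diagnosis.

```python
import os
import stat


def may_open_rw(mode, owner_uid, owner_gid, uid, gids):
    """Classic Unix check: may (uid, gids) open a file read-write?

    Simplified on purpose: no root override, no ACLs, no capabilities.
    """
    if uid == owner_uid:
        need = stat.S_IRUSR | stat.S_IWUSR      # owner bits apply
    elif owner_gid in gids:
        need = stat.S_IRGRP | stat.S_IWGRP      # group bits apply
    else:
        need = stat.S_IROTH | stat.S_IWOTH      # "other" bits apply
    return (mode & need) == need


def report(directory, uid, gids, prefix="qb-"):
    """Return (name, owner uid, owner gid, octal mode, rw_ok) per matching file."""
    rows = []
    for name in sorted(os.listdir(directory)):
        if not name.startswith(prefix):
            continue
        st = os.stat(os.path.join(directory, name))
        ok = may_open_rw(st.st_mode, st.st_uid, st.st_gid, uid, gids)
        rows.append((name, st.st_uid, st.st_gid,
                     oct(stat.S_IMODE(st.st_mode)), ok))
    return rows


if __name__ == "__main__" and os.path.isdir("/dev/shm"):
    # Assumed identity: uid=1000 with gid 0, as logged by lrmd in this thread.
    for row in report("/dev/shm", uid=1000, gids={0}):
        print(row)
```

With the listing above, a root:root file with mode 0600 would come out as not openable for uid 1000, while a vcap:root file with mode 0660 would; whether that matches what libqb actually attempts is something only the strace can confirm.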