Further investigation (not yet correlated) suggests this might be a permissions or timing issue. In the strace output I can see crmd creating the LRM sockets:
    [pid 4433] unlink("/var/run/heartbeat/lrm_cmd_sock") = -1 ENOENT (No such file or directory)
    [pid 4433] bind(4, {sa_family=AF_FILE, path="/var/run/heartbeat/lrm_cmd_sock"}, 110) = 0
    [pid 4433] chmod("/var/run/heartbeat/lrm_cmd_sock", 0777) = 0
    [pid 4433] listen(4, 10) = 0
    [pid 4433] fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
    [pid 4433] fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    [pid 4433] socket(PF_FILE, SOCK_STREAM, 0) = 7
    [pid 4433] unlink("/var/run/heartbeat/lrm_callback_sock") = -1 ENOENT (No such file or directory)
    [pid 4433] bind(7, {sa_family=AF_FILE, path="/var/run/heartbeat/lrm_callback_sock"}, 110) = 0
    [pid 4433] chmod("/var/run/heartbeat/lrm_callback_sock", 0777) = 0

and then, shortly afterwards, deleting them again:

    [pid 4433] unlink("/var/run/heartbeat/lrm_cmd_sock" <unfinished ...>
    [pid 4436] <... mprotect resumed> ) = 0
    [pid 4433] <... unlink resumed> ) = 0
    [pid 4425] futex(0x7f33c929b0c4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f33c929b0c0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
    [pid 4433] close(7 <unfinished ...>
    [pid 4426] <... futex resumed> ) = 0
    [pid 4433] <... close resumed> ) = 0
    [pid 4426] futex(0x7f33c929b100, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
    [pid 4433] unlink("/var/run/heartbeat/lrm_callback_sock" <unfinished ...>

On one of the two nodes I did manage to make some progress: I stopped corosync, waited a few minutes, then started it again, and now I get the sockets. (Running /etc/init.d/corosync restart or rebooting the machine was never successful.)

    r...@node1:/var/run/heartbeat# ls -l
    total 0
    srwxrwxrwx 1 root root  0 2010-11-17 01:22 lrm_callback_sock
    srwxrwxrwx 1 root root  0 2010-11-17 01:22 lrm_cmd_sock
    drwxr-xr-x 2 root root 40 2010-11-17 01:22 rsctmp
    srwxrwxrwx 1 root root  0 2010-11-17 01:22 stonithd
    srwxrwxrwx 1 root root  0 2010-11-17 01:22 stonithd_callback

But I can't repeat this on the second node. I haven't tried it on the first node again (that way I can at least compare things with the second node).
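To separate a permissions problem from a timing problem, the socket setup visible in the strace (unlink, bind, chmod 0777, listen, set O_NONBLOCK) can be replayed outside crmd. Below is a minimal sketch, assuming Linux and Python 3; `create_listening_socket` and `socket_present` are hypothetical helper names for illustration only, not part of heartbeat or pacemaker.

```python
import os
import socket
import stat


def create_listening_socket(path: str, backlog: int = 10) -> socket.socket:
    """Replay the per-socket sequence from the crmd strace:
    socket -> unlink (ENOENT tolerated) -> bind -> chmod 0777 -> listen -> O_NONBLOCK."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        os.unlink(path)  # remove a stale socket file; ENOENT just means none existed
    except FileNotFoundError:
        pass
    sock.bind(path)           # creates the socket file on disk
    os.chmod(path, 0o777)     # same mode crmd sets (srwxrwxrwx in ls -l)
    sock.listen(backlog)      # crmd used a backlog of 10
    sock.setblocking(False)   # equivalent of fcntl(F_SETFL, O_RDWR|O_NONBLOCK)
    return sock


def socket_present(path: str) -> bool:
    """True if path exists and is a Unix domain socket (the leading 's' in ls -l)."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except FileNotFoundError:
        return False
```

Because `create_listening_socket` unlinks whatever already exists at the path, just as crmd does, it is safer to point it at a scratch directory rather than the live /var/run/heartbeat. If the same calls succeed there as root, plain permissions look less likely and the early unlink seen in the strace points more towards a timing issue.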
After

-- 
do_lrm_control: Failed to sign on to the LRM after upgrade to Maverick
https://bugs.launchpad.net/bugs/676391