Further investigation (not yet correlated) suggests that this might be a
permissions or timing issue.
In the strace output I can see crmd creating the lrm sockets:

[pid  4433] unlink("/var/run/heartbeat/lrm_cmd_sock") = -1 ENOENT (No such file or directory)
[pid  4433] bind(4, {sa_family=AF_FILE, path="/var/run/heartbeat/lrm_cmd_sock"}, 110) = 0
[pid  4433] chmod("/var/run/heartbeat/lrm_cmd_sock", 0777) = 0
[pid  4433] listen(4, 10)               = 0
[pid  4433] fcntl(4, F_GETFL)           = 0x2 (flags O_RDWR)
[pid  4433] fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid  4433] socket(PF_FILE, SOCK_STREAM, 0) = 7
[pid  4433] unlink("/var/run/heartbeat/lrm_callback_sock") = -1 ENOENT (No such file or directory)
[pid  4433] bind(7, {sa_family=AF_FILE, path="/var/run/heartbeat/lrm_callback_sock"}, 110) = 0
[pid  4433] chmod("/var/run/heartbeat/lrm_callback_sock", 0777) = 0

and then, shortly afterwards, deleting them again:

[pid  4433] unlink("/var/run/heartbeat/lrm_cmd_sock" <unfinished ...>
[pid  4436] <... mprotect resumed> )    = 0
[pid  4433] <... unlink resumed> )      = 0
[pid  4425] futex(0x7f33c929b0c4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f33c929b0c0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
[pid  4433] close(7 <unfinished ...>
[pid  4426] <... futex resumed> )       = 0
[pid  4433] <... close resumed> )       = 0
[pid  4426] futex(0x7f33c929b100, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid  4433] unlink("/var/run/heartbeat/lrm_callback_sock" <unfinished ...>
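
To line up when each socket is created and removed, a rough sketch like the
one below could filter a trace down to the bind/chmod/unlink calls under
/var/run/heartbeat. The function name socket_events and the sample list are
made up for illustration, and it assumes each syscall sits on a single line
(as in strace's raw output, not the wrapped text of this mail).

```python
import re

# Captures the pid, the syscall name, and the first quoted path on a strace line.
EVENT = re.compile(r'\[pid\s+(\d+)\]\s+(bind|unlink|chmod)\(.*?"([^"]+)"')

def socket_events(lines, prefix="/var/run/heartbeat/"):
    """Yield (pid, syscall, path) for bind/unlink/chmod calls under prefix."""
    for line in lines:
        match = EVENT.search(line)
        if match and match.group(3).startswith(prefix):
            yield int(match.group(1)), match.group(2), match.group(3)

# Hypothetical sample: a few unwrapped lines taken from the trace above.
sample = [
    '[pid  4433] bind(4, {sa_family=AF_FILE, path="/var/run/heartbeat/lrm_cmd_sock"}, 110) = 0',
    '[pid  4433] chmod("/var/run/heartbeat/lrm_cmd_sock", 0777) = 0',
    '[pid  4433] listen(4, 10)               = 0',
    '[pid  4433] unlink("/var/run/heartbeat/lrm_cmd_sock" <unfinished ...>',
]

for pid, call, path in socket_events(sample):
    print(pid, call, path)
```

Lines such as "<... unlink resumed>" are deliberately skipped, since the
matching "<unfinished ...>" line already carries the path.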


On one of the two nodes I did manage to make some progress: I stopped
corosync, waited for a few minutes, then started it again, and now I get the
sockets. (Doing /etc/init.d/corosync restarts or machine reboots was never
successful.)
r...@node1:/var/run/heartbeat# ls -l
total 0
srwxrwxrwx 1 root root  0 2010-11-17 01:22 lrm_callback_sock
srwxrwxrwx 1 root root  0 2010-11-17 01:22 lrm_cmd_sock
drwxr-xr-x 2 root root 40 2010-11-17 01:22 rsctmp
srwxrwxrwx 1 root root  0 2010-11-17 01:22 stonithd
srwxrwxrwx 1 root root  0 2010-11-17 01:22 stonithd_callback
but I can't repeat this on the second node.
I haven't tried it on the first node again (this way I can at least compare it
with the second node).
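
To compare the two nodes, and to see whether the sockets appear and then
vanish again after a start, a small watcher along these lines might help. It
is only a sketch: the socket names are the four from the listing above, while
missing_sockets, watch, the one-second interval and the 60 rounds are
arbitrary choices of mine.

```python
import os
import stat
import time

SOCKET_DIR = "/var/run/heartbeat"
# Names taken from the ls output on the working node.
EXPECTED = ["lrm_cmd_sock", "lrm_callback_sock", "stonithd", "stonithd_callback"]

def missing_sockets(directory=SOCKET_DIR, names=EXPECTED):
    """Return the names that do not currently exist as Unix sockets."""
    missing = []
    for name in names:
        try:
            mode = os.stat(os.path.join(directory, name)).st_mode
        except OSError:
            # Path is absent (or unreadable): count it as missing.
            missing.append(name)
            continue
        if not stat.S_ISSOCK(mode):
            missing.append(name)
    return missing

def watch(interval=1.0, rounds=60):
    """Print a timestamped line whenever the set of missing sockets changes."""
    previous = None
    for _ in range(rounds):
        current = missing_sockets()
        if current != previous:
            print(time.strftime("%H:%M:%S"), "missing:", current or "none")
            previous = current
        time.sleep(interval)
```

Calling watch() in one terminal while starting corosync in another would show
whether the sockets are ever created on the second node and how long they last.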



After

-- 
 do_lrm_control: Failed to sign on to the LRM after upgrade to Maverick
https://bugs.launchpad.net/bugs/676391