I know David has been putting a lot of time into the pacemaker-remote stuff lately, so it's quite possible that you're hitting a bug on our side. Is trying out the latest from Git an option?
Making RPMs is pretty easy; just 'make rpm-dep rpm' should be enough.

> On 14 Oct 2014, at 1:31 am, Саша Александров <shurr...@gmail.com> wrote:
>
> Hi!
>
> Most likely related...
> I have node vm-vmwww with remote-node vmwww. Both are reported online (vmwww:vm-vmwww) and vm-vmwww is reported as 'started on wings1'.
> However, when I try to clean up the failed action "vmwww_start_0 on wings1 'unknown error' (1): call=100, status=Timed Out", here is what I get in the log:
>
> Oct 13 18:25:43 wings1 crmd[3844]: warning: qb_ipcs_event_sendv: new_event_notification (3844-18918-16): Broken pipe (32)
> Oct 13 18:25:43 wings1 crmd[3844]: error: do_lrm_invoke: no lrmd connection for remote node vmwww found on cluster node wings1. Can not process request.
> Oct 13 18:25:43 wings1 crmd[3844]: error: send_msg_via_ipc: Unknown Sub-system (d483a600-5535-4f0d-8ffd-2af391f5cb21)... discarding message.
> Oct 13 18:25:43 wings1 crmd[3844]: error: send_msg_via_ipc: Unknown Sub-system (d483a600-5535-4f0d-8ffd-2af391f5cb21)... discarding message.
> Oct 13 18:25:43 wings1 crmd[3844]: error: send_msg_via_ipc: Unknown Sub-system (d483a600-5535-4f0d-8ffd-2af391f5cb21)... discarding message.
> Oct 13 18:25:43 wings1 crmd[3844]: error: send_msg_via_ipc: Unknown Sub-system (d483a600-5535-4f0d-8ffd-2af391f5cb21)... discarding message.
>
> I go to the VM and try to run 'crm_mon':
>
> Oct 13 18:27:06 vmwww pacemaker_remoted[3798]: error: ipc_proxy_accept: No ipc providers available for uid 0 gid 0
> Oct 13 18:27:06 vmwww pacemaker_remoted[3798]: error: handle_new_connection: Error in connection setup (3798-3868-13): Remote I/O error (121)
>
> ps aux | grep pace
> root 3798 0.1 0.1 76396 2868 ? S 18:16 0:00 pacemaker_remoted
>
> netstat -nltp | grep 3121
> tcp 0 0 0.0.0.0:3121 0.0.0.0:* LISTEN 3798/pacemaker_remo
>
> However, I can telnet to it fine:
>
> [root@wings1 ~]# telnet vmwww 3121
> Trying 192.168.222.89...
> Connected to vmwww.
> Escape character is '^]'.
> ^]
> telnet> quit
> Connection closed.
>
> This is pretty weird...
>
> Best regards,
> Alex
>
>
> 2014-10-13 17:47 GMT+04:00 Саша Александров <shurr...@gmail.com>:
> Hi!
>
> I was building a cluster with pacemaker+pacemaker-remote (CentOS 6.5, everything from the official repo).
> While I had several resources, everything was fine. However, when I added more VMs (currently 2 nodes and 10 VMs), I started running into problems (see below).
> The strange thing is that when I start cman/pacemaker some time later, they seem to work fine for a while.
>
> Oct 13 17:03:54 wings1 pacemakerd[26440]: notice: pcmk_child_exit: Child process crmd terminated with signal 13 (pid=30010, core=0)
> Oct 13 17:03:54 wings1 lrmd[26448]: warning: qb_ipcs_event_sendv: new_event_notification (26448-30010-6): Bad file descriptor (9)
> Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
> Oct 13 17:03:54 wings1 pacemakerd[26440]: notice: pcmk_process_exit: Respawning failed child process: crmd
> Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
> Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
> Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
> Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
> Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
>
> Oct 13 17:03:57 wings1 pacemakerd[26440]: notice: pcmk_child_exit: Child process crmd terminated with signal 13 (pid=30603, core=0)
> Oct 13 17:03:57 wings1 lrmd[26448]: warning: qb_ipcs_event_sendv: new_event_notification (26448-30603-6): Bad file descriptor (9)
> Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
> Oct 13 17:03:57 wings1 pacemakerd[26440]: notice: pcmk_process_exit: Respawning failed child process: crmd
> Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
> Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
> Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
> Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
> Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
> Oct 13 17:03:57 wings1 crmd[31192]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
> Oct 13 17:03:57 wings1 cib[26446]: warning: qb_ipcs_event_sendv: new_event_notification (26446-30603-11): Broken pipe (32)
> Oct 13 17:03:57 wings1 cib[26446]: warning: cib_notify_send_one: Notification of client crmd/fe944296-b3a1-4177-a94c-650568e8ff0a failed
>
> ..................
>
> So it keeps restarting; I even had to unmanage the resources and stop pacemaker/cman.
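The pcmk_child_exit lines are the ones that show the respawn loop; most of the surrounding lines are lrmd and cib failing to deliver notifications to a crmd client. Below is a minimal Python sketch for pulling those lines out of a longer log. It assumes the line format shown above, and the regex and the `child_exits` helper are hypothetical, not a Pacemaker tool. It also turns the signal number into a name. On Linux, signal 13 is SIGPIPE, the signal a process gets when it writes to a pipe or socket whose other end has been closed. That fits the IPC errors around it, but does not by itself say which connection was at fault.

```python
import re
import signal

# Matches pacemakerd lines such as (format assumed from the log excerpt above):
#   "... pcmk_child_exit: Child process crmd terminated with signal 13 (pid=30010, core=0)"
CHILD_EXIT = re.compile(
    r"pcmk_child_exit: Child process (?P<child>\S+) "
    r"terminated with signal (?P<sig>\d+) \(pid=(?P<pid>\d+), core=\d+\)"
)


def child_exits(lines):
    """Yield (child name, signal name, pid) for every pcmk_child_exit line."""
    for line in lines:
        match = CHILD_EXIT.search(line)
        if match is None:
            continue  # not a child-exit line (e.g. the "Respawning" notices)
        signum = int(match.group("sig"))
        try:
            name = signal.Signals(signum).name  # "SIGPIPE" for 13 on Linux
        except ValueError:
            name = f"signal {signum}"  # number unknown on this platform
        yield match.group("child"), name, int(match.group("pid"))


if __name__ == "__main__":
    sample = [
        "Oct 13 17:03:54 wings1 pacemakerd[26440]: notice: pcmk_child_exit: Child process crmd terminated with signal 13 (pid=30010, core=0)",
        "Oct 13 17:03:54 wings1 pacemakerd[26440]: notice: pcmk_process_exit: Respawning failed child process: crmd",
        "Oct 13 17:03:57 wings1 pacemakerd[26440]: notice: pcmk_child_exit: Child process crmd terminated with signal 13 (pid=30603, core=0)",
    ]
    for child, sig_name, pid in child_exits(sample):
        print(child, sig_name, pid)
```

Feeding it a whole /var/log/messages (for example `child_exits(open(path))`) gives one entry per crmd death, which makes the restart rate easy to see.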
>
> Oct 13 17:04:13 wings1 lrmd[26448]: warning: qb_ipcs_event_sendv: new_event_notification (26448-32444-6): Bad file descriptor (9)
> Oct 13 17:04:13 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
> Oct 13 17:04:13 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
> Oct 13 17:04:13 wings1 pacemakerd[26440]: notice: pcmk_child_exit: Child process crmd terminated with signal 13 (pid=32444, core=0)
> Oct 13 17:04:13 wings1 pacemakerd[26440]: notice: pcmk_process_exit: Respawning failed child process: crmd
> Oct 13 17:04:13 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
> Oct 13 17:04:13 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
> Oct 13 17:04:13 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
> Oct 13 17:04:13 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
> Oct 13 17:04:13 wings1 cib[26446]: warning: qb_ipcs_event_sendv: new_event_notification (26446-32444-11): Broken pipe (32)
> Oct 13 17:04:13 wings1 cib[26446]: warning: cib_notify_send_one: Notification of client crmd/ef727424-ce2b-4b3b-8749-82136dc72af8 failed
>
>
> And one more thing (probably not related, but who knows): I have CentOS 7.0 on one of the VMs, and LRMD is unable to establish communication with pacemaker_remote on that VM:
>
> (node):
> Oct 13 17:31:43 wings1 crmd[3844]: error: lrmd_tls_send_recv: Remote lrmd server disconnected while waiting for reply with id 6.
> Oct 13 17:31:45 wings1 crmd[3844]: error: lrmd_tls_send_recv: Remote lrmd server disconnected while waiting for reply with id 7.
> Oct 13 17:31:47 wings1 crmd[3844]: error: lrmd_tls_send_recv: Remote lrmd server disconnected while waiting for reply with id 8.
> Oct 13 17:31:48 wings1 crmd[3844]: error: lrmd_tls_send_recv: Remote lrmd server disconnected while waiting for reply with id 9.
> Oct 13 17:31:50 wings1 crmd[3844]: error: lrmd_tls_send_recv: Remote lrmd server disconnected while waiting for reply with id 10.
> Oct 13 17:31:51 wings1 crmd[3844]: error: lrmd_tls_send_recv: Remote lrmd server disconnected while waiting for reply with id 11.
> Oct 13 17:31:53 wings1 crmd[3844]: error: lrmd_tls_send_recv: Remote lrmd server disconnected while waiting for reply with id 12.
>
> (VM):
> Oct 13 21:27:32 bank systemd: Started Pacemaker Remote Service.
> Oct 13 21:27:32 bank pacemaker_remoted: Cannot change active directory to /var/lib/pacemaker/cores: No such file or directory (2)
> Oct 13 21:27:32 bank pacemaker_remoted[1853]: notice: lrmd_init_remote_tls_server: Starting a tls listener on port 3121.
> Oct 13 21:27:32 bank pacemaker_remoted[1853]: notice: bind_and_listen: Listening on address ::
> Oct 13 21:31:39 bank pacemaker_remoted[1853]: notice: lrmd_remote_listen: LRMD client connection established. 0x1c49d60 id: de49ea57-e94c-45bf-9d2d-d0f36cb2c4f7
> Oct 13 21:31:40 bank pacemaker_remoted[1853]: error: crm_abort: crm_remote_header: Triggered assert at remote.c:118 : endian == ENDIAN_LOCAL
> Oct 13 21:31:40 bank pacemaker_remoted[1853]: error: crm_remote_header: Invalid message detected, endian mismatch: badadbbd is neither 6d726c3c nor the swab'd 3c6c726d
> Oct 13 21:31:40 bank pacemaker_remoted[1853]: error: crm_abort: crm_remote_header: Triggered assert at remote.c:118 : endian == ENDIAN_LOCAL
> Oct 13 21:31:40 bank pacemaker_remoted[1853]: error: crm_remote_header: Invalid message detected, endian mismatch: badadbbd is neither 6d726c3c nor the swab'd 3c6c726d
> Oct 13 21:31:40 bank pacemaker_remoted[1853]: error: crm_abort: crm_remote_header: Triggered assert at remote.c:118 : endian == ENDIAN_LOCAL
> Oct 13 21:31:40 bank pacemaker_remoted[1853]: error: crm_remote_header: Invalid message detected, endian mismatch: badadbbd is neither 6d726c3c nor the swab'd 3c6c726d
> Oct 13 21:31:40 bank pacemaker_remoted[1853]: notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: <unknown> id: de49ea57-e94c-45bf-9d2d-d0f36cb2c4f7
>
>
> --
> Best regards,
> Alexandr
>
>
> --
> Best regards, ААА.
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
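The endian-mismatch errors from the CentOS 7.0 VM are easier to read once their three hex values are decoded. 6d726c3c and 3c6c726d are byte swaps of each other, and when 6d726c3c is stored on a little-endian host its four bytes are the ASCII text `<lrm`. The wording of the message suggests that badadbbd is the constant pacemaker_remoted expects at the start of a message header and 6d726c3c is what it actually read. If that is right, the VM received plain text where it expected a binary header. That would be consistent with the node and the VM framing remote messages differently (for example, because they run different Pacemaker versions), but nothing in this thread confirms it. Below is a small Python sketch to check the decoding. `swab32` and `stored_bytes` are hypothetical helpers, and the sketch assumes a little-endian (x86/x86_64) host.

```python
import struct


def swab32(value: int) -> int:
    """Reverse the byte order of a 32-bit unsigned value (the log's "swab'd")."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


def stored_bytes(value: int) -> bytes:
    """Bytes of a 32-bit value as laid out in memory on a little-endian host."""
    return struct.pack("<I", value)


if __name__ == "__main__":
    # The three values printed by crm_remote_header in the VM's log above.
    for value in (0xBADADBBD, 0x6D726C3C, 0x3C6C726D):
        print(f"{value:08x}  swab'd={swab32(value):08x}  bytes={stored_bytes(value)!r}")
```

Running it prints `bytes=b'<lrm'` for 6d726c3c, while badadbbd decodes to non-printable bytes, which is what you would expect of a deliberately chosen binary marker.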