Ok, I fixed this issue with the following patch against libqb 0.14.4:

--- lib/unix.c.orig 2013-03-25 12:30:50.445762231 +0000
+++ lib/unix.c 2013-03-25 12:49:59.322276376 +0000
@@ -83,7 +83,7 @@
 #if defined(QB_LINUX) || defined(QB_CYGWIN)
     snprintf(path, PATH_MAX, "/dev/shm/%s", file);
 #else
-    snprintf(path, PATH_MAX, LOCALSTATEDIR "/run/%s", file);
+    snprintf(path, PATH_MAX, "%s/%s", SOCKETDIR, file);
     is_absolute = path;
 #endif
   }
@@ -91,7 +91,7 @@
   if (fd < 0 && !is_absolute) {
     qb_util_perror(LOG_ERR, "couldn't open file %s", path);
-    snprintf(path, PATH_MAX, LOCALSTATEDIR "/run/%s", file);
+    snprintf(path, PATH_MAX, "%s/%s", SOCKETDIR, file);
     fd = open_mmap_file(path, file_flags);
     if (fd < 0) {
       res = -errno;
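
To illustrate what the patched snprintf() calls do, here is a minimal standalone sketch. It is not libqb code: build_ipc_path() is a hypothetical helper, and the SOCKETDIR default below is only a stand-in for whatever directory was passed to --with-socket-dir at configure time. Unlike the patched lines, it also checks the snprintf() return value for truncation.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the directory chosen with --with-socket-dir. */
#ifndef SOCKETDIR
#define SOCKETDIR "/var/run/qb"
#endif

/*
 * Hypothetical helper (not part of libqb): build "<dir>/<file>" into path,
 * which has room for len bytes including the terminating NUL.
 * Returns 0 on success, -EINVAL on bad arguments or a formatting error,
 * and -ENAMETOOLONG if the result would not fit.
 */
static int build_ipc_path(char *path, size_t len,
                          const char *dir, const char *file)
{
    int n;

    if (path == NULL || len == 0 || dir == NULL || file == NULL) {
        return -EINVAL;
    }

    /* Same format string as the patched lines: directory, slash, file name. */
    n = snprintf(path, len, "%s/%s", dir, file);
    if (n < 0) {
        return -EINVAL;
    }
    if ((size_t)n >= len) {
        /* snprintf() reports the length it wanted; this means truncation. */
        return -ENAMETOOLONG;
    }
    return 0;
}
```

Calling build_ipc_path(buf, sizeof(buf), SOCKETDIR, "qb-cib_rw-control-60719-60720-15") with a large enough buffer yields a path under /var/run/qb instead of /var/run, which is the directory the unprivileged daemons are allowed to write to in this setup.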

libqb was configured with --with-socket-dir=/var/run/qb, and /var/run/qb is
owned by hacluster:haclient. This configuration works fine with both
corosync 2.3.0 and pacemaker 1.1.8.

Though I'm not sure that libqb is the right place to touch; maybe it would be
better to add some enhancements to mainloop_add_ipc_server() in pacemaker's
lib/common/mainloop.c?

Cheers.

On Mar 25, 2013, at 16:01 , Andrei Belov <defana...@gmail.com> wrote:

>
> I've rebuilt libqb using a separate SOCKETDIR (/var/run/qb), and set
> hacluster:haclient ownership on this dir.
>
> After that pacemakerd has been successfully started with all its children:
>
> [root@ha1 /var/run/qb]# pacemakerd -fV
> Could not establish pacemakerd connection: Connection refused (146)
> info: crm_ipc_connect: Could not establish pacemakerd connection:
> Connection refused (146)
> info: get_cluster_type: Detected an active 'corosync' cluster
> info: read_config: Reading configure for stack: corosync
> notice: crm_add_logfile: Additional logging available in
> /var/log/cluster/corosync.log
> notice: main: Starting Pacemaker 1.1.8 (Build: 1f8858c): ncurses
> libqb-logging libqb-ipc upstart systemd corosync-native
> info: main: Maximum core file size is: 18446744073709551613
> info: qb_ipcs_us_publish: server name: pacemakerd
> notice: update_node_processes: 48de70 Node 182452614 now known as
> ha1, was:
> info: start_child: Forked child 60719 for process cib
> info: start_child: Forked child 60720 for process stonith-ng
> info: start_child: Forked child 60721 for process lrmd
> info: start_child: Forked child 60722 for process attrd
> info: start_child: Forked child 60723 for process pengine
> info: start_child: Forked child 60724 for process crmd
> info: main: Starting mainloop
>
> [root@ha1 /var/run/qb]# ls -l
> total 0
> srwxrwxrwx 1 hacluster root 0 Mar 25 11:50 attrd
> srwxrwxrwx 1 root root 0 Mar 25 11:43 cfg
> srwxrwxrwx 1 hacluster root 0 Mar 25 11:50 cib_ro
> srwxrwxrwx 1 hacluster root 0 Mar 25 11:50 cib_rw
> srwxrwxrwx 1 hacluster root 0 Mar 25 11:50 cib_shm
> srwxrwxrwx 1 root root 0 Mar 25 11:43 cmap
> srwxrwxrwx 1 root root 0 Mar 25 11:43 cpg
> srwxrwxrwx 1 root root 0 Mar 25 11:50 lrmd
> srwxrwxrwx 1 root root 0 Mar 25 11:50 pacemakerd
> srwxrwxrwx 1 hacluster root 0 Mar 25 11:50 pengine
> srwxrwxrwx 1 root root 0 Mar 25 11:43 quorum
> srwxrwxrwx 1 root root 0 Mar 25 11:50 stonith-ng
>
> However, libqb still cannot create some files in /var/run due to
> insufficient permissions:
>
> Mar 25 11:50:45 [60719] cib: info: init_cs_connection_once:
> Connection to 'corosync': established
> Mar 25 11:50:45 [60719] cib: info: crm_get_peer: Node
> 182452614 is now known as ha1
> Mar 25 11:50:45 [60719] cib: info: crm_get_peer: Node
> 182452614 has uuid 182452614
> Mar 25 11:50:45 [60719] cib: info: qb_ipcs_us_publish:
> server name: cib_ro
> Mar 25 11:50:45 [60719] cib: info: qb_ipcs_us_publish:
> server name: cib_rw
> Mar 25 11:50:45 [60719] cib: info: qb_ipcs_us_publish:
> server name: cib_shm
> Mar 25 11:50:45 [60719] cib: info: cib_init: Starting cib
> mainloop
> Mar 25 11:50:45 [60719] cib: info: pcmk_cpg_membership:
> Joined[0.0] cib.182452614
> Mar 25 11:50:45 [60719] cib: info: pcmk_cpg_membership:
> Member[0.0] cib.182452614
> Mar 25 11:50:45 [60719] cib: info: pcmk_cpg_membership:
> Member[0.1] cib.182452614
> Mar 25 11:50:46 [60719] cib: error: qb_sys_mmap_file_open:
> couldn't open file /var/run/qb-cib_rw-control-60719-60720-15: Permission
> denied (13)
> Mar 25 11:50:46 [60719] cib: error: qb_ipcs_us_connect:
> couldn't create file for mmap (60719-60720-15): Permission denied (13)
> Mar 25 11:50:46 [60719] cib: error: handle_new_connection:
> Invalid IPC credentials (60719-60720-15).
> Mar 25 11:50:46 [60720] stonith-ng: info: crm_ipc_connect: Could not
> establish cib_rw connection: Permission denied (13)
> Mar 25 11:50:46 [60719] cib: error: qb_sys_mmap_file_open:
> couldn't open file /var/run/qb-cib_shm-control-60719-60724-16: Permission
> denied (13)
> Mar 25 11:50:46 [60719] cib: error: qb_ipcs_us_connect:
> couldn't create file for mmap (60719-60724-16): Permission denied (13)
> Mar 25 11:50:46 [60719] cib: error: handle_new_connection:
> Invalid IPC credentials (60719-60724-16).
> Mar 25 11:50:46 [60724] crmd: info: crm_ipc_connect: Could not
> establish cib_shm connection: Permission denied (13)
> Mar 25 11:50:46 [60724] crmd: info: do_cib_control: Could not
> connect to the CIB service: Transport endpoint is not connected
> Mar 25 11:50:46 [60724] crmd: warning: do_cib_control: Couldn't
> complete CIB registration 1 times... pause and retry
>
>
> If someone has a working setup on Linux with corosync 2.x, libqb and pacemaker
> 1.1.x, I'd very much appreciate some information about the places
> that libqb uses for its special socket files.
>
> Thanks in advance!
>
> (Can we say now that this problem is libqb-related, not pacemaker?)
>
>
>
> On Mar 25, 2013, at 15:30 , Andrei Belov <defana...@gmail.com> wrote:
>
>> Andreas,
>>
>> I just tried "PCMK_ipc_type=socket pacemaker -fV" and a bunch of additional
>> "event_send" errors appeared:
>>
>> Mar 25 11:15:55 [33641] ha1 corosync error [MAIN ] event_send retuned
>> -32, expected 256!
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 217!
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 219!
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 223!
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 223!
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 223!
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 223!
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 223!
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 223!
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 256!
>> Mar 25 11:15:55 [53980] pengine: error: qb_ipcs_us_publish:
>> Could not bind AF_UNIX (/var/run/pengine): Permission denied (13)
>> Mar 25 11:15:55 [53980] pengine: error: mainloop_add_ipc_server:
>> Could not start pengine IPC server: Unknown error (-13)
>> Mar 25 11:15:55 [53980] pengine: error: main: Couldn't start IPC
>> server
>> Mar 25 11:15:55 [53975] pacemakerd: error: pcmk_child_exit: Child
>> process pengine exited (pid=53980, rc=1)
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 256!
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 223!
>> Mar 25 11:15:55 [53979] attrd: error: qb_ipcs_us_publish:
>> Could not bind AF_UNIX (/var/run/attrd): Permission denied (13)
>> Mar 25 11:15:55 [53979] attrd: error: mainloop_add_ipc_server:
>> Could not start attrd IPC server: Unknown error (-13)
>> Mar 25 11:15:55 [53979] attrd: error: main: Could not start IPC
>> server
>> Mar 25 11:15:55 [53979] attrd: error: main: Aborting startup
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 223!
>> Mar 25 11:15:55 [53975] pacemakerd: error: pcmk_child_exit: Child
>> process attrd exited (pid=53979, rc=100)
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 223!
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 223!
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 223!
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 223!
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 256!
>> Mar 25 11:15:55 [53976] cib: error: qb_ipcs_us_publish:
>> Could not bind AF_UNIX (/var/run/cib_ro): Permission denied (13)
>> Mar 25 11:15:55 [53976] cib: error: mainloop_add_ipc_server:
>> Could not start cib_ro IPC server: Unknown error (-13)
>> Mar 25 11:15:55 [53976] cib: error: qb_ipcs_us_publish:
>> Could not bind AF_UNIX (/var/run/cib_rw): Permission denied (13)
>> Mar 25 11:15:55 [53976] cib: error: mainloop_add_ipc_server:
>> Could not start cib_rw IPC server: Unknown error (-13)
>> Mar 25 11:15:55 [53976] cib: error: qb_ipcs_us_publish:
>> Could not bind AF_UNIX (/var/run/cib_shm): Permission denied (13)
>> Mar 25 11:15:55 [53976] cib: error: mainloop_add_ipc_server:
>> Could not start cib_shm IPC server: Unknown error (-13)
>> Mar 25 11:15:55 [53976] cib: error: cib_init: Couldnt
>> start all IPC channels, exiting.
>> Mar 25 11:15:55 [53975] pacemakerd: error: pcmk_child_exit: Child
>> process cib exited (pid=53976, rc=255)
>> Mar 25 11:15:55 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 223!
>> Mar 25 11:16:04 [53977] stonith-ng: error: setup_cib: Could not
>> connect to the CIB service: -134 fffffd7fc421a0b0
>> Mar 25 11:16:04 [33641] ha1 corosync error [SERV ] event_send retuned
>> -32, expected 217!
>> Mar 25 11:16:04 [53975] pacemakerd: notice: pcmk_shutdown_worker:
>> Attempting to inhibit respawning after fatal error
>>
>>
>> # fgrep 32 /usr/include/sys/errno.h
>> #define EPIPE 32 /* Broken pipe */
>>
>>
>>
>> On Mar 25, 2013, at 13:55 , "Grüninger, Andreas (LGL Extern)"
>> <andreas.gruenin...@lgl.bwl.de> wrote:
>>
>>> With Solaris/OpenIndiana you should use this setting:
>>> export PCMK_ipc_type=socket
>>>
>>> Andreas
>>>
>>> -----Original Message-----
>>> From: Andrei Belov [mailto:defana...@gmail.com]
>>> Sent: Monday, 25 March 2013 10:43
>>> To: pacemaker@oss.clusterlabs.org
>>> Subject: [Pacemaker] solaris problem
>>>
>>> Hi folks,
>>>
>>> I'm trying to build a test HA cluster on Solaris 5.11 using libqb 0.14.4,
>>> corosync 2.3.0 and pacemaker 1.1.8, and I'm facing a strange problem while
>>> starting pacemaker.
>>>
>>> The log shows the following errors:
>>>
>>> Mar 25 09:21:26 [33720] lrmd: error: mainloop_add_ipc_server:
>>> Could not start lrmd IPC server: Unknown error (-48)
>>> Mar 25 09:21:26 [33720] lrmd: error: try_server_create: New
>>> IPC server could not be created because another lrmd process exists,
>>> sending shutdown command to old lrmd process.
>>> Mar 25 09:21:26 [33720] lrmd: error: mainloop_add_ipc_server:
>>> Could not start lrmd IPC server: Unknown error (-48)
>>> Mar 25 09:21:26 [33720] lrmd: error: try_server_create: New
>>> IPC server could not be created because another lrmd process exists,
>>> sending shutdown command to old lrmd process.
>>> Mar 25 09:21:26 [33720] lrmd: error: mainloop_add_ipc_server:
>>> Could not start lrmd IPC server: Unknown error (-48)
>>> Mar 25 09:21:26 [33720] lrmd: error: try_server_create: New
>>> IPC server could not be created because another lrmd process exists,
>>> sending shutdown command to old lrmd process.
>>> Mar 25 09:21:26 [33720] lrmd: error: mainloop_add_ipc_server:
>>> Could not start lrmd IPC server: Unknown error (-48)
>>> Mar 25 09:21:26 [33720] lrmd: error: try_server_create: New
>>> IPC server could not be created because another lrmd process exists,
>>> sending shutdown command to old lrmd process.
>>> Mar 25 09:21:26 [33720] lrmd: error: mainloop_add_ipc_server:
>>> Could not start lrmd IPC server: Unknown error (-48)
>>> Mar 25 09:21:26 [33720] lrmd: error: try_server_create: New
>>> IPC server could not be created because another lrmd process exists,
>>> sending shutdown command to old lrmd process.
>>> Mar 25 09:21:26 [33720] lrmd: error: mainloop_add_ipc_server:
>>> Could not start lrmd IPC server: Unknown error (-48)
>>> Mar 25 09:21:26 [33720] lrmd: error: try_server_create: New
>>> IPC server could not be created because another lrmd process exists,
>>> sending shutdown command to old lrmd process.
>>> Mar 25 09:21:26 [33720] lrmd: error: mainloop_add_ipc_server:
>>> Could not start lrmd IPC server: Unknown error (-48)
>>> Mar 25 09:21:26 [33720] lrmd: error: try_server_create: New
>>> IPC server could not be created because another lrmd process exists,
>>> sending shutdown command to old lrmd process.
>>> Mar 25 09:21:26 [33720] lrmd: error: mainloop_add_ipc_server:
>>> Could not start lrmd IPC server: Unknown error (-48)
>>> Mar 25 09:21:26 [33720] lrmd: error: try_server_create: New
>>> IPC server could not be created because another lrmd process exists,
>>> sending shutdown command to old lrmd process.
>>> Mar 25 09:21:26 [33720] lrmd: error: mainloop_add_ipc_server:
>>> Could not start lrmd IPC server: Unknown error (-48)
>>> Mar 25 09:21:26 [33720] lrmd: error: try_server_create: New
>>> IPC server could not be created because another lrmd process exists,
>>> sending shutdown command to old lrmd process.
>>> Mar 25 09:21:26 [33720] lrmd: error: mainloop_add_ipc_server:
>>> Could not start lrmd IPC server: Unknown error (-48)
>>> Mar 25 09:21:26 [33720] lrmd: error: try_server_create: New
>>> IPC server could not be created because another lrmd process exists,
>>> sending shutdown command to old lrmd process.
>>> Mar 25 09:21:26 [33720] lrmd: error: main: Failed to allocate
>>> lrmd server. shutting down
>>> Mar 25 09:21:26 [33722] pengine: error: mainloop_add_ipc_server:
>>> Could not start pengine IPC server: Unknown error (-48)
>>> Mar 25 09:21:26 [33722] pengine: error: main: Couldn't start IPC
>>> server
>>> Mar 25 09:21:26 [33717] pacemakerd: error: pcmk_child_exit: Child
>>> process lrmd exited (pid=33720, rc=255)
>>> Mar 25 09:21:26 [33721] attrd: error: qb_ipcs_us_publish:
>>> Could not bind AF_UNIX (/var/run/attrd): Permission denied (13)
>>> Mar 25 09:21:26 [33721] attrd: error: mainloop_add_ipc_server:
>>> Could not start attrd IPC server: Unknown error (-13)
>>> Mar 25 09:21:26 [33721] attrd: error: main: Could not start IPC
>>> server
>>> Mar 25 09:21:26 [33721] attrd: error: main: Aborting startup
>>> Mar 25 09:21:26 [33717] pacemakerd: error: pcmk_child_exit: Child
>>> process pengine exited (pid=33722, rc=1)
>>> Mar 25 09:21:26 [33717] pacemakerd: error: pcmk_child_exit: Child
>>> process attrd exited (pid=33721, rc=100)
>>> Mar 25 09:21:26 [33718] cib: error: qb_ipcs_us_publish:
>>> Could not bind AF_UNIX (/var/run/cib_ro): Permission denied (13)
>>> Mar 25 09:21:26 [33718] cib: error: mainloop_add_ipc_server:
>>> Could not start cib_ro IPC server: Unknown error (-13)
>>> Mar 25 09:21:26 [33718] cib: error: qb_ipcs_us_publish:
>>> Could not bind AF_UNIX (/var/run/cib_rw): Permission denied (13)
>>> Mar 25 09:21:26 [33718] cib: error: mainloop_add_ipc_server:
>>> Could not start cib_rw IPC server: Unknown error (-13)
>>> Mar 25 09:21:26 [33718] cib: error: mainloop_add_ipc_server:
>>> Could not start cib_shm IPC server: Unknown error (-48)
>>> Mar 25 09:21:26 [33718] cib: error: cib_init: Couldnt
>>> start all IPC channels, exiting.
>>> Mar 25 09:21:26 [33717] pacemakerd: error: pcmk_child_exit: Child
>>> process cib exited (pid=33718, rc=255)
>>> Mar 25 09:21:35 [33719] stonith-ng: error: setup_cib: Could not
>>> connect to the CIB service: -134 fffffd7fc421a0b0
>>> Mar 25 09:21:35 [33717] pacemakerd: notice: pcmk_shutdown_worker:
>>> Attempting to inhibit respawning after fatal error
>>>
>>> The full log (in case I've missed anything) is attached.
>>>
>>> I'd like to know the reason for "unknown error (-48)". On this system 48 in
>>> errno.h is "ENOTSUP", but I haven't found the exact place in the code where
>>> this may happen (so I'm not sure about that).
>>>
>>> Just for the record: I'm able to run corosync on two nodes and see them
>>> connected without any visible problems. Thus, I suppose there may be
>>> something wrong with either pacemaker or libqb.
>>>
>>> Any help will be greatly appreciated!
>>>
>>> Thanks,
>>> Andrei.
>>>
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org