On Thu, May 27, 2010 at 5:50 PM, Steven Dake <sd...@redhat.com> wrote:
> On 05/27/2010 08:40 AM, Diego Remolina wrote:
>>
>> Is there any workaround for this? Perhaps a slightly older version of
>> the rpms? If so, where do I find those?
>>
> Corosync 1.2.1 doesn't have this issue, apparently. With corosync 1.2.1,
> please don't use the "debug: on" keyword in your config options. I am not sure
> where Andrew has corosync 1.2.1 rpms available.
>
> The corosync project itself doesn't release rpms. See our policy on this
> topic:
>
> http://www.corosync.org/doku.php?id=faq:release_binaries
>
> Regards
> -steve
>

In my case, using pacemaker/corosync from the clusterlabs repo on RHEL 5.5 32-bit, the sequence was:

- both nodes, ha1 and ha2, starting with:

[r...@ha1 ~]# rpm -qa corosync\* pacemaker\*
pacemaker-1.0.8-6.el5
corosynclib-1.2.1-1.el5
corosync-1.2.1-1.el5
pacemaker-libs-1.0.8-6.el5

- stop of corosync on node ha1
- update (using the clusterlabs repo; the proposed and applied pacemaker packages have the same version... dunno if the same bits). This takes corosync to 1.2.2
- start of corosync on ha1 and successful join with the node still running corosync 1.2.1

May 27 18:59:23 ha1 corosync[5136]: [MAIN  ] Corosync Cluster Engine exiting with status -1 at main.c:160.
May 27 19:06:19 ha1 yum: Updated: corosynclib-1.2.2-1.1.el5.i386
May 27 19:06:19 ha1 yum: Updated: pacemaker-libs-1.0.8-6.1.el5.i386
May 27 19:06:19 ha1 yum: Updated: corosync-1.2.2-1.1.el5.i386
May 27 19:06:20 ha1 yum: Updated: pacemaker-1.0.8-6.1.el5.i386
May 27 19:06:20 ha1 yum: Updated: corosynclib-devel-1.2.2-1.1.el5.i386
May 27 19:06:22 ha1 yum: Updated: pacemaker-libs-devel-1.0.8-6.1.el5.i386
May 27 19:06:59 ha1 corosync[7442]: [MAIN  ] Corosync Cluster Engine ('1.2.2'): started and ready to provide service.
May 27 19:06:59 ha1 corosync[7442]: [MAIN  ] Corosync built-in features: nss rdma
May 27 19:06:59 ha1 corosync[7442]: [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
May 27 19:06:59 ha1 corosync[7442]: [TOTEM ] Initializing transport (UDP/IP).
May 27 19:06:59 ha1 corosync[7442]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

This also implies the start of resources on it (nfsclient and apache in my case).

- move (and unmove, to be able to take them back again later) of resources from ha2 to the updated node ha1 (nfs-group in my case)

Resource Group: nfs-group
    lv_drbd0   (ocf::heartbeat:LVM):        Started ha1
    ClusterIP  (ocf::heartbeat:IPaddr2):    Started ha1
    NfsFS      (ocf::heartbeat:Filesystem): Started ha1
    nfssrv     (ocf::heartbeat:nfsserver):  Started ha1

- stop of corosync 1.2.1 on ha2
- update of pacemaker and corosync on ha2
- startup of corosync on ha2 and correct join to the cluster, with start of its resources (nfsclient and apache in my case)

May 27 19:14:42 ha2 corosync[30954]: [pcmk  ] notice: pcmk_shutdown: cib confirmed stopped
May 27 19:14:42 ha2 corosync[30954]: [pcmk  ] notice: stop_child: Sent -15 to stonithd: [30961]
May 27 19:14:42 ha2 stonithd: [30961]: notice: /usr/lib/heartbeat/stonithd normally quit.
May 27 19:14:42 ha2 corosync[30954]: [pcmk  ] info: pcmk_ipc_exit: Client stonithd (conn=0x82aee48, async-conn=0x82aee48) left
May 27 19:14:43 ha2 corosync[30954]: [pcmk  ] notice: pcmk_shutdown: stonithd confirmed stopped
May 27 19:14:43 ha2 corosync[30954]: [pcmk  ] info: update_member: Node ha2 now has process list: 00000000000000000000000000000002 (2)
May 27 19:14:43 ha2 corosync[30954]: [pcmk  ] notice: pcmk_shutdown: Shutdown complete
May 27 19:14:43 ha2 corosync[30954]: [SERV  ] Service engine unloaded: Pacemaker Cluster Manager 1.0.8
May 27 19:14:43 ha2 corosync[30954]: [SERV  ] Service engine unloaded: corosync extended virtual synchrony service
May 27 19:14:43 ha2 corosync[30954]: [SERV  ] Service engine unloaded: corosync configuration service
May 27 19:14:43 ha2 corosync[30954]: [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
May 27 19:14:43 ha2 corosync[30954]: [SERV  ] Service engine unloaded: corosync cluster config database access v1.01
May 27 19:14:43 ha2 corosync[30954]: [SERV  ] Service engine unloaded: corosync profile loading service
May 27 19:14:43 ha2 corosync[30954]: [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
May 27 19:14:43 ha2 corosync[30954]: [MAIN  ] Corosync Cluster Engine exiting with status -1 at main.c:160.
May 27 19:15:51 ha2 yum: Updated: corosynclib-1.2.2-1.1.el5.i386
May 27 19:15:51 ha2 yum: Updated: pacemaker-libs-1.0.8-6.1.el5.i386
May 27 19:15:52 ha2 yum: Updated: corosync-1.2.2-1.1.el5.i386
May 27 19:15:52 ha2 yum: Updated: pacemaker-1.0.8-6.1.el5.i386
May 27 19:17:00 ha2 corosync[3430]: [MAIN  ] Corosync Cluster Engine ('1.2.2'): started and ready to provide service.
May 27 19:17:00 ha2 corosync[3430]: [MAIN  ] Corosync built-in features: nss rdma
May 27 19:17:00 ha2 corosync[3430]: [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
May 27 19:17:00 ha2 corosync[3430]: [TOTEM ] Initializing transport (UDP/IP).
May 27 19:17:00 ha2 corosync[3430]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

So in my case the software upgrade was successful, with no downtime.

Gianluca
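For reference, the per-node rolling-upgrade procedure described above can be sketched as shell commands. This is only a sketch based on my steps: the node/group names (ha1, nfs-group) are from my cluster, the init-script path is the EL5 convention, and your package set and crm shell version may differ.

```shell
# Rolling upgrade, one node at a time. Run on ha1 first, then repeat on ha2.

# 1. Stop the cluster stack on this node; resources fail over to the peer.
/etc/init.d/corosync stop

# 2. Pull the updated packages from the clusterlabs repository.
yum update corosync corosynclib pacemaker pacemaker-libs

# 3. Restart the stack; the node rejoins the peer still running 1.2.1.
/etc/init.d/corosync start

# 4. Move the resource group to the freshly updated node, then clear the
#    location constraint so the group can move back again later.
crm resource move nfs-group ha1
crm resource unmove nfs-group
```

The move/unmove pair matters: "move" leaves a location constraint behind, and "unmove" removes it so failback stays possible.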
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf