Re: [Pacemaker] Master/Slave resource cannot start

2009-08-24 Thread Andrew Beekhof
On Mon, Aug 24, 2009 at 2:52 PM, Andrew Beekhof wrote: > On Mon, Aug 24, 2009 at 2:33 PM, Diego > Remolina wrote: >> I was noticing this even before the 1.0.5 update right after I changed from >> heartbeat to openais. I assume there may be some files in that folder from >> back when I was using hea

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-24 Thread Andrew Beekhof
On Mon, Aug 24, 2009 at 2:33 PM, Diego Remolina wrote: > I was noticing this even before the 1.0.5 update right after I changed from > heartbeat to openais. I assume there may be some files in that folder from > back when I was using heartbeat which were causing the problem even with the > older pa

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-24 Thread Diego Remolina
I was noticing this even before the 1.0.5 update right after I changed from heartbeat to openais. I assume there may be some files in that folder from back when I was using heartbeat which were causing the problem even with the older pacemaker version. If I want to delete all files in /var/lib

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-24 Thread Andrew Beekhof
The stack trace makes it look like a logging deadlock. I'll ask the openais maintainer about it. On Fri, Aug 21, 2009 at 5:11 PM, Diego Remolina wrote: > Here is what I am seeing now right after stopping openais, updating > heartbeat and pacemaker and trying to start openais again: > > [r...@phys-

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-24 Thread Andrew Beekhof
On Fri, Aug 21, 2009 at 9:15 PM, hj lee wrote: > Hi, > > I had the same problem after upgrading to pacemaker 1.0.5 on RHEL 5.3. After > deleting all the files in the /var/lib/pengine/ directory, the problem seems > to be gone; I haven't seen it since. Maybe it is related to the UID change in > pengine (haclust

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-21 Thread hj lee
Hi, I had the same problem after upgrading to pacemaker 1.0.5 on RHEL 5.3. After deleting all the files in the /var/lib/pengine/ directory, the problem seems to be gone; I haven't seen it since. Maybe it is related to the UID change in pengine (hacluster to daemon) in 1.0.5, but I'm not exactly sure. hj On Fri,
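A cautious variant of the cleanup described above is to move the files aside rather than delete them, so they can still be inspected later. The sketch below is only an illustration: the function name and the backup location are hypothetical, and it assumes the cluster stack has already been stopped on the node before anything under /var/lib/pengine/ is touched.

```shell
#!/bin/sh
# Sketch: move every regular file out of a policy-engine state directory
# into a timestamped backup directory instead of deleting it.
# Assumptions (not from the thread): the function name, the backup root,
# and that the cluster stack is stopped before this runs.
set_aside_pengine_files() {
    src_dir=$1        # e.g. /var/lib/pengine
    backup_root=$2    # hypothetical, e.g. /root/pengine-backups

    [ -d "$src_dir" ] || { echo "no such directory: $src_dir" >&2; return 1; }

    backup_dir="$backup_root/pengine-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$backup_dir" || return 1

    # Only top-level regular files are moved; subdirectories stay in place.
    find "$src_dir" -maxdepth 1 -type f -exec mv {} "$backup_dir"/ \;

    # Print where the files went so the caller can inspect or restore them.
    echo "$backup_dir"
}
```

It would be called as `set_aside_pengine_files /var/lib/pengine /root/pengine-backups` once the cluster stack is stopped. Moving instead of deleting is a deliberate swap for the `rm` the thread describes; it keeps the old files available if the UID theory needs checking.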

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-21 Thread Diego Remolina
Here is what I am seeing now right after stopping openais, updating heartbeat and pacemaker and trying to start openais again: [r...@phys-file02 ~]# /etc/init.d/openais status Stopped [r...@phys-file02 ~]# /etc/init.d/openais start Starting OpenAIS daemon (aisexec): starting... rc=0: OK [r...@ph

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-13 Thread Andrew Beekhof
On Wed, Aug 12, 2009 at 3:35 PM, Diego Remolina wrote: >> could you instead attach to it with gdb and see what it was doing? > > I will try, but cannot promise it will be soon, beginning of the semester is > very busy and I am not familiar with gdb... gdb aisexec $PID_OF_AISEXEC # where then, for

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-12 Thread Diego Remolina
could you instead attach to it with gdb and see what it was doing? I will try, but cannot promise it will be soon, beginning of the semester is very busy and I am not familiar with gdb... that looks suspicious... are you invoking the shell or crm_shadow? This is probably when I type crm st

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-12 Thread Andrew Beekhof
On Wed, Aug 12, 2009 at 2:23 PM, Diego Remolina wrote: >>> Aug 12 07:57:17 phys-file02 openais[9380]: [crm  ] info: >>> process_ais_conf: >>> Reading configure >>> Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] info: >>> config_find_next: >>> Processing additional logging options... >>> Aug 12

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-12 Thread Diego Remolina
Aug 12 07:57:17 phys-file02 openais[9380]: [crm ] info: process_ais_conf: Reading configure Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] info: config_find_next: Processing additional logging options... Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] info: get_config_opt: Found 'on' for o

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-12 Thread Andrew Beekhof
On Wed, Aug 12, 2009 at 2:13 PM, Diego Remolina wrote: >> Can you define "not correctly" please? >> I'd rather not ignore such behavior. > > The machine would come up and not join the cluster. Checking the status of > openais would show as "Running". crm status would show: > > Connection to cluster

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-12 Thread Diego Remolina
Can you define "not correctly" please? I'd rather not ignore such behavior. The machine would come up and not join the cluster. Checking the status of openais would show as "Running". crm status would show: Connection to cluster failed: connection failed A look at the log file shows: Aug 12

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-11 Thread Andrew Beekhof
On Tue, Aug 11, 2009 at 6:21 PM, Diego Remolina wrote: >> Solution: >> 1) Clone the pingd resource. >> 2) Delete your colocation constraint. It is useless. >> 3) Make a location constraint that allows the IP address to run only on a node >> that gets points from pingd. > > I want to thank Michael for pointing

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-11 Thread Diego Remolina
Solution: 1) Clone the pingd resource. 2) Delete your colocation constraint. It is useless. 3) Make a location constraint that allows the IP address to run only on a node that gets points from pingd. I want to thank Michael for pointing out my mistake. I have also migrated away from using heartbeat to op
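As a rough illustration of steps 1 and 3, a crm shell configuration along the following lines might be used. It is a sketch, not the poster's actual configuration: every resource and constraint name here (p-pingd, cl-pingd, ip-cluster, ip-needs-connectivity) is hypothetical, 192.0.2.1 is a placeholder ping target, and the rule assumes the pingd agent publishes its score under its default attribute name, pingd. Step 2 would be a `delete` of whatever ID the existing colocation constraint has, which the thread does not show.

```
# Hypothetical names throughout; 192.0.2.1 stands in for a real ping target.
primitive p-pingd ocf:pacemaker:pingd \
    params host_list="192.0.2.1" multiplier="100" \
    op monitor interval="15s"

# Step 1: run a pingd instance on every node.
clone cl-pingd p-pingd

# Step 3: keep the IP resource off any node whose pingd score is
# undefined or zero, i.e. a node that cannot reach the ping target.
location ip-needs-connectivity ip-cluster \
    rule -inf: not_defined pingd or pingd lte 0
```

With the clone in place every node maintains its own connectivity score, so the location rule alone decides where the IP may run and a colocation with a single pingd instance is no longer needed.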

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-07 Thread Michael Schwartzkopff
On Friday, 7 August 2009 14:30:47, Diego Remolina wrote: > > What is the fail count of the resource on node phys-file02? Please do a > > crm_mon -1f > > What does it say? > > [r...@phys-file02 ~]# crm_mon -1f > > > > Last updated: Fri Aug 7 08:29:20 2009 > Stack: Heartbeat > Curren

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-07 Thread Diego Remolina
What is the fail count of the resource on node phys-file02? Please do a crm_mon -1f What does it say? [r...@phys-file02 ~]# crm_mon -1f Last updated: Fri Aug 7 08:29:20 2009 Stack: Heartbeat Current DC: phys-file02.physics.gatech.edu (db786ace-4c9b-4ba1-b272-95b4d81b40a9) - par

Re: [Pacemaker] Master/Slave resource cannot start

2009-08-07 Thread Michael Schwartzkopff
On Friday, 7 August 2009 14:09:20, Diego Remolina wrote: > Hi, > > I am fairly new to pacemaker, and while I had things working correctly > for a while, in testing failovers and playing with my machines I got > them to a state where one resource cannot start (ms-drbd_export:1). > >

[Pacemaker] Master/Slave resource cannot start

2009-08-07 Thread Diego Remolina
Hi, I am fairly new to pacemaker, and while I had things working correctly for a while, in testing failovers and playing with my machines I got them to a state where one resource cannot start (ms-drbd_export:1). Last updated: Fri Aug 7 07:27:52 2009 Stack: Heartbeat Current DC: