Does none of you have an idea how to fix this problem?

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Sebastian Kösters
Sent: Monday, January 12, 2009 23:10
To: [email protected]
Subject: [Linux-HA] Problem with linux-ha and drbd (ERROR: Return code 1 from /etc/ha.d/resource.d/Filesystem)

Hi,

Today I noticed a problem on my two Heartbeat/DRBD servers.

On each server there are two primary DRBD devices.

On th-dus-mqm:

drbd0 / drbd2

On th-fra-mqm:

drbd1 / drbd3

If th-dus-mqm fails, drbd0 and drbd2 fail over to th-fra-mqm. That normally works fine.

Today I tried to stop Heartbeat manually on both servers for testing:

/etc/init.d/heartbeat stop

Then I noticed these errors in /var/log/ha-log (on both servers):

---

heartbeat[2834]: 2009/01/12_22:35:08 info: Heartbeat shutdown in progress. (2834)
heartbeat[4630]: 2009/01/12_22:35:08 info: Giving up all HA resources.
ResourceManager[4643]:  2009/01/12_22:35:08 info: Releasing resource group: th-dus-mqm 10.10.121.130 92.254.37.53 drbddisk::drbd0 Filesystem::/dev/drbd0::/dus::ext3 drbddisk::drbd2 Filesystem::/dev/drbd2::/home/tbmx/dus::ext3 mqm_dus
ResourceManager[4643]:  2009/01/12_22:35:08 info: Running /etc/ha.d/resource.d/mqm_dus  stop
ResourceManager[4643]:  2009/01/12_22:35:09 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd2 /home/tbmx/dus ext3 stop
Filesystem[5005]:       2009/01/12_22:35:09 INFO: Running stop for /dev/drbd2 on /home/tbmx/dus
Filesystem[4994]:       2009/01/12_22:35:09 INFO:  Success
ResourceManager[4643]:  2009/01/12_22:35:09 info: Running /etc/ha.d/resource.d/drbddisk drbd2 stop
ResourceManager[4643]:  2009/01/12_22:35:09 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /dus ext3 stop
Filesystem[5107]:       2009/01/12_22:35:09 INFO: Running stop for /dev/drbd0 on /dus
Filesystem[5096]:       2009/01/12_22:35:09 INFO:  Success
ResourceManager[4643]:  2009/01/12_22:35:09 info: Running /etc/ha.d/resource.d/drbddisk drbd0 stop
ResourceManager[4643]:  2009/01/12_22:35:09 info: Running /etc/ha.d/resource.d/IPaddr 92.254.37.53 stop
IPaddr[5200]:   2009/01/12_22:35:09 INFO:  Success
ResourceManager[4643]:  2009/01/12_22:35:09 info: Running /etc/ha.d/resource.d/IPaddr 10.10.121.130 stop
IPaddr[5258]:   2009/01/12_22:35:09 INFO:  Success
ResourceManager[5295]:  2009/01/12_22:35:09 info: Releasing resource group: th-fra-mqm 10.10.121.131 92.254.37.54 drbddisk::drbd1 Filesystem::/dev/drbd1::/fra::ext3 drbddisk::drbd3 Filesystem::/dev/drbd3::/home/tbmx/fra::ext3 mqm_fra
ResourceManager[5295]:  2009/01/12_22:35:09 info: Running /etc/ha.d/resource.d/mqm_fra  stop
ResourceManager[5295]:  2009/01/12_22:35:15 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd3 /home/tbmx/fra ext3 stop
Filesystem[5553]:       2009/01/12_22:35:15 INFO: Running stop for /dev/drbd3 on /home/tbmx/fra
Filesystem[5553]:       2009/01/12_22:35:15 INFO: Trying to unmount /home/tbmx/fra
Filesystem[5553]:       2009/01/12_22:35:15 INFO: unmounted /home/tbmx/fra successfully
Filesystem[5542]:       2009/01/12_22:35:15 INFO:  Success
ResourceManager[5295]:  2009/01/12_22:35:15 info: Running /etc/ha.d/resource.d/drbddisk drbd3 stop
ResourceManager[5295]:  2009/01/12_22:35:15 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /fra ext3 stop
Filesystem[5671]:       2009/01/12_22:35:15 INFO: Running stop for /dev/drbd1 on /fra
Filesystem[5671]:       2009/01/12_22:35:15 INFO: Trying to unmount /fra
Filesystem[5671]:       2009/01/12_22:35:15 ERROR: Couldn't unmount /fra; trying cleanup with SIGTERM
Filesystem[5671]:       2009/01/12_22:35:15 INFO: Some processes on /fra were signalled
Filesystem[5671]:       2009/01/12_22:35:16 ERROR: Couldn't unmount /fra; trying cleanup with SIGTERM
Filesystem[5671]:       2009/01/12_22:35:16 INFO: Some processes on /fra were signalled
Filesystem[5671]:       2009/01/12_22:35:17 ERROR: Couldn't unmount /fra; trying cleanup with SIGTERM
Filesystem[5671]:       2009/01/12_22:35:17 INFO: Some processes on /fra were signalled
Filesystem[5671]:       2009/01/12_22:35:18 ERROR: Couldn't unmount /fra; trying cleanup with SIGKILL
Filesystem[5671]:       2009/01/12_22:35:18 INFO: Some processes on /fra were signalled
Filesystem[5671]:       2009/01/12_22:35:19 ERROR: Couldn't unmount /fra; trying cleanup with SIGKILL
Filesystem[5671]:       2009/01/12_22:35:20 INFO: No processes on /fra were signalled
Filesystem[5671]:       2009/01/12_22:35:21 ERROR: Couldn't unmount /fra, giving up!
Filesystem[5660]:       2009/01/12_22:35:21 ERROR:  Generic error
ResourceManager[5295]:  2009/01/12_22:35:21 ERROR: Return code 1 from /etc/ha.d/resource.d/Filesystem
ResourceManager[5295]:  2009/01/12_22:35:22 info: Retrying failed stop operation [Filesystem::/dev/drbd1::/fra::ext3]
ResourceManager[5295]:  2009/01/12_22:35:22 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /fra ext3 stop
Filesystem[5839]:       2009/01/12_22:35:22 INFO: Running stop for /dev/drbd1 on /fra
Filesystem[5839]:       2009/01/12_22:35:22 INFO: Trying to unmount /fra
Filesystem[5839]:       2009/01/12_22:35:22 ERROR: Couldn't unmount /fra; trying cleanup with SIGTERM
Filesystem[5839]:       2009/01/12_22:35:22 INFO: No processes on /fra were signalled
Filesystem[5839]:       2009/01/12_22:35:23 ERROR: Couldn't unmount /fra; trying cleanup with SIGTERM
Filesystem[5839]:       2009/01/12_22:35:23 INFO: No processes on /fra were signalled
Filesystem[5839]:       2009/01/12_22:35:24 ERROR: Couldn't unmount /fra; trying cleanup with SIGTERM
Filesystem[5839]:       2009/01/12_22:35:24 INFO: No processes on /fra were signalled
Filesystem[5839]:       2009/01/12_22:35:25 ERROR: Couldn't unmount /fra; trying cleanup with SIGKILL
Filesystem[5839]:       2009/01/12_22:35:25 INFO: Some processes on /fra were signalled
Filesystem[5839]:       2009/01/12_22:35:26 ERROR: Couldn't unmount /fra; trying cleanup with SIGKILL
Filesystem[5839]:       2009/01/12_22:35:26 INFO: No processes on /fra were signalled
Filesystem[5839]:       2009/01/12_22:35:27 ERROR: Couldn't unmount /fra; trying cleanup with SIGKILL
Filesystem[5839]:       2009/01/12_22:35:28 INFO: No processes on /fra were signalled
Filesystem[5839]:       2009/01/12_22:35:29 ERROR: Couldn't unmount /fra, giving up!
Filesystem[5828]:       2009/01/12_22:35:29 ERROR:  Generic error
.......
ResourceManager[5295]:  2009/01/12_22:36:36 ERROR: Return code 1 from /etc/ha.d/resource.d/Filesystem
Filesystem[9851]:       2009/01/12_22:36:36 INFO:  Running OK
ResourceManager[5295]:  2009/01/12_22:36:36 CRIT: Resource STOP failure. Reboot required!
ResourceManager[5295]:  2009/01/12_22:36:36 CRIT: Killing heartbeat ungracefully!

---
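The log shows the Filesystem stop action escalating: it retries the unmount, signals the processes using /fra with SIGTERM for a few rounds, then with SIGKILL, and finally gives up with return code 1. The following is a simplified Python model of that escalation, inferred only from the log lines above; it is not the agent's actual code (the real agent is a shell script), and the round counts are assumptions taken from the second attempt in the log.

```python
from typing import Callable, List

# Assumed round counts, matching the second stop attempt in the log.
TERM_ROUNDS = 3
KILL_ROUNDS = 3


def stop_filesystem(try_umount: Callable[[], bool],
                    signal_users: Callable[[str], None]) -> List[str]:
    """Model of the unmount escalation; returns a transcript of the steps.

    try_umount() returns True when the unmount succeeds; signal_users(sig)
    stands in for signalling the processes that use the mount point.
    """
    transcript: List[str] = []
    schedule = ["SIGTERM"] * TERM_ROUNDS + ["SIGKILL"] * KILL_ROUNDS
    for sig in schedule:
        if try_umount():
            transcript.append("unmounted")
            return transcript
        transcript.append(f"cleanup with {sig}")
        signal_users(sig)
    # One last attempt after the final signal round.
    if try_umount():
        transcript.append("unmounted")
    else:
        # The caller would report a generic error (return code 1) here.
        transcript.append("giving up")
    return transcript
```

With an unmount that never succeeds, the model produces three SIGTERM rounds, three SIGKILL rounds and "giving up", which is the pattern of the retried stop in the log. It also makes visible that signalling only helps when a user-space process holds the mount.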

After that, the server reboots. After the reboot, everything works fine again.

I don't know why it cannot unmount the device correctly. Sometimes I can stop Heartbeat without errors, and sometimes not.

My haresources file:

---

th-dus-mqm 10.10.121.130 92.254.37.53 drbddisk::drbd0 Filesystem::/dev/drbd0::/dus::ext3 drbddisk::drbd2 Filesystem::/dev/drbd2::/home/tbmx/dus::ext3 mqm_dus
th-fra-mqm 10.10.121.131 92.254.37.54 drbddisk::drbd1 Filesystem::/dev/drbd1::/fra::ext3 drbddisk::drbd3 Filesystem::/dev/drbd3::/home/tbmx/fra::ext3 mqm_fra

---
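The log shows that a resource group is released in the reverse of the order written in haresources: mqm_fra first, then the file systems and drbddisk resources, and the IP addresses last. A small Python sketch that turns one haresources line into the stop commands seen in the log, assuming the `Script::arg1::arg2` syntax and that bare IP addresses are handled by the IPaddr script (both visible in the log); `stop_commands` is a hypothetical helper name.

```python
import ipaddress
from typing import List

RESOURCE_DIR = "/etc/ha.d/resource.d"


def _is_ip(token: str) -> bool:
    """True for a bare IP address such as 10.10.121.131."""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


def stop_commands(line: str) -> List[str]:
    """Return the stop commands for one haresources line, in stop order."""
    _node, *resources = line.split()  # first token is the preferred node
    commands = []
    for res in reversed(resources):  # stop order = reverse of start order
        if _is_ip(res):
            script, args = "IPaddr", [res]
        else:
            script, *args = res.split("::")
        commands.append(" ".join([f"{RESOURCE_DIR}/{script}", *args, "stop"]))
    return commands
```

Applied to the th-fra-mqm line above, the second command is `/etc/ha.d/resource.d/Filesystem /dev/drbd3 /home/tbmx/fra ext3 stop`, matching the order in the log, and the Filesystem stop for /fra runs only after mqm_fra has been stopped.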

My ha.cf:

---

node th-dus-mqm th-fra-mqm
ucast bond0.121 10.10.121.132
ucast bond0.121 10.10.121.133
auto_failback off
debugfile /var/log/ha-debug
logfile /var/log/ha-log
warntime 3
deadtime 6
initdead 60
keepalive 2

---

My drbd.conf:

---

resource drbd0 {
  protocol C;
  startup {
    become-primary-on th-dus-mqm;
  }
  syncer {
    rate 50M;
  }
  net {
    allow-two-primaries;
  }
  on th-dus-mqm {
    device     /dev/drbd0;
    disk       /dev/sda10;
    address    10.10.121.132:7766;
    meta-disk  internal;
  }
  on th-fra-mqm {
    device    /dev/drbd0;
    disk      /dev/sda10;
    address   10.10.121.133:7766;
    meta-disk internal;
  }
}
resource drbd1 {
  protocol C;
  startup {
    become-primary-on th-fra-mqm;
  }
  syncer {
    rate 50M;
  }
  net {
    allow-two-primaries;
  }
  on th-dus-mqm {
    device     /dev/drbd1;
    disk       /dev/sda11;
    address    10.10.121.132:7776;
    meta-disk  internal;
  }
  on th-fra-mqm {
    device    /dev/drbd1;
    disk      /dev/sda11;
    address   10.10.121.133:7776;
    meta-disk internal;
  }
}
resource drbd2 {
  protocol C;
  startup {
    become-primary-on th-dus-mqm;
  }
  syncer {
    rate 50M;
  }
  net {
    allow-two-primaries;
  }
  on th-dus-mqm {
    device     /dev/drbd2;
    disk       /dev/sda12;
    address    10.10.121.132:7786;
    meta-disk  internal;
  }
  on th-fra-mqm {
    device    /dev/drbd2;
    disk      /dev/sda12;
    address   10.10.121.133:7786;
    meta-disk internal;
  }
}
resource drbd3 {
  protocol C;
  startup {
    become-primary-on th-fra-mqm;
  }
  syncer {
    rate 50M;
  }
  net {
    allow-two-primaries;
  }
  on th-dus-mqm {
    device     /dev/drbd3;
    disk       /dev/sda13;
    address    10.10.121.132:7796;
    meta-disk  internal;
  }
  on th-fra-mqm {
    device    /dev/drbd3;
    disk      /dev/sda13;
    address   10.10.121.133:7796;
    meta-disk internal;
  }
}

---
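Each DRBD resource above uses its own TCP port (7766, 7776, 7786 and 7796), with the same port on both nodes. A rough Python sketch that collects the `address` statements per resource and host and flags any IP:port pair used by more than one resource; it assumes the simple one-statement-per-line layout of this drbd.conf and is not a full drbd.conf parser. `drbd_addresses` and `port_conflicts` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

RESOURCE_RE = re.compile(r"^\s*resource\s+(\S+)\s*\{")
ON_RE = re.compile(r"^\s*on\s+(\S+)\s*\{")
ADDR_RE = re.compile(r"^\s*address\s+([\d.]+):(\d+)\s*;")

Entry = Tuple[Optional[str], Optional[str], str, int]


def drbd_addresses(text: str) -> List[Entry]:
    """Return (resource, host, ip, port) for every 'address' statement."""
    found: List[Entry] = []
    resource: Optional[str] = None
    host: Optional[str] = None
    for line in text.splitlines():
        if m := RESOURCE_RE.match(line):
            resource, host = m.group(1), None
        elif m := ON_RE.match(line):
            host = m.group(1)
        elif m := ADDR_RE.match(line):
            found.append((resource, host, m.group(1), int(m.group(2))))
    return found


def port_conflicts(entries: List[Entry]) -> Dict[Tuple[str, int], List[str]]:
    """Map (ip, port) -> resources, for endpoints shared by several resources."""
    users = defaultdict(set)
    for resource, _host, ip, port in entries:
        users[(ip, port)].add(resource)
    return {key: sorted(res) for key, res in users.items() if len(res) > 1}
```

For the configuration above, `port_conflicts(drbd_addresses(text))` would be expected to return an empty dict, since every resource has its own port.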

I hope you guys can help me with my problem.

Thanks in advance.

Kind regards
Sebastian

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
