On 21 Feb 2014, at 10:55 pm, Lukas Grossar <[email protected]> wrote:
> Hi
>
> I'm currently building a two-node, DRBD-backed PostgreSQL cluster on Debian
> Wheezy, and I'm testing how Pacemaker reacts to specific failure scenarios.
>
> One test currently drives me crazy. When I manually stop PostgreSQL through
> pg_ctl, or just kill the master process to simulate a crash, the pgsql
> resource agent correctly detects the error and restarts PostgreSQL.
>
> The problem I have arises when I later call 'crm resource cleanup pgsql'
> to delete the failcount and the failed tasks: the pgsql resource then
> shows up as Stopped, although in reality it is still running fine. I have
> the same problem when I delete the failcount separately and then do the
> cleanup.
>
> The problem seems to be that pgsql_monitor runs into a timeout:
> Feb 21 12:47:59 vm-db-01 crmd: [6494]: WARN: cib_action_update:
> rsc_op 44: pgsql_monitor_30000 on vm-db-01 timed out
>
> After the timeout pgsql is restarted, and the interesting thing is that
> I can delete the failed action from the timeout without a problem.
>
> Does anyone have an idea what the problem could be in this case?

Not without more logs. You'd probably want to turn on 'set -x' in the resource agent to see why it can't complete.
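For reference, a minimal sketch of that tracing, assuming the pgsql agent is a plain /bin/sh script (on Debian typically /usr/lib/ocf/resource.d/heartbeat/pgsql; check your resource-agents package). Only the `exec 2>>`, header `echo` and `set -x` lines would go near the top of the real agent. `TRACE_FILE` and `demo_monitor` are hypothetical names used so the snippet runs on its own. The timestamped header line makes it easier to match a trace against the crmd "timed out" warning.

```shell
#!/bin/sh
# Sketch only: TRACE_FILE and demo_monitor are hypothetical stand-ins.
# In the real agent, only the exec/echo/set -x lines would be added.

# Hypothetical trace file; any path writable by the cluster user works.
TRACE_FILE=${TRACE_FILE:-/tmp/pgsql-ra-trace.log}

# Append the shell's stderr (where set -x writes its trace) to the file.
exec 2>>"$TRACE_FILE"

# Mark each invocation with a timestamp and the action ($1 is e.g. "monitor").
echo "=== $(date '+%b %d %H:%M:%S') $0 $* ===" >&2

# From here on, every executed command is echoed to the trace file.
set -x

# Hypothetical stand-in for the agent's monitor action.
demo_monitor() {
    true
    return 0
}

demo_monitor
rc=$?
set +x
```

Remove the added lines once you have a trace: with a 30-second monitor interval the file grows quickly, and while stderr is redirected the cluster's own logging no longer sees the agent's error output.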
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
