On 21 Feb 2014, at 10:55 pm, Lukas Grossar <[email protected]> 
wrote:

> Hi
> 
> I'm currently building a 2 node DRBD backed PostgreSQL on Debian Wheezy
> and I'm testing how Pacemaker reacts to specific failure scenarios.
> 
> One thing I tested that is currently driving me crazy: when I manually
> stop PostgreSQL through pg_ctl, or just kill the master process to
> simulate a crash, the pgsql resource agent correctly detects the error
> and restarts PostgreSQL.
> 
> The problem I have arises when I later call 'crm resource cleanup
> pgsql' to delete the failcount and the failed tasks: the pgsql resource
> shows up as Stopped, but in reality it is still running fine. I have
> the same problem when I delete the failcount separately and then
> do the cleanup.
> 
> The problem seems to be that pgsql_monitor runs into a timeout:
> Feb 21 12:47:59 vm-db-01 crmd: [6494]: WARN: cib_action_update:
> rsc_op 44: pgsql_monitor_30000 on vm-db-01 timed out
> 
> After the timeout pgsql is being restarted, and the interesting thing
> is that I can delete the failed action from the timeout without a
> problem.
> 
> Does anyone have an idea what the problem could be in this case?

Not without more logs. You'd probably want to turn on 'set -x' in the resource 
agent to see why it can't complete.
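
As a rough illustration of what 'set -x' gives you, here is a minimal, self-contained sketch. It does not touch the real pgsql agent: monitor_sketch and TRACE_FILE are hypothetical names made up for this example, and the function body is only a stand-in for whatever the agent's monitor action really does. The idea is just that every command the shell executes gets written to stderr with a '+' prefix, so sending stderr to a file shows where a run stalls.

```shell
#!/bin/sh
# Sketch only: TRACE_FILE and monitor_sketch are hypothetical names,
# not part of the pgsql resource agent.
TRACE_FILE=$(mktemp)

# Stand-in for a monitor action; the real agent's logic is not shown here.
monitor_sketch() {
    status="running"
    echo "$status"
}

# Run the function in a subshell with tracing enabled.
# 'set -x' writes each executed command to stderr, which is
# redirected into TRACE_FILE; stdout is captured in $result.
result=$( (set -x; monitor_sketch) 2>"$TRACE_FILE" )

echo "monitor returned: $result"
echo "trace written to: $TRACE_FILE"
```

In an actual agent the equivalent would presumably be a 'set -x' near the top of the script, with stderr sent somewhere you can read afterwards. The last trace line before the timeout is then the command that never returned.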


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
