[Pacemaker] Help with ocf:heartbeat:portblock

Errol Neal Mon, 02 Jul 2012 22:29:39 -0700

Hi All. I've been looking for examples of how to use the portblock RA and I'm 
not really finding what I need.


I have a semi-working Samba cluster using Pacemaker, CTDB, Samba and OCFS2 on 
Debian VMs (XCP). I was previously using a clustered IP address clone (IPaddr2 
RA), but the multicast aspect introduced some weird network issues, so I'm 
trying to go with multiple unique IPs. Here is my configuration thus far:

http://pastebin.com/qSXuRfTh

I think I'm on the right track, but as I mentioned above, the cluster is only 
"semi" working. It works for a short time, then clients lose connectivity (in 
my lab). The errors are as follows:

root@nas1:~# crm_mon -1

<SNIP>
 Resource Group: ip1-group
     block1     (ocf::heartbeat:portblock):     Started nas1
     ip1        (ocf::heartbeat:IPaddr2):       Started nas1
     unblock1   (ocf::heartbeat:portblock):     Started nas1 (unmanaged) FAILED
 Resource Group: ip2-group
     block2     (ocf::heartbeat:portblock):     Started nas2
     ip2        (ocf::heartbeat:IPaddr2):       Started nas2
     unblock2   (ocf::heartbeat:portblock):     Started nas2 (unmanaged) FAILED
 Clone Set: dlm-clone [dlm]
     Started: [ nas1 nas2 ]
 Clone Set: o2cb-clone [o2cb]
     Started: [ nas1 nas2 ]
 Clone Set: sharedFS-clone [sharedFS]
     Started: [ nas1 nas2 ]

Failed actions:
    unblock1_stop_0 (node=nas1, call=40, rc=-2, status=Timed Out): unknown exec error
    unblock2_stop_0 (node=nas2, call=25, rc=-2, status=Timed Out): unknown exec error

===================
root@nas1:~# cat /var/log/daemon.log
<SNIP>
Jul  2 20:32:11 nas1 lrmd: [15349]: info: RA output: (unblock1:monitor:stderr) 0 bytes (0 B) copied
Jul  2 20:32:11 nas1 lrmd: [15349]: info: RA output: (unblock1:monitor:stderr) , 0.019563 s, 0.0 kB/s
Jul  2 20:32:38 nas1 cib: [15347]: info: cib_stats: Processed 125 operations (2320.00us average, 0% utilization) in the last 10min
Jul  2 20:32:41 nas1 lrmd: [15349]: WARN: unblock1:monitor process (PID 28154) timed out (try 1).  Killing with signal SIGTERM (15).
<SNIP>

I haven't dug into the portblock RA script yet, but I can see that the command 
line for PID 28154 is:

dd of=/srv/samba/shares/data/tickle/172.24.100.15.new conv=fsync

The tickle dir is an OCFS2 volume. 
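
One way to check whether an fsync'd write to that volume is itself what hangs 
is to repeat a similar write by hand under a time limit. This is only a minimal 
sketch, assuming GNU coreutils `timeout` is available; the function name 
`check_fsync_write` and the probe file name are made up for illustration and 
are not part of the portblock RA:

```shell
# Hypothetical helper (not part of the portblock RA): attempt a time-boxed,
# fsync'd write of an empty file into the given directory.
# Usage: check_fsync_write DIR [SECONDS]
check_fsync_write() {
    dir="$1"
    limit="${2:-10}"                      # time limit in seconds, default 10
    probe="$dir/.fsync-probe.$$"          # throwaway file name (assumption)

    # if=/dev/null gives dd an immediate EOF, so only the create + fsync
    # on the target filesystem can block here (the RA's dd reads stdin).
    if timeout "$limit" dd if=/dev/null of="$probe" conv=fsync 2>/dev/null; then
        rm -f "$probe"
        echo "ok"
    else
        # timeout exits with 124 when the limit was hit; dd errors also land here
        echo "failed-or-timed-out"
        return 1
    fi
}
```

Running `check_fsync_write /srv/samba/shares/data/tickle 10` on each node would 
show whether the write completes within ten seconds. If it stalls, that points 
at the OCFS2 volume rather than at the portblock script.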

It seems that once that process starts, the portblock resource fails and the 
cluster then attempts to stop Samba and CTDB. 

Any thoughts?



_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
