Hi Andrew,

Thank you for your comments.

> > The guests are located on the shared disk.
> 
> What is on the shared disk?  The whole OS or app-specific data (i.e. nothing 
> pacemaker needs directly)?

The shared disk holds the whole OS and all of the data.
The shared disk is placed the same way in KVM, where the problem does not 
occur.
 
 * We understand that the behavior differs depending on the hypervisor.
 * However, it seems necessary to work around this problem in order to use 
Pacemaker in a vSphere 5.1 environment.

Best Regards,
Hideo Yamauchi.


--- On Wed, 2013/5/15, Andrew Beekhof <and...@beekhof.net> wrote:

> 
> On 13/05/2013, at 4:14 PM, renayama19661...@ybb.ne.jp wrote:
> 
> > Hi All,
> > 
> > We built a simple cluster in a vSphere 5.1 environment.
> > 
> > It consists of two ESXi servers and a shared disk.
> > 
> > The guests are located on the shared disk.
> 
> What is on the shared disk?  The whole OS or app-specific data (i.e. nothing 
> pacemaker needs directly)?
> 
> > 
> > 
> > Step 1) Build the cluster. (The DC node is the active node.)
> > 
> > ============
> > Last updated: Mon May 13 14:16:09 2013
> > Stack: Heartbeat
> > Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with 
> > quorum
> > Version: 1.0.13-30bb726
> > 2 Nodes configured, unknown expected votes
> > 2 Resources configured.
> > ============
> > 
> > Online: [ pgsr01 pgsr02 ]
> > 
> > Resource Group: test-group
> >     Dummy1     (ocf::pacemaker:Dummy): Started pgsr01
> >     Dummy2     (ocf::pacemaker:Dummy): Started pgsr01
> > Clone Set: clnPingd
> >     Started: [ pgsr01 pgsr02 ]
> > 
> > Node Attributes:
> > * Node pgsr01:
> >    + default_ping_set                  : 100       
> > * Node pgsr02:
> >    + default_ping_set                  : 100       
> > 
> > Migration summary:
> > * Node pgsr01: 
> > * Node pgsr02: 
> > 
> > 
> > Step 2) Attach strace to the pengine process on the DC node.
> > 
> > [root@pgsr01 ~]# ps -ef |grep heartbeat
> > root      2072     1  0 13:56 ?        00:00:00 heartbeat: master control 
> > process
> > root      2075  2072  0 13:56 ?        00:00:00 heartbeat: FIFO reader      
> >   
> > root      2076  2072  0 13:56 ?        00:00:00 heartbeat: write: bcast 
> > eth1  
> > root      2077  2072  0 13:56 ?        00:00:00 heartbeat: read: bcast eth1 
> >   
> > root      2078  2072  0 13:56 ?        00:00:00 heartbeat: write: bcast 
> > eth2  
> > root      2079  2072  0 13:56 ?        00:00:00 heartbeat: read: bcast eth2 
> >   
> > 496       2082  2072  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/ccm
> > 496       2083  2072  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/cib
> > root      2084  2072  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/lrmd -r
> > root      2085  2072  0 13:57 ?        00:00:00 
> > /usr/lib64/heartbeat/stonithd
> > 496       2086  2072  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/attrd
> > 496       2087  2072  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/crmd
> > 496       2089  2087  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/pengine
> > root      2182     1  0 14:15 ?        00:00:00 /usr/lib64/heartbeat/pingd 
> > -D -p /var/run//pingd-default_ping_set -a default_ping_set -d 5s -m 100 -i 
> > 1 -h 192.168.101.254
> > root      2287  1973  0 14:16 pts/0    00:00:00 grep heartbea
> > 
> > [root@pgsr01 ~]# strace -p 2089
> > Process 2089 attached - interrupt to quit
> > restart_syscall(<... resuming interrupted call ...>) = 0
> > times({tms_utime=5, tms_stime=6, tms_cutime=0, tms_cstime=0}) = 429527557
> > recvfrom(5, 0xa93ff7, 953, 64, 0, 0)    = -1 EAGAIN (Resource temporarily 
> > unavailable)
> > poll([{fd=5, events=0}], 1, 0)          = 0 (Timeout)
> > recvfrom(5, 0xa93ff7, 953, 64, 0, 0)    = -1 EAGAIN (Resource temporarily 
> > unavailable)
> > poll([{fd=5, events=0}], 1, 0)          = 0 (Timeout)
> > (snip)
> > 
> > (Here pengine appears to be idling normally, polling its IPC channel for 
> > requests.)
> > 
> > 
> > Step 3) Disconnect the shared disk on which the active node is placed.
> > 
> > Step 4) Cut off the pingd connectivity of the standby node (by unlinking 
> > the uplink NICs from the port group on the ESXi host, as below).
> >        The pingd score is updated correctly, but pengine blocks while 
> > processing the change.
> > 
> > ~ # esxcfg-vswitch -N vmnic1 -p "ap-db" vSwitch1
> > ~ # esxcfg-vswitch -N vmnic2 -p "ap-db" vSwitch1
> > 
> > 
> > (snip)
> > brk(0xd05000)                           = 0xd05000
> > brk(0xeed000)                           = 0xeed000
> > brk(0xf2d000)                           = 0xf2d000
> > fstat(6, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
> > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
> > 0x7f86a255a000
> > write(6, "BZh51AY&SY\327\373\370\203\0\t(_\200UPX\3\377\377%cT 
> > \277\377\377"..., 2243) = 2243
> > brk(0xb1d000)                           = 0xb1d000
> > fsync(6                                ------------------------------> BLOCKED
> > (snip)
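> > 
> > The data being written begins with "BZh", the bzip2 magic bytes, so this 
> > appears to be pengine saving a compressed policy-engine input file; since 
> > the whole OS lives on the shared disk, the fsync() has nowhere to 
> > complete. A minimal sketch of the same write pattern (plain POSIX; the 
> > file path is illustrative only):
> > 
> > #include <fcntl.h>
> > #include <stdio.h>
> > #include <unistd.h>
> > 
> > int main(void)
> > {
> >     /* Stands in for the bzip2-compressed pe-input payload. */
> >     const char buf[] = "BZh...";
> >     int fd = open("/tmp/pe-input-test.bz2",
> >                   O_WRONLY | O_CREAT | O_TRUNC, 0600);
> > 
> >     if (fd < 0) {
> >         perror("open");
> >         return 1;
> >     }
> >     write(fd, buf, sizeof(buf) - 1);  /* fast: only reaches the page cache */
> >     fsync(fd);                        /* cannot return until the storage
> >                                        * acknowledges the data, so it blocks
> >                                        * indefinitely once the disk is gone */
> >     close(fd);
> >     return 0;
> > }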
> > 
> > 
> > ============
> > Last updated: Mon May 13 14:19:15 2013
> > Stack: Heartbeat
> > Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with 
> > quorum
> > Version: 1.0.13-30bb726
> > 2 Nodes configured, unknown expected votes
> > 2 Resources configured.
> > ============
> > 
> > Online: [ pgsr01 pgsr02 ]
> > 
> > Resource Group: test-group
> >     Dummy1     (ocf::pacemaker:Dummy): Started pgsr01
> >     Dummy2     (ocf::pacemaker:Dummy): Started pgsr01
> > Clone Set: clnPingd
> >     Started: [ pgsr01 pgsr02 ]
> > 
> > Node Attributes:
> > * Node pgsr01:
> >    + default_ping_set                  : 100       
> > * Node pgsr02:
> >    + default_ping_set                  : 0             : Connectivity is lost
> > 
> > Migration summary:
> > * Node pgsr01: 
> > * Node pgsr02: 
> > 
> > 
> > Step 5) Reconnect the pingd communication of the standby node.
> >        The pingd score is updated correctly, but pengine remains blocked.
> > 
> > 
> > ~ # esxcfg-vswitch -M vmnic1 -p "ap-db" vSwitch1
> > ~ # esxcfg-vswitch -M vmnic2 -p "ap-db" vSwitch1
> > 
> > ============
> > Last updated: Mon May 13 14:19:40 2013
> > Stack: Heartbeat
> > Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with 
> > quorum
> > Version: 1.0.13-30bb726
> > 2 Nodes configured, unknown expected votes
> > 2 Resources configured.
> > ============
> > 
> > Online: [ pgsr01 pgsr02 ]
> > 
> > Resource Group: test-group
> >     Dummy1     (ocf::pacemaker:Dummy): Started pgsr01
> >     Dummy2     (ocf::pacemaker:Dummy): Started pgsr01
> > Clone Set: clnPingd
> >     Started: [ pgsr01 pgsr02 ]
> > 
> > Node Attributes:
> > * Node pgsr01:
> >    + default_ping_set                  : 100       
> > * Node pgsr02:
> >    + default_ping_set                  : 100       
> > 
> > Migration summary:
> > * Node pgsr01: 
> > * Node pgsr02: 
> > 
> > 
> > --------- The blocked state of pengine continues -----
> > 
> > Step 6) Cut off the pingd connectivity of the active node.
> >        The pingd score is updated correctly, but pengine remains blocked.
> > 
> > 
> > ~ # esxcfg-vswitch -N vmnic1 -p "ap-db" vSwitch1
> > ~ # esxcfg-vswitch -N vmnic2 -p "ap-db" vSwitch1
> > 
> > 
> > ============
> > Last updated: Mon May 13 14:20:32 2013
> > Stack: Heartbeat
> > Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with 
> > quorum
> > Version: 1.0.13-30bb726
> > 2 Nodes configured, unknown expected votes
> > 2 Resources configured.
> > ============
> > 
> > Online: [ pgsr01 pgsr02 ]
> > 
> > Resource Group: test-group
> >     Dummy1     (ocf::pacemaker:Dummy): Started pgsr01
> >     Dummy2     (ocf::pacemaker:Dummy): Started pgsr01
> > Clone Set: clnPingd
> >     Started: [ pgsr01 pgsr02 ]
> > 
> > Node Attributes:
> > * Node pgsr01:
> >    + default_ping_set                  : 0             : Connectivity is lost
> > * Node pgsr02:
> >    + default_ping_set                  : 100       
> > 
> > Migration summary:
> > * Node pgsr01: 
> > * Node pgsr02: 
> > 
> > --------- The blocked state of pengine continues -----
> > 
> > 
> > After that, the resources do not move to the standby node, because no 
> > transition can be computed while pengine remains blocked.
> > In the vSphere environment, the block is only released after a considerable 
> > time, and only then is a transition generated.
> > * The I/O blocking of pengine seems to occur repeatedly.
> > * Other processes may be blocked, too.
> > * It took more than one hour from the failure to failover completion.
> > 
> > This problem shows that resources may fail to move after disk trouble in a 
> > vSphere environment.
> > 
> > Because our users want to run Pacemaker in vSphere environments, a 
> > solution to this problem is necessary.
> > 
> > Do you know of any example that solved a similar problem on vSphere?
> > 
> > If there is no known solution, we think it is necessary to avoid the 
> > blocking in pengine.
> > 
> > For example:
> > 1. crmd could watch its requests to pengine with a timer.
> > 2. pengine could do its writes with a timer and monitor their progress 
> >    (see the sketch below).
> > ...etc.
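> > 
> > As one possible shape for item 2 above, here is a minimal sketch (not 
> > Pacemaker code; every name in it is our own) that performs the fsync() in 
> > a worker thread while the caller waits with a deadline, so a dead disk 
> > turns into a timeout error instead of an indefinite block:
> > 
> > #include <pthread.h>
> > #include <stdlib.h>
> > #include <time.h>
> > #include <unistd.h>
> > 
> > struct fsync_job {
> >     int fd;
> >     int done;       /* set by the worker once fsync() returns */
> >     int abandoned;  /* set by the caller on timeout; worker frees */
> >     pthread_mutex_t lock;
> >     pthread_cond_t cond;
> > };
> > 
> > static void *fsync_worker(void *arg)
> > {
> >     struct fsync_job *job = arg;
> >     int free_it;
> > 
> >     fsync(job->fd);  /* may block for a very long time */
> > 
> >     pthread_mutex_lock(&job->lock);
> >     job->done = 1;
> >     free_it = job->abandoned;  /* did the caller give up already? */
> >     pthread_cond_signal(&job->cond);
> >     pthread_mutex_unlock(&job->lock);
> > 
> >     if (free_it)
> >         free(job);  /* the caller is gone; clean up ourselves */
> >     return NULL;
> > }
> > 
> > /* Returns 0 if fsync() finished in time, -1 if it is still stuck. */
> > static int fsync_with_timeout(int fd, int timeout_sec)
> > {
> >     struct fsync_job *job = calloc(1, sizeof(*job));
> >     struct timespec deadline;
> >     pthread_t tid;
> >     int rc;
> > 
> >     job->fd = fd;
> >     pthread_mutex_init(&job->lock, NULL);
> >     pthread_cond_init(&job->cond, NULL);
> > 
> >     clock_gettime(CLOCK_REALTIME, &deadline);
> >     deadline.tv_sec += timeout_sec;
> > 
> >     pthread_create(&tid, NULL, fsync_worker, job);
> > 
> >     pthread_mutex_lock(&job->lock);
> >     while (!job->done &&
> >            pthread_cond_timedwait(&job->cond, &job->lock, &deadline) == 0)
> >         ;  /* loop on spurious wakeups until done or ETIMEDOUT */
> >     rc = job->done ? 0 : -1;
> >     if (rc != 0)
> >         job->abandoned = 1;  /* worker frees the job when it wakes */
> >     pthread_mutex_unlock(&job->lock);
> > 
> >     if (rc == 0) {
> >         pthread_join(tid, NULL);
> >         free(job);
> >     } else {
> >         pthread_detach(tid);  /* leave the stuck thread behind */
> >     }
> >     return rc;
> > }
> > 
> > On a timeout, crmd could treat the request to pengine as failed and 
> > escalate (for example, by fencing the node) instead of keeping the whole 
> > cluster waiting.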
> > 
> > * This problem does not seem to occur in KVM.
> > * The difference may lie in the hypervisor.
> > * In addition, the problem did not occur on a physical Linux machine.
> > 
> > 
> > Best Regards,
> > Hideo Yamauchi.
> > 
> > 
> > 
> 
> 

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
