[ 
https://issues.apache.org/jira/browse/FLINK-28078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558345#comment-17558345
 ] 

Matthias Pohl commented on FLINK-28078:
---------------------------------------

I did a comparison of a successful run and the failed run. I'm getting the 
feeling that we're running into some race condition on the {{LeaderLatch}} 
implementation side.

Here are the logs of a successful run starting from the log message where the 
latch zNode is checked for existence after being created:
{code:java}
1376 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:exists cxid:0x1b 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1376 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:exists cxid:0x1b zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default/latch
1377 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:createContainer 
cxid:0x1c zxid:0xa txntype:19 reqpath:n/a
1378 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:createContainer cxid:0x1c zxid:0xa 
txntype:19 reqpath:n/a
1380 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x1d 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1380 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x1d 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1380 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x1e 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1380 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x1e 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1380 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x1f 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1380 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x1f 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1381 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren cxid:0x20 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1381 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:getChildren cxid:0x20 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1385 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:create2 cxid:0x21 
zxid:0xb txntype:15 reqpath:n/a
1385 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:create2 cxid:0x21 zxid:0xb txntype:15 
reqpath:n/a
1385 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:exists cxid:0x22 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink
1385 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:exists cxid:0x22 zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink
1386 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:exists cxid:0x23 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1386 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:exists cxid:0x23 zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default
1387 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x24 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1387 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x24 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1387 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:exists cxid:0x25 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1387 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:exists cxid:0x25 zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default/latch
1388 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren cxid:0x26 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1388 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:getChildren cxid:0x26 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1390 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:create2 cxid:0x27 
zxid:0xc txntype:15 reqpath:n/a
1391 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:create2 cxid:0x27 zxid:0xc txntype:15 
reqpath:n/a
1391 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:exists cxid:0x28 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink
1391 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:exists cxid:0x28 zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink
1391 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:exists cxid:0x29 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1391 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:exists cxid:0x29 zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default
1392 [ForkJoinPool-1-worker-9-EventThread] DEBUG 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - ZooKeeperMultipleComponentLeaderElectionDriver obtained the leadership.
1392 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x2a 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1392 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x2a 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1393 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:exists cxid:0x2b 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1393 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:exists cxid:0x2b zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default/latch
1393 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren cxid:0x2c 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1393 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:getChildren cxid:0x2c 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1394 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:create2 cxid:0x2d 
zxid:0xd txntype:15 reqpath:n/a
1395 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:create2 cxid:0x2d zxid:0xd txntype:15 
reqpath:n/a
1421 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:getData cxid:0x2e 
zxid:0xfffffffffffffffe txntype:unknown 
reqpath:/flink/default/latch/_c_2b57761c-31e0-458d-9466-b409010d3d14-latch-0000000000
1421 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:getData cxid:0x2e zxid:0xfffffffffffffffe 
txntype:unknown 
reqpath:/flink/default/latch/_c_2b57761c-31e0-458d-9466-b409010d3d14-latch-0000000000
1421 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x2f 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1421 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x2f 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1422 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:getData cxid:0x30 
zxid:0xfffffffffffffffe txntype:unknown 
reqpath:/flink/default/latch/_c_1e836e28-a9d1-4bea-97f3-222034044247-latch-0000000001
1422 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:getData cxid:0x30 zxid:0xfffffffffffffffe 
txntype:unknown 
reqpath:/flink/default/latch/_c_1e836e28-a9d1-4bea-97f3-222034044247-latch-0000000001
1497 [ForkJoinPool-1-worker-9] INFO  
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Closing ZooKeeperMultipleComponentLeaderElectionDriver.
1501 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:checkWatches cxid:0x31 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1501 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:checkWatches cxid:0x31 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1505 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:delete cxid:0x32 
zxid:0xe txntype:2 reqpath:n/a
1506 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:delete cxid:0x32 zxid:0xe txntype:2 
reqpath:n/a
1506 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x33 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1506 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x33 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1507 [ForkJoinPool-1-worker-9-EventThread] DEBUG 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - ZooKeeperMultipleComponentLeaderElectionDriver obtained the leadership.
1507 [ForkJoinPool-1-worker-9] INFO  
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Closing ZooKeeperMultipleComponentLeaderElectionDriver.
1508 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:checkWatches cxid:0x34 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1508 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:checkWatches cxid:0x34 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1508 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:delete cxid:0x35 
zxid:0xf txntype:2 reqpath:n/a
1508 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:delete cxid:0x35 zxid:0xf txntype:2 
reqpath:n/a
1509 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x36 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1509 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x36 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1509 [ForkJoinPool-1-worker-9-EventThread] DEBUG 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - ZooKeeperMultipleComponentLeaderElectionDriver obtained the leadership.
1510 [ForkJoinPool-1-worker-9] INFO  
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Closing ZooKeeperMultipleComponentLeaderElectionDriver.
1510 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:checkWatches cxid:0x37 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1510 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:checkWatches cxid:0x37 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1510 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:delete cxid:0x38 
zxid:0x10 txntype:2 reqpath:n/a
1511 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:delete cxid:0x38 zxid:0x10 txntype:2 
reqpath:n/a
1511 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:checkWatches cxid:0x39 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/zookeeper/config
1511 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:checkWatches cxid:0x39 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/zookeeper/config
1513 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- Processing request:: sessionid:0x100009ef1b30000 type:closeSession cxid:0x3a 
zxid:0x11 txntype:-11 reqpath:n/a
1513 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor [] 
- sessionid:0x100009ef1b30000 type:closeSession cxid:0x3a zxid:0x11 txntype:-11 
reqpath:n/a
{code}
and here are the merged (i.e. I integrated the test-sides logs into the 
zookeeper-server logs based on the time) logs:
{code:java}
16:17:07,824 [Curator-ConnectionStateManager-0] DEBUG 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Connected to ZooKeeper quorum. Leader election can start.
16:17:07,824 [Curator-ConnectionStateManager-0] DEBUG 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Connected to ZooKeeper quorum. Leader election can start.
16:17:07,829 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:exists cxid:0x10 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
16:17:07,829 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:exists cxid:0x10 zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default/latch
16:17:07,831 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:getData cxid:0x11 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
16:17:07,832 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:getData cxid:0x11 zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default
16:17:07,837 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x12 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
16:17:07,837 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x12 zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default
16:17:07,838 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:createContainer cxid:0x13 zxid:0x7 
txntype:19 reqpath:n/a
16:17:07,838 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:createContainer cxid:0x13 zxid:0x7 txntype:19 
reqpath:n/a
16:17:07,839 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:create2 cxid:0x14 zxid:0x8 
txntype:15 reqpath:n/a
16:17:07,839 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:create2 cxid:0x14 zxid:0x8 txntype:15 
reqpath:n/a
16:17:07,839 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:getData cxid:0x15 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
16:17:07,839 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:getData cxid:0x15 zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default
16:17:07,839 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x16 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
16:17:07,839 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x16 zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default
16:17:07,846 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:create2 cxid:0x17 zxid:0x9 
txntype:15 reqpath:n/a
16:17:07,846 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:create2 cxid:0x17 zxid:0x9 txntype:15 
reqpath:n/a
16:17:07,846 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:getChildren cxid:0x18 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
16:17:07,846 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:getChildren cxid:0x18 zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default/latch
16:17:07,846 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x19 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
16:17:07,846 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x19 zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default
16:17:07,846 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x1a 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
16:17:07,846 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x1a zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default
16:17:07,846 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x1b 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
16:17:07,846 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x1b zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default/latch
16:17:07,848 [ForkJoinPool-45-worker-25-EventThread] DEBUG 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - ZooKeeperMultipleComponentLeaderElectionDriver obtained the leadership.
16:17:07,854 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x1c 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
16:17:07,854 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x1c zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default/latch
16:17:07,848 [ForkJoinPool-45-worker-25-EventThread] DEBUG 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - ZooKeeperMultipleComponentLeaderElectionDriver obtained the leadership.
16:17:07,860 [ForkJoinPool-45-worker-25] INFO  
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Closing ZooKeeperMultipleComponentLeaderElectionDriver.
16:17:07,860 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:create2 cxid:0x1d zxid:0xa 
txntype:15 reqpath:n/a
16:17:07,860 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:create2 cxid:0x1d zxid:0xa txntype:15 
reqpath:n/a
16:17:07,862 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:checkWatches cxid:0x1e 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
16:17:07,862 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:checkWatches cxid:0x1e zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default
16:17:07,863 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:delete cxid:0x1f zxid:0xb txntype:2 
reqpath:n/a
16:17:07,863 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:delete cxid:0x1f zxid:0xb txntype:2 reqpath:n/a
16:17:07,863 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:getData cxid:0x20 
zxid:0xfffffffffffffffe txntype:unknown 
reqpath:/flink/default/latch/_c_ddb7327c-1775-4084-9fbd-40e263758697-latch-0000000000
16:17:07,863 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:getData cxid:0x20 zxid:0xfffffffffffffffe 
txntype:unknown 
reqpath:/flink/default/latch/_c_ddb7327c-1775-4084-9fbd-40e263758697-latch-0000000000
16:17:07,864 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x21 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
16:17:07,864 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x21 zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default/latch
16:17:07,866 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:delete cxid:0x22 zxid:0xc txntype:2 
reqpath:n/a
16:17:07,866 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:delete cxid:0x22 zxid:0xc txntype:2 reqpath:n/a
16:17:07,869 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:create2 cxid:0x23 zxid:0xd 
txntype:15 reqpath:n/a
16:17:07,869 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:create2 cxid:0x23 zxid:0xd txntype:15 
reqpath:n/a
16:17:07,869 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:getData cxid:0x24 
zxid:0xfffffffffffffffe txntype:unknown 
reqpath:/flink/default/latch/_c_6eb174e9-bb77-4a73-9604-531242c11c0e-latch-0000000001
16:17:07,869 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:getData cxid:0x24 zxid:0xfffffffffffffffe 
txntype:unknown 
reqpath:/flink/default/latch/_c_6eb174e9-bb77-4a73-9604-531242c11c0e-latch-0000000001
16:17:07,871 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x25 
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
16:17:07,871 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x25 zxid:0xfffffffffffffffe 
txntype:unknown reqpath:/flink/default/latch
16:17:07,875 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:delete cxid:0x26 zxid:0xe txntype:2 
reqpath:n/a
16:17:07,875 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:delete cxid:0x26 zxid:0xe txntype:2 reqpath:n/a
16:17:07,876 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:create2 cxid:0x27 zxid:0xf 
txntype:15 reqpath:n/a
16:17:07,876 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:create2 cxid:0x27 zxid:0xf txntype:15 
reqpath:n/a
16:17:07,876 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing 
request:: sessionid:0x100cf6d9cf60000 type:getData cxid:0x28 
zxid:0xfffffffffffffffe txntype:unknown 
reqpath:/flink/default/latch/_c_8008ceb9-e0b9-459b-8fa5-44428fa31e29-latch-0000000002
16:17:07,876 [        SyncThread:0] DEBUG 
org.apache.zookeeper.server.FinalRequestProcessor            [] - 
sessionid:0x100cf6d9cf60000 type:getData cxid:0x28 zxid:0xfffffffffffffffe 
txntype:unknown 
reqpath:/flink/default/latch/_c_8008ceb9-e0b9-459b-8fa5-44428fa31e29-latch-0000000002
[...]
{code}
The LeaderLatch implementation works like that (based on the code):
 # {{LeaderLatch}} is started pointing to a given zNode path 
({{{}/flink/default/latch/{}}} in our case).
 # The instance will get {{reset()}} initially setting the leadership for this 
latch to {{false}} and triggering a child creation
 # Child creation is done "withProtection" (i.e. a random String will be 
prefixed to the child) in mode {{EPHEMERAL_SEQUENTIAL}} (i.e. the child will be 
deleted on connection-loss and a monotonically increasing number is added as a 
suffix) with a callback that triggers {{getChildren}} in case of success and 
selecting the leader based on the suffix of the child nodes names (lower ID 
wins). Hence, an order between the children is established based on the ID.
 # {{getChildren}} callback triggers leadership detection in the local 
{{LeaderLatch}} instance.
 ## For the child with the lowest ID, the leadership is triggered in the 
callback
 ## If the current {{LeaderLatch}} doesn't correspond to the child with the 
lowest ID, a watcher is set up watching the direct predecessor (based on the 
ID-based ordering I mentioned before) for deletion. If the deletion happens, 
the watcher is triggered (only once, i.e. not repeatedly) and reinitiates the 
leadership detection through {{getChildren}} (4.) if the {{LeaderLatch}} isn't 
closed, yet. This is what we observe in the successful test run with the 
watcher being triggered. 

There is another code path if the predecessor node doesn't exist in 
{{getChildren}} (4.). In that case, an entire {{reset()}} (2.) is triggered 
resulting in the deletion of the current's {{LeaderLatch}}'s zNode and the 
recreation of the child zNode. It appears that we're ending up in a this code 
path for the unsuccessful test run.

> ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers
>  runs into timeout
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-28078
>                 URL: https://issues.apache.org/jira/browse/FLINK-28078
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.16.0
>            Reporter: Matthias Pohl
>            Assignee: Matthias Pohl
>            Priority: Major
>              Labels: test-stability
>
> [Build 
> #36189|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=36189&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=10455]
>  got stuck in 
> {{ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers}}
> {code}
> "ForkJoinPool-45-worker-25" #525 daemon prio=5 os_prio=0 
> tid=0x00007fc74d9e3800 nid=0x62c8 waiting on condition [0x00007fc6ff2f2000]
> May 30 16:36:10    java.lang.Thread.State: WAITING (parking)
> May 30 16:36:10       at sun.misc.Unsafe.park(Native Method)
> May 30 16:36:10       - parking to wait for  <0x00000000c2571b80> (a 
> java.util.concurrent.CompletableFuture$Signaller)
> May 30 16:36:10       at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> May 30 16:36:10       at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> May 30 16:36:10       at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)
> May 30 16:36:10       at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> May 30 16:36:10       at 
> java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
> May 30 16:36:10       at 
> org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers(ZooKeeperMultipleComponentLeaderElectionDriverTest.java:256)
> May 30 16:36:10       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> May 30 16:36:10       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 30 16:36:10       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 30 16:36:10       at java.lang.reflect.Method.invoke(Method.java:498)
> [...]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to