[ 
https://issues.apache.org/jira/browse/FLINK-25981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488138#comment-17488138
 ] 

Till Rohrmann commented on FLINK-25981:
---------------------------------------

From the logs, it looks as if no new leader is elected after the first 
election driver is closed. Unfortunately, logging for ZooKeeper and Curator 
is disabled, so there is not much more to extract from them. I have tried to 
reproduce the problem locally, so far without success. [~mapohl], could you 
upload the logs of a failed run with ZooKeeper and Curator logging enabled?

> ZooKeeperMultipleComponentLeaderElectionDriverTest failed
> ---------------------------------------------------------
>
>                 Key: FLINK-25981
>                 URL: https://issues.apache.org/jira/browse/FLINK-25981
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Matthias Pohl
>            Priority: Major
>              Labels: test-stability
>
> We experienced a [build failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30783&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7&l=15997]
> in {{ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers}}.
> The test hung while waiting for the next leader in
> [ZooKeeperMultipleComponentLeaderElectionDriverTest:256|https://github.com/apache/flink/blob/e8742f7f5cac34852d0e621036e1614bbdfe8ec3/flink-runtime/src/test/java/org/apache/flink/runtime/leaderelection/ZooKeeperMultipleComponentLeaderElectionDriverTest.java#L256]:
> {code}
> Feb 04 18:02:54 "main" #1 prio=5 os_prio=0 tid=0x00007fab0800b800 nid=0xe07 waiting on condition [0x00007fab12574000]
> Feb 04 18:02:54    java.lang.Thread.State: WAITING (parking)
> Feb 04 18:02:54       at sun.misc.Unsafe.park(Native Method)
> Feb 04 18:02:54       - parking to wait for  <0x000000008065c5c8> (a java.util.concurrent.CompletableFuture$Signaller)
> Feb 04 18:02:54       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> Feb 04 18:02:54       at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> Feb 04 18:02:54       at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> Feb 04 18:02:54       at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> Feb 04 18:02:54       at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
> Feb 04 18:02:54       at org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers(ZooKeeperMultipleComponentLeaderElectionDriverTest.java:256)
> [...]
> {code}
> The extended Maven logs indicate that the timeout happened while waiting for 
> the second leader to be elected.
> {code}
> Test org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers is running.
> --------------------------------------------------------------------------------
> 17:15:10,437 [           Thread-16] INFO  org.apache.curator.test.TestingZooKeeperMain                 [] - Starting server
> 17:15:10,450 [                main] INFO  org.apache.flink.runtime.util.ZooKeeperUtils                 [] - Enforcing default ACL for ZK connections
> 17:15:10,451 [                main] INFO  org.apache.flink.runtime.util.ZooKeeperUtils                 [] - Using '/flink/default' as Zookeeper namespace.
> 17:15:10,452 [                main] INFO  org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl [] - Starting
> 17:15:10,455 [                main] INFO  org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl [] - Default schema
> 17:15:10,462 [    main-EventThread] INFO  org.apache.flink.shaded.curator5.org.apache.curator.framework.state.ConnectionStateManager [] - State change: CONNECTED
> 17:15:10,467 [    main-EventThread] INFO  org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker [] - New config event received: {}
> 17:15:10,482 [Curator-ConnectionStateManager-0] DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver [] - Connected to ZooKeeper quorum. Leader election can start.
> 17:15:10,483 [Curator-ConnectionStateManager-0] DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver [] - Connected to ZooKeeper quorum. Leader election can start.
> 17:15:10,483 [Curator-ConnectionStateManager-0] DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver [] - Connected to ZooKeeper quorum. Leader election can start.
> 17:15:10,484 [    main-EventThread] INFO  org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker [] - New config event received: {}
> 17:15:10,562 [    main-EventThread] DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver [] - ZooKeeperMultipleComponentLeaderElectionDriver obtained the leadership.
> 17:15:10,600 [                main] INFO  org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver [] - Closing ZooKeeperMultipleComponentLeaderElectionDriver.
> {code}
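
The thread dump above shows the main thread parked in CompletableFuture.join(), 
which blocks indefinitely if the future is never completed. As a rough 
illustration only (not the actual Flink test code and not a proposed fix), 
the sketch below bounds such a wait with a timeout so that a missing leader 
surfaces as a failure with a message instead of a hung build. The class name 
BoundedLeaderWait, the helper awaitLeader, and the "driver-1" value are 
hypothetical and exist only for this example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedLeaderWait {

    /**
     * Hypothetical helper (not part of Flink): waits for the given future,
     * but only up to the given timeout, and fails with a descriptive error
     * if the future is not completed in time.
     */
    public static <T> T awaitLeader(
            CompletableFuture<T> leaderFuture, long timeout, TimeUnit unit) throws Exception {
        try {
            // Unlike join(), get(timeout, unit) gives up after the timeout.
            return leaderFuture.get(timeout, unit);
        } catch (TimeoutException e) {
            throw new AssertionError(
                    "No leader was elected within " + timeout + " " + unit, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // A future that is already completed returns its value immediately.
        CompletableFuture<String> elected = CompletableFuture.completedFuture("driver-1");
        System.out.println("leader: " + awaitLeader(elected, 1, TimeUnit.SECONDS));

        // A future that is never completed models the situation in the thread dump.
        CompletableFuture<String> neverElected = new CompletableFuture<>();
        try {
            awaitLeader(neverElected, 100, TimeUnit.MILLISECONDS);
        } catch (AssertionError expected) {
            System.out.println("timed out: " + expected.getMessage());
        }
    }
}
```

A bounded wait does not explain why the second leader is never elected. It 
only turns the hang into a fast, attributable failure while the root cause 
is being investigated.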



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
