[ https://issues.apache.org/jira/browse/FLINK-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156048#comment-15156048 ]
ASF GitHub Bot commented on FLINK-2213:
---------------------------------------

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1588#issuecomment-186835009

The expected output of the test is the following:
```
Test testQueryCluster(org.apache.flink.yarn.YARNSessionFIFOITCase) is running.
--------------------------------------------------------------------------------
20:12:25,692 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at testing-worker-linux-docker-8281db7b-3371-linux-16/172.17.6.245:8032
20:12:25,701 INFO org.apache.hadoop.yarn.webapp.WebApps - Registered webapp guice modules
20:12:25,712 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl - Registering with RM using finished containers :[]
20:12:25,712 INFO org.apache.hadoop.yarn.util.RackResolver - Resolved testing-worker-linux-docker-8281db7b-3371-linux-16.prod.travis-ci.org to /default-rack
20:12:25,712 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService - NodeManager from node testing-worker-linux-docker-8281db7b-3371-linux-16.prod.travis-ci.org(cmPort: 59877 httpPort: 39611) registered with capability: <memory:4096, vCores:666>, assigned nodeId testing-worker-linux-docker-8281db7b-3371-linux-16.prod.travis-ci.org:59877
20:12:25,712 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl - testing-worker-linux-docker-8281db7b-3371-linux-16.prod.travis-ci.org:59877 Node Transitioned from NEW to RUNNING
20:12:25,712 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager - Rolling master-key for container-tokens, got key with id 1069868518
20:12:25,712 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM - Rolling master-key for nm-tokens, got key with id :-1168902475
20:12:25,712 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl - Registered with ResourceManager as testing-worker-linux-docker-8281db7b-3371-linux-16.prod.travis-ci.org:59877 with total resource of <memory:4096, vCores:666>
20:12:25,712 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl - Notifying ContainerManager to unblock new container-requests
20:12:25,939 INFO org.apache.flink.yarn.YARNSessionFIFOITCase - Starting testQueryCluster()
20:12:25,939 INFO org.apache.flink.yarn.YarnTestBase - Running with args [-q]
20:12:25,993 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
20:12:26,940 INFO org.apache.flink.yarn.YarnTestBase - Found expected output in redirected streams
20:12:26,940 INFO org.apache.flink.yarn.YarnTestBase - RunWithArgs: request runner to stop
20:12:26,940 WARN org.apache.flink.yarn.YarnTestBase - RunWithArgs runner stopped.
20:12:26,940 INFO org.apache.flink.yarn.YarnTestBase - Sending stdout content through logger:
NodeManagers in the Cluster 2|Property |Value
+---------------------------------------+
|NodeID |testing-worker-linux-docker-8281db7b-3371-linux-16.prod.travis-ci.org:59877
|Memory |4096 MB
|vCores |666
|HealthReport |
|Containers |0
+---------------------------------------+
|NodeID |testing-worker-linux-docker-8281db7b-3371-linux-16.prod.travis-ci.org:44161
|Memory |4096 MB
|vCores |666
|HealthReport |
|Containers |0
+---------------------------------------+
Summary: totalMemory 8192 totalCores 1332
20:12:26,940 INFO org.apache.flink.yarn.YarnTestBase - Sending stderr content through logger:
20:12:26,940 INFO org.apache.flink.yarn.YarnTestBase - Test was successful
20:12:26,940 INFO org.apache.flink.yarn.YARNSessionFIFOITCase - Finished testQueryCluster()
20:12:27,443 INFO org.apache.flink.yarn.YARNSessionFIFOITCase -
```
But when it fails, the output is:
```
Test testQueryCluster(org.apache.flink.yarn.YARNSessionFIFOITCase) is running.
--------------------------------------------------------------------------------
21:50:20,684 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at testing-worker-linux-docker-1305e497-3358-linux-10/172.17.4.146:8032
21:50:22,041 INFO org.apache.flink.yarn.YARNSessionFIFOITCase - Starting testQueryCluster()
21:50:22,041 INFO org.apache.flink.yarn.YarnTestBase - Running with args [-q]
21:50:22,212 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
21:50:22,559 INFO org.mortbay.log - Started HttpServer2$SelectChannelConnectorWithSafeStartup@testing-worker-linux-docker-1305e497-3358-linux-10:45772
21:50:22,576 INFO org.apache.hadoop.yarn.webapp.WebApps - Web app /node started at 45772
21:50:22,577 INFO org.mortbay.log - Started HttpServer2$SelectChannelConnectorWithSafeStartup@testing-worker-linux-docker-1305e497-3358-linux-10:58616
21:50:22,577 INFO org.apache.hadoop.yarn.webapp.WebApps - Web app /node started at 58616
21:50:22,664 INFO org.apache.hadoop.yarn.webapp.WebApps - Registered webapp guice modules
21:50:22,665 INFO org.apache.hadoop.yarn.webapp.WebApps - Registered webapp guice modules
21:50:22,691 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl - Sending out 0 NM container statuses: []
21:50:22,696 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl - Registering with RM using containers :[]
21:50:22,746 INFO org.apache.hadoop.yarn.util.RackResolver - Resolved testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org to /default-rack
21:50:22,753 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl - testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org:42997 Node Transitioned from NEW to RUNNING
21:50:22,758 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl - Sending out 0 NM container statuses: []
21:50:22,758 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl - Registering with RM using containers :[]
21:50:22,758 INFO org.apache.hadoop.yarn.util.RackResolver - Resolved testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org to /default-rack
21:50:22,753 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService - NodeManager from node testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org(cmPort: 42997 httpPort: 58616) registered with capability: <memory:4096, vCores:666>, assigned nodeId testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org:42997
21:50:22,758 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl - testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org:50661 Node Transitioned from NEW to RUNNING
21:50:22,759 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager - Rolling master-key for container-tokens, got key with id -1878662703
21:50:22,758 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService - NodeManager from node testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org(cmPort: 50661 httpPort: 45772) registered with capability: <memory:4096, vCores:666>, assigned nodeId testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org:50661
21:50:22,760 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager - Rolling master-key for container-tokens, got key with id -1878662703
21:50:22,760 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM - Rolling master-key for nm-tokens, got key with id :-748258147
21:50:22,761 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl - Registered with ResourceManager as testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org:42997 with total resource of <memory:4096, vCores:666>
21:50:22,761 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl - Notifying ContainerManager to unblock new container-requests
21:50:22,761 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM - Rolling master-key for nm-tokens, got key with id :-748258147
21:50:22,762 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl - Registered with ResourceManager as testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org:50661 with total resource of <memory:4096, vCores:666>
21:50:22,762 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl - Notifying ContainerManager to unblock new container-requests
21:50:23,077 INFO org.apache.flink.yarn.YarnTestBase - Runner stopped earlier than expected with return value = 0
21:50:23,077 INFO org.apache.flink.yarn.YarnTestBase - Sending stdout content through logger:
NodeManagers in the Cluster 0|Property |Value
+---------------------------------------+
Summary: totalMemory 0 totalCores 0
21:50:23,077 INFO org.apache.flink.yarn.YarnTestBase - Sending stderr content through logger:
21:50:23,580 ERROR org.apache.flink.yarn.YARNSessionFIFOITCase -
--------------------------------------------------------------------------------
Test testQueryCluster(org.apache.flink.yarn.YARNSessionFIFOITCase) failed with:
java.lang.AssertionError: During the timeout period of 180 seconds the expected string did not show up
```

Looking through the logs, I think the issue is the following:
When `testQueryCluster()` is executed, the NodeManagers are not yet registered with YARN. That's why the number of NodeManagers reported in the test is 0. In the failing log, the query runs at 21:50:22,212, while the NodeManagers only finish registering at around 21:50:22,76x.

The issue occurs after your change because the test execution order changed. Before your change, `testQueryCluster()` was executed after other tests, so the NMs were always registered by then. Since you removed many tests from the FIFOITCase, `testQueryCluster()` is now the first test to be executed. Apparently, the test setup does not wait until all NMs are connected.

I think you can solve the issue using `waitForNodeManagersToConnect` from the MiniYARNCluster.
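
The waiting step suggested above can be sketched roughly as follows. This is only a minimal illustration of the polling idea, not the MiniYARNCluster implementation and not the actual fix in the pull request: the class `NodeManagerWait`, the method `waitForNodeManagers`, and the `registeredCount` supplier are hypothetical names, and the supplier merely stands in for whatever query reports the number of registered NodeManagers.

```java
import java.util.function.IntSupplier;

public class NodeManagerWait {

    /**
     * Polls until at least {@code expected} NodeManagers are reported,
     * or until the timeout expires. The supplier is a hypothetical stand-in
     * for a query against the ResourceManager.
     *
     * @return true if the expected count was reached in time, false otherwise
     */
    public static boolean waitForNodeManagers(IntSupplier registeredCount, int expected,
                                              long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (registeredCount.getAsInt() >= expected) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out before all NodeManagers registered
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated cluster: the reported count grows by one per poll,
        // mimicking NodeManagers that register shortly after startup.
        int[] polls = {0};
        boolean ready = waitForNodeManagers(() -> polls[0]++, 2, 5_000, 10);
        System.out.println("NodeManagers ready: " + ready);
    }
}
```

In the test setup, a wait like this would run after the mini cluster is started and before the first test, so that `testQueryCluster()` no longer depends on the execution order.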

> Configure number of vcores
> --------------------------
>
> Key: FLINK-2213
> URL: https://issues.apache.org/jira/browse/FLINK-2213
> Project: Flink
> Issue Type: Improvement
> Components: YARN Client
> Affects Versions: 0.10.0
> Reporter: Ufuk Celebi
> Assignee: Klou
> Fix For: 1.0.0
>
>
> Currently, the number of vcores per YARN container is set to 1.
> It is desirable to allow configuring this value. As a simple heuristic it
> makes sense to at least set it to the number of slots per container.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)