[jira] [Commented] (FLINK-2213) Configure number of vcores

ASF GitHub Bot (JIRA) Sun, 21 Feb 2016 06:59:10 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156048#comment-15156048
 ]


ASF GitHub Bot commented on FLINK-2213:
---------------------------------------

Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1588#issuecomment-186835009
  
    The expected output of the test is the following
    
    ```
    Test testQueryCluster(org.apache.flink.yarn.YARNSessionFIFOITCase) is 
running.
    
--------------------------------------------------------------------------------
    20:12:25,692 INFO  org.apache.hadoop.yarn.client.RMProxy                    
     - Connecting to ResourceManager at 
testing-worker-linux-docker-8281db7b-3371-linux-16/172.17.6.245:8032
    20:12:25,701 INFO  org.apache.hadoop.yarn.webapp.WebApps                    
     - Registered webapp guice modules
    20:12:25,712 INFO  
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl  - Registering 
with RM using finished containers :[]
    20:12:25,712 INFO  org.apache.hadoop.yarn.util.RackResolver                 
     - Resolved 
testing-worker-linux-docker-8281db7b-3371-linux-16.prod.travis-ci.org to 
/default-rack
    20:12:25,712 INFO  
org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService  - 
NodeManager from node 
testing-worker-linux-docker-8281db7b-3371-linux-16.prod.travis-ci.org(cmPort: 
59877 httpPort: 39611) registered with capability: <memory:4096, vCores:666>, 
assigned nodeId 
testing-worker-linux-docker-8281db7b-3371-linux-16.prod.travis-ci.org:59877
    20:12:25,712 INFO  
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl  - 
testing-worker-linux-docker-8281db7b-3371-linux-16.prod.travis-ci.org:59877 
Node Transitioned from NEW to RUNNING
    20:12:25,712 INFO  
org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager
  - Rolling master-key for container-tokens, got key with id 1069868518
    20:12:25,712 INFO  
org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM  - 
Rolling master-key for nm-tokens, got key with id :-1168902475
    20:12:25,712 INFO  
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl  - Registered 
with ResourceManager as 
testing-worker-linux-docker-8281db7b-3371-linux-16.prod.travis-ci.org:59877 
with total resource of <memory:4096, vCores:666>
    20:12:25,712 INFO  
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl  - Notifying 
ContainerManager to unblock new container-requests
    20:12:25,939 INFO  org.apache.flink.yarn.YARNSessionFIFOITCase              
     - Starting testQueryCluster()
    20:12:25,939 INFO  org.apache.flink.yarn.YarnTestBase                       
     - Running with args [-q]
    20:12:25,993 INFO  org.apache.hadoop.yarn.client.RMProxy                    
     - Connecting to ResourceManager at /0.0.0.0:8032
    20:12:26,940 INFO  org.apache.flink.yarn.YarnTestBase                       
     - Found expected output in redirected streams
    20:12:26,940 INFO  org.apache.flink.yarn.YarnTestBase                       
     - RunWithArgs: request runner to stop
    20:12:26,940 WARN  org.apache.flink.yarn.YarnTestBase                       
     - RunWithArgs runner stopped.
    20:12:26,940 INFO  org.apache.flink.yarn.YarnTestBase                       
     - Sending stdout content through logger: 
    
    NodeManagers in the Cluster 2|Property         |Value          
    +---------------------------------------+
    |NodeID           
|testing-worker-linux-docker-8281db7b-3371-linux-16.prod.travis-ci.org:59877 
    |Memory           |4096 MB          
    |vCores           |666              
    |HealthReport     |                 
    |Containers       |0                
    +---------------------------------------+
    |NodeID           
|testing-worker-linux-docker-8281db7b-3371-linux-16.prod.travis-ci.org:44161 
    |Memory           |4096 MB          
    |vCores           |666              
    |HealthReport     |                 
    |Containers       |0                
    +---------------------------------------+
    Summary: totalMemory 8192 totalCores 1332
    
    
    
    
    20:12:26,940 INFO  org.apache.flink.yarn.YarnTestBase                       
     - Sending stderr content through logger: 
    
    
    
    
    20:12:26,940 INFO  org.apache.flink.yarn.YarnTestBase                       
     - Test was successful
    20:12:26,940 INFO  org.apache.flink.yarn.YARNSessionFIFOITCase              
     - Finished testQueryCluster()
    20:12:27,443 INFO  org.apache.flink.yarn.YARNSessionFIFOITCase              
     - 
    ```
    
    but when it fails, it outputs
    
    ```
    Test testQueryCluster(org.apache.flink.yarn.YARNSessionFIFOITCase) is 
running.
    
--------------------------------------------------------------------------------
    21:50:20,684 INFO  org.apache.hadoop.yarn.client.RMProxy                    
     - Connecting to ResourceManager at 
testing-worker-linux-docker-1305e497-3358-linux-10/172.17.4.146:8032
    21:50:22,041 INFO  org.apache.flink.yarn.YARNSessionFIFOITCase              
     - Starting testQueryCluster()
    21:50:22,041 INFO  org.apache.flink.yarn.YarnTestBase                       
     - Running with args [-q]
    21:50:22,212 INFO  org.apache.hadoop.yarn.client.RMProxy                    
     - Connecting to ResourceManager at /0.0.0.0:8032
    21:50:22,559 INFO  org.mortbay.log                                          
     - Started 
HttpServer2$SelectChannelConnectorWithSafeStartup@testing-worker-linux-docker-1305e497-3358-linux-10:45772
    21:50:22,576 INFO  org.apache.hadoop.yarn.webapp.WebApps                    
     - Web app /node started at 45772
    21:50:22,577 INFO  org.mortbay.log                                          
     - Started 
HttpServer2$SelectChannelConnectorWithSafeStartup@testing-worker-linux-docker-1305e497-3358-linux-10:58616
    21:50:22,577 INFO  org.apache.hadoop.yarn.webapp.WebApps                    
     - Web app /node started at 58616
    21:50:22,664 INFO  org.apache.hadoop.yarn.webapp.WebApps                    
     - Registered webapp guice modules
    21:50:22,665 INFO  org.apache.hadoop.yarn.webapp.WebApps                    
     - Registered webapp guice modules
    21:50:22,691 INFO  
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl  - Sending out 
0 NM container statuses: []
    21:50:22,696 INFO  
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl  - Registering 
with RM using containers :[]
    21:50:22,746 INFO  org.apache.hadoop.yarn.util.RackResolver                 
     - Resolved 
testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org to 
/default-rack
    21:50:22,753 INFO  
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl  - 
testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org:42997 
Node Transitioned from NEW to RUNNING
    21:50:22,758 INFO  
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl  - Sending out 
0 NM container statuses: []
    21:50:22,758 INFO  
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl  - Registering 
with RM using containers :[]
    21:50:22,758 INFO  org.apache.hadoop.yarn.util.RackResolver                 
     - Resolved 
testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org to 
/default-rack
    21:50:22,753 INFO  
org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService  - 
NodeManager from node 
testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org(cmPort: 
42997 httpPort: 58616) registered with capability: <memory:4096, vCores:666>, 
assigned nodeId 
testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org:42997
    21:50:22,758 INFO  
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl  - 
testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org:50661 
Node Transitioned from NEW to RUNNING
    21:50:22,759 INFO  
org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager
  - Rolling master-key for container-tokens, got key with id -1878662703
    21:50:22,758 INFO  
org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService  - 
NodeManager from node 
testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org(cmPort: 
50661 httpPort: 45772) registered with capability: <memory:4096, vCores:666>, 
assigned nodeId 
testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org:50661
    21:50:22,760 INFO  
org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager
  - Rolling master-key for container-tokens, got key with id -1878662703
    21:50:22,760 INFO  
org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM  - 
Rolling master-key for nm-tokens, got key with id :-748258147
    21:50:22,761 INFO  
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl  - Registered 
with ResourceManager as 
testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org:42997 
with total resource of <memory:4096, vCores:666>
    21:50:22,761 INFO  
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl  - Notifying 
ContainerManager to unblock new container-requests
    21:50:22,761 INFO  
org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM  - 
Rolling master-key for nm-tokens, got key with id :-748258147
    21:50:22,762 INFO  
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl  - Registered 
with ResourceManager as 
testing-worker-linux-docker-1305e497-3358-linux-10.prod.travis-ci.org:50661 
with total resource of <memory:4096, vCores:666>
    21:50:22,762 INFO  
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl  - Notifying 
ContainerManager to unblock new container-requests
    21:50:23,077 INFO  org.apache.flink.yarn.YarnTestBase                       
     - Runner stopped earlier than expected with return value = 0
    21:50:23,077 INFO  org.apache.flink.yarn.YarnTestBase                       
     - Sending stdout content through logger: 
    
    NodeManagers in the Cluster 0|Property         |Value          
    +---------------------------------------+
    Summary: totalMemory 0 totalCores 0
    
    
    
    
    21:50:23,077 INFO  org.apache.flink.yarn.YarnTestBase                       
     - Sending stderr content through logger: 
    
    
    
    
    21:50:23,580 ERROR org.apache.flink.yarn.YARNSessionFIFOITCase              
     - 
    
--------------------------------------------------------------------------------
    Test testQueryCluster(org.apache.flink.yarn.YARNSessionFIFOITCase) failed 
with:
    java.lang.AssertionError: During the timeout period of 180 seconds the 
expected string did not show up
    ```
    
    Looking through the logs, I think the issue is the following:
    When `testQueryCluster()` is executed, the NodeManagers are not yet
    registered with YARN. That's why the number of NodeManagers reported in
    the test is 0.
    The issue occurs after your change because the test execution order
    changed. Before your change, `testQueryCluster()` was executed after
    other tests, so the NMs were always registered.
    Since you removed many tests from the FIFOITCase, `testQueryCluster()`
    is now the first test to be executed. Apparently, the test setup does not
    wait until all NMs are connected.
    
    I think you can solve the issue using `waitForNodeManagersToConnect`
    from the MiniYARNCluster.
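
    The waiting step suggested above could look roughly like the sketch below.
    It is only an illustration of polling until the expected number of
    NodeManagers has registered, or giving up after a timeout. The class
    `NodeManagerWait`, the method `waitForNodeManagers`, and the simulated
    count supplier are hypothetical names and do not call Hadoop; in the real
    test setup, the `waitForNodeManagersToConnect` method mentioned above
    would presumably take this role.

```java
import java.util.function.IntSupplier;

public class NodeManagerWait {

    /**
     * Polls registeredCount until it reports at least `expected` NodeManagers
     * or the timeout elapses. Returns true if the expected count was reached.
     * (Hypothetical helper, not part of Flink or Hadoop.)
     */
    public static boolean waitForNodeManagers(IntSupplier registeredCount, int expected,
                                              long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (registeredCount.getAsInt() < expected) {
            // Compare via subtraction so the check stays valid if nanoTime wraps
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulated cluster: reports 0 NodeManagers on the first two polls, then 2
        final int[] polls = {0};
        IntSupplier simulated = () -> polls[0]++ < 2 ? 0 : 2;

        boolean ready = waitForNodeManagers(simulated, 2, 5_000, 10);
        System.out.println("NodeManagers ready: " + ready);
    }
}
```

    Calling such a wait in the test setup, before any test runs, would make
    the result independent of the test execution order.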


> Configure number of vcores
> --------------------------
>
>                 Key: FLINK-2213
>                 URL: https://issues.apache.org/jira/browse/FLINK-2213
>             Project: Flink
>          Issue Type: Improvement
>          Components: YARN Client
>    Affects Versions: 0.10.0
>            Reporter: Ufuk Celebi
>            Assignee: Klou
>             Fix For: 1.0.0
>
>
> Currently, the number of vcores per YARN container is set to 1.
> It is desirable to allow configuring this value. As a simple heuristic it 
> makes sense to at least set it to the number of slots per container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
