[ https://issues.apache.org/jira/browse/METRON-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

tantian updated METRON-1525:
----------------------------
    Description: My YARN cluster has 16 GB of memory. After I deployed 4
models, I could not add any more. The cluster's available memory had dropped
to 0 GB, so I tried to remove some models to free memory for new ones, but
that failed too. The MaaS Application Master's logs showed that none of the
requests submitted by the client had been received, so I went through the
source code to find the cause. The MaaS client and the MaaS Application
Master communicate through a ZooKeeper queue (SimpleDistributedQueue, znode
/maas/queue). That queue held many unprocessed elements, so the Application
Master had to be blocked somewhere: it was stuck in the while loop in
org.apache.metron.maas.service.ApplicationMaster.run() (line 520 of
ApplicationMaster.java). The cause is simple: allocated containers are handed
over through a blocking queue. When the cluster does not have enough free
memory to allocate a container, the ResourceManager never responds to the
Application Master's container allocation request, so the Application Master
blocks indefinitely while trying to take a container from that queue, and
every later request in /maas/queue, including model removals, waits behind
it. (was: the same description, ending at "...it was blocked at the while
loop in org.apache.metron.maas.service.ApplicationMaster.run()", without the
line reference or the explanation of the cause.)
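
The hang described above can be illustrated with a small, self-contained
sketch. It is not Metron's code: MaasCommandLoopSketch, the Request/Action
types and the in-memory container queue are hypothetical stand-ins (in a real
YARN application master the queue would be fed by the ResourceManager's
allocation callback). It contrasts BlockingQueue.take(), which waits forever
when no container is granted, with a bounded poll(timeout, unit) that lets the
loop defer an ADD and keep processing other requests such as a REMOVE.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not Metron code): an application-master command loop that waits
 * for containers with a bounded poll() instead of take(), so a full cluster cannot stall it.
 */
public class MaasCommandLoopSketch {
    enum Action { ADD, REMOVE }

    static final class Request {
        final Action action;
        final String model;
        Request(Action action, String model) { this.action = action; this.model = model; }
    }

    // Filled by the (simulated) ResourceManager allocation callback.
    private final BlockingQueue<String> allocatedContainers = new LinkedBlockingQueue<>();
    // ADD requests that could not get a container yet; retried on a later pass.
    private final Deque<Request> deferred = new ArrayDeque<>();
    final List<String> events = new ArrayList<>();

    void onContainerAllocated(String containerId) {
        allocatedContainers.offer(containerId);
    }

    int deferredCount() { return deferred.size(); }

    void handle(Request request, long waitMillis) {
        if (request.action == Action.REMOVE) {
            // Releasing a model needs no new container, so it is never stuck behind an ADD.
            events.add("removed " + request.model);
            return;
        }
        String container;
        try {
            // allocatedContainers.take() would block here indefinitely if no container is
            // ever granted; poll() gives up after waitMillis and returns null instead.
            container = allocatedContainers.poll(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            container = null;
        }
        if (container == null) {
            deferred.addLast(request);
            events.add("deferred " + request.model);
        } else {
            events.add("started " + request.model + " in " + container);
        }
    }

    /** Retries each currently deferred ADD once, e.g. after a REMOVE has freed memory. */
    void retryDeferred(long waitMillis) {
        int n = deferred.size();
        for (int i = 0; i < n; i++) {
            handle(deferred.pollFirst(), waitMillis);
        }
    }

    public static void main(String[] args) {
        MaasCommandLoopSketch am = new MaasCommandLoopSketch();
        am.handle(new Request(Action.ADD, "model-5"), 50);    // cluster full: deferred
        am.handle(new Request(Action.REMOVE, "model-1"), 50); // still processed
        am.onContainerAllocated("container_42");              // memory freed, container granted
        am.retryDeferred(50);
        am.events.forEach(System.out::println);
    }
}
```

Running main prints "deferred model-5", "removed model-1" and "started model-5
in container_42". Deferring is only one possible design; a real fix would also
have to decide whether to keep or cancel the outstanding container request
with the ResourceManager while the ADD is parked.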

> MAAS Application Master can not execute any command submitted by the client 
> using maas_deploy.sh
> ------------------------------------------------------------------------------------------------
>
>                 Key: METRON-1525
>                 URL: https://issues.apache.org/jira/browse/METRON-1525
>             Project: Metron
>          Issue Type: Bug
>    Affects Versions: 0.4.1
>            Reporter: tantian
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
