> On Oct. 5, 2016, 6:18 p.m., Jagadish Venkatraman wrote:
> > Overall, the patch looks great! This is exciting given that Samza can 
> > support scheduling based on tags. For example, jobs with rocksdb can be 
> > assigned to nodes with SSDs.
> > 
> > 
> > Can you please add some detail on testing this feature? 
> > What was the label setup of the cluster? (for example: Did we use an 
> > exclusive node label?), How many node labels? How many containers were 
> > requested for the job?

We haven't tested it in production, but the main idea is the next: we have 3 
different types of nodes in our hadoop cluster. The first type is used for 
ApplicationMasters (actually, we put up to 4 AM containers on one node). The 
second type is used for stateless jobs and this type of nodes has a small 
amount of memory. And the last type is used for stateful jobs and has more 
memory than others. So, there are 3 labels in our cluster: taskam, tasklowmem, 
taskhighmem. Now we force YARN to put containers on a particular type of nodes 
by a small trick with resources (we chose resources for node in such a way that 
YARN doesn't have any other variants except only one type of nodes). But hadoop 
labels is a more natural way to request containers to be placed on a specific 
node's type.


> On Oct. 5, 2016, 6:18 p.m., Jagadish Venkatraman wrote:
> > samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java,
> >  line 207
> > <https://reviews.apache.org/r/51633/diff/1/?file=1491248#file1491248line207>
> >
> >     Question: How does the node-label feature inter-play with the preferred 
> > host feature in Yarn? (Samza leverages this for host-affinity)
> >     
> >     Does the label take precedence over the host? (or vice-versa).

If a preffered host is requested then label expression will be ignored. Here is 
from the hadoop documentation: Any please note that node label expression now 
can only take effect when the resource request has resourceName = ANY.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51633/#review151519
-----------------------------------------------------------


On Oct. 7, 2016, 12:08 a.m., Maxim Logvinenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51633/
> -----------------------------------------------------------
> 
> (Updated Oct. 7, 2016, 12:08 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1013
>     https://issues.apache.org/jira/browse/SAMZA-1013
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> YARN Node labels were introduced in Hadoop version 2.6, which allows to group 
> nodes with similar characteristics and allows applications to specify where 
> to run. This patch adds support for YARN node labels in Samza.
> 
> In this implementation, node labels are defined directly in yarnConfig in 
> YarnClusterResourceManager. It might be better to have node labels as a part 
> of SamzaResourceRequest and SamzaResource classes, but 
> org.apache.hadoop.yarn.api.records.Container class doesn't contain node label 
> and hence we have nothing to pass to the SamzaResource constructor in 
> onContainersAllocated method of YarnClusterResourceManager class.
> 
> 
> Diffs
> -----
> 
>   samza-yarn/src/main/java/org/apache/samza/config/YarnConfig.java 8f2dc48 
>   
> samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java
>  96d3d7c 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/ClientHelper.scala 
> 0998c43 
> 
> Diff: https://reviews.apache.org/r/51633/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Maxim Logvinenko
> 
>

Reply via email to