> On Oct. 5, 2016, 6:18 p.m., Jagadish Venkatraman wrote:
> > Overall, the patch looks great! This is exciting given that Samza can
> > support scheduling based on tags. For example, jobs with rocksdb can be
> > assigned to nodes with SSDs.
> >
> >
> > Can you please add some detail on testing this feature?
> > What was the label setup of the cluster? (for example: Did we use an
> > exclusive node label?), How many node labels? How many containers were
> > requested for the job?

We haven't tested it in production, but the main idea is as follows. We have 3 different types of nodes in our Hadoop cluster. The first type is used for ApplicationMasters (we put up to 4 AM containers on one node). The second type is used for stateless jobs and has a small amount of memory. The third type is used for stateful jobs and has more memory than the others. So there are 3 labels in our cluster: taskam, tasklowmem, taskhighmem.

Currently we force YARN to put containers on a particular type of node with a small resource trick: we chose the resources for each node type in such a way that YARN has no option other than a single type of node. Hadoop node labels are a more natural way to request that containers be placed on a specific type of node.


> On Oct. 5, 2016, 6:18 p.m., Jagadish Venkatraman wrote:
> > samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java, line 207
> > <https://reviews.apache.org/r/51633/diff/1/?file=1491248#file1491248line207>
> >
> > Question: How does the node-label feature inter-play with the preferred
> > host feature in Yarn? (Samza leverages this for host-affinity)
> >
> > Does the label take precedence over the host? (or vice-versa).

If a preferred host is requested, the label expression will be ignored. This is from the Hadoop documentation:

    And please note that node label expression now can only take effect when the resource request has resourceName = ANY.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51633/#review151519
-----------------------------------------------------------


On Oct. 7, 2016, 12:08 a.m., Maxim Logvinenko wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail.
> To reply, visit:
> https://reviews.apache.org/r/51633/
> -----------------------------------------------------------
>
> (Updated Oct. 7, 2016, 12:08 a.m.)
>
>
> Review request for samza.
>
>
> Bugs: SAMZA-1013
>     https://issues.apache.org/jira/browse/SAMZA-1013
>
>
> Repository: samza
>
>
> Description
> -------
>
> YARN Node labels were introduced in Hadoop version 2.6, which allows to group
> nodes with similar characteristics and allows applications to specify where
> to run. This patch adds support for YARN node labels in Samza.
>
> In this implementation, node labels are defined directly in yarnConfig in
> YarnClusterResourceManager. It might be better to have node labels as a part
> of SamzaResourceRequest and SamzaResource classes, but
> org.apache.hadoop.yarn.api.records.Container class doesn't contain node label
> and hence we have nothing to pass to the SamzaResource constructor in
> onContainersAllocated method of YarnClusterResourceManager class.
>
>
> Diffs
> -----
>
>   samza-yarn/src/main/java/org/apache/samza/config/YarnConfig.java 8f2dc48
>   samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java 96d3d7c
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/ClientHelper.scala 0998c43
>
> Diff: https://reviews.apache.org/r/51633/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Maxim Logvinenko
>
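
P.S. A minimal sketch of the precedence rule discussed in this thread, assuming YARN behaves exactly as the Hadoop note says: the label expression is honored only when the request's resource name is ANY, so asking for a preferred host effectively disables the label. The Request class, the effectiveLabel helper, and the host name "host-17" are hypothetical and are not part of Samza or YARN; only the "*" value mirrors ResourceRequest.ANY. This is a model of the rule, not the code from the patch.

```java
import java.util.Optional;

final class LabelRequestSketch {
    // Same string value as ResourceRequest.ANY in the YARN API.
    static final String ANY = "*";

    /** Hypothetical stand-in for a container request; not a Samza or YARN class. */
    static final class Request {
        final String resourceName;     // ANY, or a preferred host name
        final String labelExpression;  // e.g. "taskhighmem"; may be null

        Request(String resourceName, String labelExpression) {
            this.resourceName = resourceName;
            this.labelExpression = labelExpression;
        }
    }

    /**
     * Returns the label that would actually constrain placement under the
     * assumed rule: labels apply only when resourceName is ANY.
     */
    static Optional<String> effectiveLabel(Request request) {
        if (request.labelExpression == null || request.labelExpression.isEmpty()) {
            return Optional.empty(); // no label was requested at all
        }
        if (!ANY.equals(request.resourceName)) {
            return Optional.empty(); // preferred host given: label is ignored
        }
        return Optional.of(request.labelExpression);
    }

    public static void main(String[] args) {
        // Stateful job without host affinity: the label applies.
        System.out.println(effectiveLabel(new Request(ANY, "taskhighmem")));
        // Same job with a preferred host: the label is dropped.
        System.out.println(effectiveLabel(new Request("host-17", "taskhighmem")));
    }
}
```

In practice this means a job that relies on host affinity cannot also rely on the label to keep its containers on the taskhighmem nodes; the preferred hosts themselves would have to be of the right type.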