> On Oct. 5, 2016, 6:18 p.m., Jagadish Venkatraman wrote:
> > Overall, the patch looks great! This is exciting given that Samza can
> > support scheduling based on tags. For example, jobs with RocksDB can be
> > assigned to nodes with SSDs.
> >
> > Can you please add some detail on testing this feature?
> > What was the label setup of the cluster (for example, did we use an
> > exclusive node label)? How many node labels? How many containers were
> > requested for the job?
>
> Maxim Logvinenko wrote:
>     We haven't tested it in production, but the main idea is as follows: we
>     have 3 different types of nodes in our Hadoop cluster. The first type is
>     used for ApplicationMasters (we put up to 4 AM containers on one node).
>     The second type is used for stateless jobs, and these nodes have a small
>     amount of memory. The last type is used for stateful jobs and has more
>     memory than the others. So, there are 3 labels in our cluster: taskam,
>     tasklowmem, taskhighmem. Currently we force YARN to put containers on a
>     particular type of node by a small trick with resources (we chose the
>     resources for each node so that YARN has no option other than one type
>     of node). But Hadoop labels are a more natural way to request that
>     containers be placed on a specific type of node.

So, in this case, do you not care about *host affinity* at all when the job restarts? Are you okay with your container coming back up on a different host (as long as it is a host with the label `taskhighmem`)?

We should make it explicit that when host-affinity.enabled=true, node labelling will be ignored. Is my understanding correct?


- Jagadish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51633/#review151519
-----------------------------------------------------------


On Oct. 7, 2016, 12:08 a.m., Maxim Logvinenko wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51633/
> -----------------------------------------------------------
>
> (Updated Oct. 7, 2016, 12:08 a.m.)
>
>
> Review request for samza.
>
>
> Bugs: SAMZA-1013
>     https://issues.apache.org/jira/browse/SAMZA-1013
>
>
> Repository: samza
>
>
> Description
> -------
>
> YARN node labels were introduced in Hadoop 2.6. They allow nodes with
> similar characteristics to be grouped and allow applications to specify
> where to run. This patch adds support for YARN node labels in Samza.
>
> In this implementation, node labels are defined directly in yarnConfig in
> YarnClusterResourceManager. It might be better to have node labels as a part
> of the SamzaResourceRequest and SamzaResource classes, but the
> org.apache.hadoop.yarn.api.records.Container class doesn't contain a node
> label, and hence we have nothing to pass to the SamzaResource constructor in
> the onContainersAllocated method of the YarnClusterResourceManager class.
>
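
To make the precedence question concrete, here is a minimal Java sketch of the rule I have in mind: a configured label is only used when host affinity is off. The class name, the method name, and both config keys are assumptions made up for illustration; they are not taken from the patch or from YarnConfig.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative sketch only; not code from the patch under review. */
final class NodeLabelSketch {
    // Assumed key names, chosen for this example only.
    static final String HOST_AFFINITY_KEY = "host-affinity.enabled";
    static final String CONTAINER_LABEL_KEY = "container.label";

    /**
     * Returns the node label to request, or an empty Optional when host
     * affinity is enabled or when no label has been configured.
     */
    static Optional<String> resolveNodeLabel(Map<String, String> config) {
        boolean hostAffinity =
            Boolean.parseBoolean(config.getOrDefault(HOST_AFFINITY_KEY, "false"));
        if (hostAffinity) {
            // Proposed rule: host affinity wins, so the label is ignored.
            return Optional.empty();
        }
        String label = config.get(CONTAINER_LABEL_KEY);
        if (label == null || label.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(label.trim());
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put(CONTAINER_LABEL_KEY, "taskhighmem");
        System.out.println(resolveNodeLabel(config)); // label is used

        config.put(HOST_AFFINITY_KEY, "true");
        System.out.println(resolveNodeLabel(config)); // label is ignored
    }
}
```

If the intended behaviour is the opposite (labels should constrain which hosts host affinity may pick), only the early return changes, which is why I would like the rule written down explicitly.
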
>
> Diffs
> -----
>
>   samza-yarn/src/main/java/org/apache/samza/config/YarnConfig.java 8f2dc48
>   samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java 96d3d7c
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/ClientHelper.scala 0998c43
>
> Diff: https://reviews.apache.org/r/51633/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Maxim Logvinenko
>
>