----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/44920/#review126385 -----------------------------------------------------------
I have some high level questions w/ the current class layout/hierarchy. Will sync up w/ Jagadish in person. samza-core/src/main/java/org/apache/samza/clustermanager/AbstractContainerAllocator.java (line 45) <https://reviews.apache.org/r/44920/#comment189378> nit: If you want to change this, open a separate JIRA and remove the TODO here. samza-core/src/main/java/org/apache/samza/clustermanager/AbstractContainerAllocator.java (line 85) <https://reviews.apache.org/r/44920/#comment189377> Is this a copy of the AbstractContainerAllocator in samza-yarn? Or a modified version? If it is a simple copy, I would suggest to remove the code in samza-yarn, s.t. we don't leave dup codes in two different places. samza-core/src/main/java/org/apache/samza/clustermanager/AbstractContainerAllocator.java (line 143) <https://reviews.apache.org/r/44920/#comment189384> I assume that the lines between 143 to 148 are read/write to resourceRequestState object? Shouldn't it be synchronized for thread-safety? samza-core/src/main/java/org/apache/samza/clustermanager/AbstractContainerAllocator.java (line 149) <https://reviews.apache.org/r/44920/#comment189388> Why is this "expectedContainerId"? Is there a case that we asked for container_0 and we get back container_9? samza-core/src/main/java/org/apache/samza/clustermanager/AbstractContainerAllocator.java (line 157) <https://reviews.apache.org/r/44920/#comment189395> Wouldn't it be better to encapsulate the container launch and app state change in a single method? I thought Jacob did some refactor in the samza-yarn old classes. Is it just not picked up by this patch yet? samza-core/src/main/java/org/apache/samza/clustermanager/AbstractContainerAllocator.java (line 160) <https://reviews.apache.org/r/44920/#comment189391> I would prefer to: 1) wrap the internal member variable via accessor methods of SamzaAppState 2) if jobHealthy is purely derived from neededResources.get() == 0, do not introduce an additional variable for that. samza-core/src/main/java/org/apache/samza/clustermanager/AbstractContainerAllocator.java (line 216) <https://reviews.apache.org/r/44920/#comment189396> What's the UUID here for? Since it is purely random UUID in the caller, can we remove that and provide a version of constructor that like below? {code} SamzaResourceRequest(int cpuCores, int memMb, String host, String containerId) {code} samza-core/src/main/java/org/apache/samza/clustermanager/AbstractContainerAllocator.java (line 219) <https://reviews.apache.org/r/44920/#comment189397> Same comment here. Prefer not to leak out the internal member variables if not necessary. samza-core/src/main/java/org/apache/samza/clustermanager/AbstractContainerAllocator.java (line 266) <https://reviews.apache.org/r/44920/#comment189398> nit: spaces needed around '=' samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 49) <https://reviews.apache.org/r/44920/#comment189439> Since we are adding multi-threaded container models, would it be a reasonable time to add thread-safety in all new classes? Since the job coordinator mainly is requesting/allocating containers, we can start w/ a simple synchronous keyword on all methods, w/o too much concerns of the performance degradation. samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 56) <https://reviews.apache.org/r/44920/#comment189440> question: I thought that this *is* an implementation of JobCoordinator API??? Otherwise, the class name is really confusing since it suggest that. samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 61) <https://reviews.apache.org/r/44920/#comment189471> I think that this is the implementation of ClusterJobCoordinatorBase class in the design doc? Why there is no implementation of JobCoordinator interface as in the design doc? samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 65) <https://reviews.apache.org/r/44920/#comment189441> nit: could use the same annotation as other variables. samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 66) <https://reviews.apache.org/r/44920/#comment189443> nit: Why do we have both ClusterManagerConfig and ContainerProcessManager in this class? I would imagine that we are using ClusterManagerConfig to instantiate ContainerProcessManager class. Then, in the ClusterBasedJobCoordinator, we can simply keep the reference to ContainerProcessManager?? samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 80) <https://reviews.apache.org/r/44920/#comment189446> suggestion for future: is this an inherited metrics class for YARN? I think that we should create a more generic SamzaJobCoordinatorMetrics later. samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 88) <https://reviews.apache.org/r/44920/#comment189447> From the comments, it seems that this is mainly handle the "container failures/allocation", not task. Why can't this be part of ContainerProcessManager? samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 93) <https://reviews.apache.org/r/44920/#comment189450> So, why do we have a JobModelReader to do read-only function in JobCoordinator? I thought that the JobCoordinator class should have JobModelManager module which can do both read/write JobModel, while write is only valid if the JobCoordinator is the leader? Are you going to implement JobCoordiatorLeader and JobCoordinatorFollower interfaces to separate them? It seems to be a bit too complex than necessary. samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 98) <https://reviews.apache.org/r/44920/#comment189451> This also seems to be a feature of taskManager, can we fold it in that object? samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 103) <https://reviews.apache.org/r/44920/#comment189452> I thought that we *always* start JMX server in the containers? Why are we turning it off? samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 109) <https://reviews.apache.org/r/44920/#comment189456> two points: 1) If this is *only* used in YarnJobCoordinator, let's move it there for now. 2) Why can't this be in ContainerProcessManager, since the exceptions happened within calls to ContainerProcessManager? samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 116) <https://reviews.apache.org/r/44920/#comment189458> Shouldn't this be the member of the StreamProcessor class that includes the JobCoordinator object, since it is a global object? samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 121) <https://reviews.apache.org/r/44920/#comment189462> nit: use {@link } to refer to JobCoordinator class name here. samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 131) <https://reviews.apache.org/r/44920/#comment189474> Question: why isn't this JmxServer object's lifecycle == StreamProcessor? Or even, the whole user-defined process? samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 154) <https://reviews.apache.org/r/44920/#comment189477> Curious: why do we need all these to create ContainerProcessManager? I thought that we should only need cluster manager related config info? Something like: ClusterManagerConfig??? samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 158) <https://reviews.apache.org/r/44920/#comment189478> Not sure why we need a separate SamzaTaskManager?? samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java (line 173) <https://reviews.apache.org/r/44920/#comment189479> I think that this really should be at least in StreamProcessor. - Yi Pan (Data Infrastructure) On March 31, 2016, 10:40 p.m., Jagadish Venkatraman wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/44920/ > ----------------------------------------------------------- > > (Updated March 31, 2016, 10:40 p.m.) > > > Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Yi Pan > (Data Infrastructure), Navina Ramesh, and Xinyu Liu. > > > Repository: samza > > > Description > ------- > > 1.Proposed new APIs for running Samza without Yarn. (SAMZA-881) > - Defined the ContainerProcessManager abstraction. > - Defined the SamzaResource, SamzaResourceRequest. > - Re-wrote the SamzaAppMaster logic into a ClusterBasedJobCoordinator. > 2.Defined a ClusterManagerConfig to handle configurations independent of > Yarn/Mesos. > 3.Made Samza completely independent of Yarn. This cleanly separates Samza > specific components and Yarn > specific components. > 4.Readability improvements to the existing code base. > -Added explicity documentation for every method, member and class > (including on thread-safety) > - Made internal variables final to document intent, visibility across > threads. (trivially by adding modifiers, or by changing where they're > initialized.) > 5.Refactored JobCoordinator into a JobModelReader. > > == Diff2 == > Address review feedback. > > Design Doc: > https://issues.apache.org/jira/secure/attachment/12790540/SamzaJobCoordinatorRe-designProposal.pdf > > > Diffs > ----- > > > samza-core/src/main/java/org/apache/samza/clustermanager/AbstractContainerAllocator.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ContainerAllocator.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ContainerManagerFactory.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ContainerProcessManager.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ContainerRequestState.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/HostAwareContainerAllocator.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ResourceFailure.java > PRE-CREATION > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaAppState.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaContainerLaunchException.java > PRE-CREATION > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResource.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResourceRequest.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResourceStatus.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaTaskManager.java > PRE-CREATION > samza-core/src/main/java/org/apache/samza/config/ClusterManagerConfig.java > PRE-CREATION > samza-core/src/main/scala/org/apache/samza/coordinator/JobModelReader.scala > PRE-CREATION > > samza-core/src/main/scala/org/apache/samza/metrics/SamzaAppMasterMetrics.scala > PRE-CREATION > samza-shell/src/main/bash/run-am.sh > 9545a96953baaff17ad14962e02bc12aadbb1101 > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnAppState.java > PRE-CREATION > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnContainerManager.java > PRE-CREATION > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnContainerManagerFactory.java > PRE-CREATION > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnContainerRunner.java > PRE-CREATION > > samza-yarn/src/main/scala/org/apache/samza/job/yarn/refactor/SamzaAppMasterService.scala > PRE-CREATION > > samza-yarn/src/main/scala/org/apache/samza/job/yarn/refactor/SamzaYarnAppMasterLifecycle.scala > PRE-CREATION > > Diff: https://reviews.apache.org/r/44920/diff/ > > > Testing > ------- > > Tested with sample jobs on clusters of varying sizes. Tested locally. TODO: > Unit tests. > > > Thanks, > > Jagadish Venkatraman > >