> On April 26, 2016, 12:23 a.m., Yi Pan (Data Infrastructure) wrote: > > Overall, ltgm. Only a few comments to be addressed. One high level comment > > is that we don't need a refactor in the package path to indicate that we > > are refactoring. :) Thanks! > > Yi Pan (Data Infrastructure) wrote: > BTW, it seems that a rebase against the latest master is needed.
Yeah absolutely :) we certainly need a rebase. (Since, the scope of this patch is large, alot of changes have been checked-in in the interim. I'll carefully rebase the pending changes. Thanks! > On April 26, 2016, 12:23 a.m., Yi Pan (Data Infrastructure) wrote: > > samza-core/src/main/java/org/apache/samza/clustermanager/ContainerProcessManager.java, > > line 385 > > <https://reviews.apache.org/r/44920/diff/9/?file=1355967#file1355967line385> > > > > It seems that we are only logging the exceptionOccurred here to set > > shouldShutdown() to true. We need to be able to tell whether the shutdown > > is normal shutdown or caused by exception. Shouldn't we also keep the > > throwable s.t. we know the exception caused shutdown here? Correct me if I'm wrong - But, if exceptionOccured is true, it always means that the shutdown is abnormal. Any irrecoverable exception occuring in the ClusterResourceManager will result in this exception. We log an error indicating the exception. (I'm not sure how keeping a throwable as a member will help.) > On April 26, 2016, 12:23 a.m., Yi Pan (Data Infrastructure) wrote: > > samza-core/src/main/java/org/apache/samza/clustermanager/ResourceRequestState.java, > > line 65 > > <https://reviews.apache.org/r/44920/diff/9/?file=1355971#file1355971line65> > > > > nit: it seems that we can remove this and use synchronized keyword in > > the public methods that this lock object is used. The original design used synchronized on methods. After discussing with Chris, we thought that it'd be better for the refactor to use a final variable as a lock (or a reentrant Lock). This is because there is nothing preventing someone holding an instance of ResourceRequestState from using it as a lock elsewhere, causing unexpected deadlocks. > On April 26, 2016, 12:23 a.m., Yi Pan (Data Infrastructure) wrote: > > samza-core/src/main/java/org/apache/samza/config/ClusterManagerConfig.java, > > line 38 > > <https://reviews.apache.org/r/44920/diff/9/?file=1355977#file1355977line38> > > > > let's remove the 'refactor'. Sure, fixed :) thanks for the feedback on the packaging. > On April 26, 2016, 12:23 a.m., Yi Pan (Data Infrastructure) wrote: > > samza-core/src/main/scala/org/apache/samza/checkpoint/CheckpointTool.scala, > > line 144 > > <https://reviews.apache.org/r/44920/diff/9/?file=1355981#file1355981line144> > > > > Weird to see the val name is coordinator and the constructor is > > JobModelManager() Nice catch, fixed! > On April 26, 2016, 12:23 a.m., Yi Pan (Data Infrastructure) wrote: > > samza-shell/src/main/bash/run-am.sh, line 26 > > <https://reviews.apache.org/r/44920/diff/9/?file=1356001#file1356001line26> > > > > nit: remove the empty line change. Fixed. > On April 26, 2016, 12:23 a.m., Yi Pan (Data Infrastructure) wrote: > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnAppState.java, > > line 1 > > <https://reviews.apache.org/r/44920/diff/9/?file=1356006#file1356006line1> > > > > nit: I don't think that we need explicit 'refactor' in the package name. Fixed. > On April 26, 2016, 12:23 a.m., Yi Pan (Data Infrastructure) wrote: > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnClusterResourceManager.java, > > line 1 > > <https://reviews.apache.org/r/44920/diff/9/?file=1356007#file1356007line1> > > > > same here. No need for the extra level of "refactor". Fixed > On April 26, 2016, 12:23 a.m., Yi Pan (Data Infrastructure) wrote: > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnClusterResourceManager.java, > > line 372 > > <https://reviews.apache.org/r/44920/diff/9/?file=1356007#file1356007line372> > > > > nit: it would be nice to add a note here that these unimplemented > > interfaces are methods that are specific to YARN AMRMClient. Great point. I've added the note. Thanks for the careful review. > On April 26, 2016, 12:23 a.m., Yi Pan (Data Infrastructure) wrote: > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnContainerRunner.java, > > line 93 > > <https://reviews.apache.org/r/44920/diff/9/?file=1356008#file1356008line93> > > > > Why? I thought that it is useful, at least for logging? If we truly > > don't need it, let's remove it. > > > > BTW, how does container launched on the remote NM know about the > > containerId if it is not passed via command line here? Yes, currently, it is useful for logging in this method. I've removed the todo. The motivation behind the TODO was the launchStreamProcessor(SamzaResource resource, CommandBuilder builder) API in the ClusterResourceManager. That does not take in any containerID, It merely takes in a resource and a builder that encapsulates all context (including the containerID). Since, this makes the TODO is redundant, I've removed it. The containerId is passed via command-line as an environment variable when the container is started. L:110, L:194 of YarnContainerRunner ensure this. > On April 26, 2016, 12:23 a.m., Yi Pan (Data Infrastructure) wrote: > > samza-yarn/src/main/scala/org/apache/samza/job/yarn/refactor/SamzaAppMasterService.scala, > > line 1 > > <https://reviews.apache.org/r/44920/diff/9/?file=1356012#file1356012line1> > > > > Maybe change it to YarnAppMasterWebService and remove the 'refactor' in > > the path? I've renamed it to SamzaYarnAppMasterService (so that its similar to SamzaYarnAppMasterLifecycle and other clases) and removed the refactor from the path.. - Jagadish ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/44920/#review130448 ----------------------------------------------------------- On April 23, 2016, 12:21 a.m., Jagadish Venkatraman wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/44920/ > ----------------------------------------------------------- > > (Updated April 23, 2016, 12:21 a.m.) > > > Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Yi Pan > (Data Infrastructure), Navina Ramesh, and Xinyu Liu. > > > Repository: samza > > > Description > ------- > > 1.Proposed new APIs for running Samza without Yarn. (SAMZA-881) > - Defined the ClusterResourceManager abstraction. This will abstract out > Yarn and Mesos's interaction with Samza. > - Defined the SamzaResource, SamzaResourceRequest. > - Re-wrote the SamzaAppMaster logic into a ClusterBasedJobCoordinator. > 2.Defined a ClusterManagerConfig to handle configurations independent of > Yarn/Mesos. > 3.Made Samza completely independent of Yarn. This cleanly separates Samza > specific components and Yarn > specific components. > 4.Readability improvements to the existing code base. > -Added explicity documentation for every method, member and class > (including on thread-safety) > - Made internal variables final to document intent, visibility across > threads. (trivially by adding modifiers, or by changing where they're > initialized.) > 5.Refactored JobCoordinator into a JobModelReader. > > == Diff2 == > Address Chriss review feedback. > > Design Doc: > https://issues.apache.org/jira/secure/attachment/12800342/Samza%20JobCoordinator%20Re-design%20Proposal_1.pdf > > == Diff 3 == > Address Yi's feedback > > == Diff 4 == > Sync with current master > > > Diffs > ----- > > checkstyle/import-control.xml c15b8e74de8e5aac5ac83278c52ab3dba1630e50 > > samza-core/src/main/java/org/apache/samza/clustermanager/AbstractContainerAllocator.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ClusterResourceManager.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ContainerAllocator.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ContainerProcessManager.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/HostAwareContainerAllocator.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ResourceFailure.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ResourceManagerFactory.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ResourceRequestState.java > PRE-CREATION > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaAppState.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaContainerLaunchException.java > PRE-CREATION > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResource.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResourceRequest.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResourceStatus.java > PRE-CREATION > samza-core/src/main/java/org/apache/samza/config/ClusterManagerConfig.java > PRE-CREATION > samza-core/src/main/java/org/apache/samza/container/LocalityManager.java > a3281c2c5f481a78cc4ae791c77d0e9805202477 > > samza-core/src/main/java/org/apache/samza/container/grouper/task/TaskAssignmentManager.java > ec5cf3da4d1967cf586cdf074262a1f42f1efb75 > samza-core/src/main/java/org/apache/samza/storage/StorageRecovery.java > 0324e90a20c2fd31d1b7c6a0707aa3685a1cec20 > samza-core/src/main/scala/org/apache/samza/checkpoint/CheckpointTool.scala > 31b208f47b12a4e9ef1134b1c0bfe532f6c07a80 > samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala > 384b2e777c73fc1e4bc8a29312c9ea5372162ca1 > samza-core/src/main/scala/org/apache/samza/job/local/ProcessJob.scala > 66618165d27aa916238cc86b27631c5db3435c6a > > samza-core/src/main/scala/org/apache/samza/job/local/ProcessJobFactory.scala > 17c2e5be9ee216c88e3c07784c4f9c05cd92e9c9 > samza-core/src/main/scala/org/apache/samza/job/local/ThreadJobFactory.scala > 5acfe875d3a3d9842497e646f0f04ea2861ae950 > > samza-core/src/main/scala/org/apache/samza/metrics/ContainerProcessManagerMetrics.scala > PRE-CREATION > > samza-core/src/test/java/org/apache/samza/clustermanager/MockClusterResourceManager.java > PRE-CREATION > > samza-core/src/test/java/org/apache/samza/clustermanager/MockClusterResourceManagerCallback.java > PRE-CREATION > > samza-core/src/test/java/org/apache/samza/clustermanager/MockContainerAllocator.java > PRE-CREATION > > samza-core/src/test/java/org/apache/samza/clustermanager/MockContainerListener.java > PRE-CREATION > > samza-core/src/test/java/org/apache/samza/clustermanager/MockContainerRequestState.java > PRE-CREATION > > samza-core/src/test/java/org/apache/samza/clustermanager/MockHttpServer.java > PRE-CREATION > > samza-core/src/test/java/org/apache/samza/clustermanager/TestContainerAllocator.java > PRE-CREATION > > samza-core/src/test/java/org/apache/samza/clustermanager/TestContainerProcessManager.java > PRE-CREATION > > samza-core/src/test/java/org/apache/samza/clustermanager/TestContainerRequestState.java > PRE-CREATION > > samza-core/src/test/java/org/apache/samza/clustermanager/TestHostAwareContainerAllocator.java > PRE-CREATION > > samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala > 9df12d2974c686c07457b29d873fd3e9dab72e21 > > samza-core/src/test/scala/org/apache/samza/coordinator/TestJobCoordinator.scala > 110c3a910aa0bae77dfe5eebbf82286b56dc4654 > samza-core/src/test/scala/org/apache/samza/job/local/TestProcessJob.scala > 3a710a82f6104dcbdcfcb9f166fe05003074c4b0 > > samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java > 0c6329ede9b3df4dc05125729b5b44ba2c98803a > samza-shell/src/main/bash/run-am.sh > 9545a96953baaff17ad14962e02bc12aadbb1101 > samza-shell/src/main/bash/run-jc.sh PRE-CREATION > > samza-yarn/src/main/java/org/apache/samza/job/yarn/AbstractContainerAllocator.java > b4789e62beb1120f11a8101664b10c34ae930e58 > samza-yarn/src/main/java/org/apache/samza/job/yarn/SamzaAppState.java > 77280bab8aeb242b34b5b780c84e6deab1a45f51 > samza-yarn/src/main/java/org/apache/samza/job/yarn/SamzaTaskManager.java > caee6e6c182d3cf86bd4fe193f8b1797605b2480 > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnAppState.java > PRE-CREATION > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnClusterResourceManager.java > PRE-CREATION > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnContainerRunner.java > PRE-CREATION > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnResourceManagerFactory.java > PRE-CREATION > > samza-yarn/src/main/java/org/apache/samza/validation/YarnJobValidationTool.java > 70f1e4f7663560e61b7aaa757946adbe03d2bd76 > samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMaster.scala > 80deb3b18c094d83af67535f9d0156f18ae3f5e4 > > samza-yarn/src/main/scala/org/apache/samza/job/yarn/refactor/SamzaAppMasterService.scala > PRE-CREATION > > samza-yarn/src/main/scala/org/apache/samza/job/yarn/refactor/SamzaYarnAppMasterLifecycle.scala > PRE-CREATION > > samza-yarn/src/test/java/org/apache/samza/job/yarn/TestContainerAllocator.java > b253f98f7258bb611e1ad6672f74b07ab7e20b70 > > samza-yarn/src/test/java/org/apache/samza/job/yarn/TestHostAwareContainerAllocator.java > 93e430b6ee986b06ecdac4979552d774724a1fbd > > samza-yarn/src/test/java/org/apache/samza/job/yarn/TestSamzaTaskManager.java > faa697db49ec1c11d76c88d919a356a5ae409a15 > > samza-yarn/src/test/java/org/apache/samza/job/yarn/util/MockContainerListener.java > cb82cccf75b54cfbefd586700e8283cb41173833 > > samza-yarn/src/test/java/org/apache/samza/job/yarn/util/MockContainerRequestState.java > 879a7d0d06b087cfe0417f3fa5801b43ac7fc458 > > samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMasterLifecycle.scala > 30cf34fe1fd3f74537d16e8a51b467cd50835357 > > samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMasterService.scala > 7f5d9f4af088589d4287c26737bae25567c157d7 > > Diff: https://reviews.apache.org/r/44920/diff/ > > > Testing > ------- > > Tested with sample jobs on clusters of varying sizes / loads for host > affinity. Also tested locally on a local yarn cluster. > > Added Unit tests. > > > Thanks, > > Jagadish Venkatraman > >