> On March 16, 2016, 11:10 p.m., Chris Pettitt wrote: > > Some more comments > > Chris Pettitt wrote: > Sorry, did not mean to create issues for all of the below. I think #1 and > #2 are the most interesting to look at of the group.
I did not want to loose comments, So, I'm publishing them. I'll update the RB with the revised draft. > On March 16, 2016, 11:10 p.m., Chris Pettitt wrote: > > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaAppState.java, > > line 83 > > <https://reviews.apache.org/r/44920/diff/1/?file=1301270#file1301270line83> > > > > This should be final, especially as we have other code that is relying > > on it to be a) correct and b) visible (from a thread-safety perspective) > > for proper shutdown. Agreed! I've made containerCount an atomic integer. I believe that currently solves the visibility, atomicity problem for now for that field. Long term, I believe SamzaAppState should be thread-safe (using copy-on-write), with private final variables, and accessors. However, making this thread-safe/private is a change that is more involved. (It will be a separate effort altogether - SAMZA-902 tracks this) > On March 16, 2016, 11:10 p.m., Chris Pettitt wrote: > > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaTaskManager.java, > > line 147 > > <https://reviews.apache.org/r/44920/diff/1/?file=1301275#file1301275line147> > > > > Should we wait forever or fail after some reasonable timout? The current implementation waits for ever. I did not feel compelled to change it in this refactoring. I think it's okay to wait - because, allocator.stop() is called just before. That sets the flag (volatile) to false. So, the next time the allocator thread sees this flag, it will shutdown. > On March 16, 2016, 11:10 p.m., Chris Pettitt wrote: > > samza-core/src/main/scala/org/apache/samza/coordinator/JobModelReader.scala, > > line 59 > > <https://reviews.apache.org/r/44920/diff/1/?file=1301277#file1301277line59> > > > > It's not obvious - I'm probably missing it - but it seems that we only > > write to this field in getJobCoordinator. If that's the case then we could > > just move this into that function instead of storing it here as a volatile. I've gotten rid of it. It does not seem to be used elsewhere. > On March 16, 2016, 11:10 p.m., Chris Pettitt wrote: > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnContainerManager.java, > > line 86 > > <https://reviews.apache.org/r/44920/diff/1/?file=1301281#file1301281line86> > > > > Minor: may as well use a diamond here since you're using it on the next > > line. Good point. Fixed. - Jagadish ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/44920/#review123928 ----------------------------------------------------------- On March 16, 2016, 6:23 p.m., Jagadish Venkatraman wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/44920/ > ----------------------------------------------------------- > > (Updated March 16, 2016, 6:23 p.m.) > > > Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Yi Pan > (Data Infrastructure), Navina Ramesh, and Xinyu Liu. > > > Repository: samza > > > Description > ------- > > Samza currently has tight coupling with Yarn. This makes it impossible to > integrate with other resource managers like Mesos, or to run standalone > without any resource manager. This RB is a step to implementing SAMZA-881. > > Design Doc: > https://issues.apache.org/jira/secure/attachment/12790540/SamzaJobCoordinatorRe-designProposal.pdf > > 1.Proposed new APIs for a resource manager to integrate with Samza. > (SAMZA-881) > - Defined the ContainerProcessManager abstraction, SamzaResource, > SamzaResourceRequest. > - Re-wrote the SamzaAppMaster into a ClusterBasedJobCoordinator. > - Re-wrote yarn specific request logic by abstracting it into a > YarnContainerManager. > 2.Defined a ClusterManagerConfig to handle configurations independent of > Yarn/Mesos. > 3.Made Samza's cluster interaction independent of Yarn. This separates Samza > specific components into samza-core and Yarn components into samza-yarn. > 4.Readability improvements to the existing code base. > -Added docs for most methods, member variables and classes (including on > thread-safety) > - Made internal variables final to document intent, visibility across > threads. (trivially by adding modifiers, or by changing where they're > initialized.) > 5.Refactored JobCoordinator into a JobModelReader. > > TODO: Can go into the upcoming release. (P0) > 1.Refactor the UI state variables and tests. Port some method re-orgs from > SAMZA-867 into here. > 2.Revise packaging structure. > 4.Document new configs. > 5.Rename run-am.sh to run-coordinator.sh, Delete all files in the > non-refactored namespace. (For unit-testing, these files continue to exist) > > TODO: (P1) > 1.Build Mesos integration for Samza. Should be simpler to integrate with the > newer APIs. > - I started on this, and I plan to refine and post an RB in one of the > hack-days. > 2.Refactor the SamzaAppState class to provide more accessors and eliminate > public variables. (This was > a consequence of the already existing design which I've tried to be > compatible with) > > TODO: I plan to track these with JIRAs so that they can be done later. (P2) > 1.Get rid of the HTTP Server in the JobCoordinator > 2.Make YarnJobCoordinator implement the JobCoordinator API as SAMZA-881. > > > Diffs > ----- > > > samza-core/src/main/java/org/apache/samza/clustermanager/AbstractContainerAllocator.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ContainerAllocator.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ContainerManagerFactory.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ContainerProcessManager.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ContainerRequestState.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/HostAwareContainerAllocator.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/ResourceFailure.java > PRE-CREATION > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaAppState.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaContainerLaunchException.java > PRE-CREATION > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResource.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResourceRequest.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResourceStatus.java > PRE-CREATION > > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaTaskManager.java > PRE-CREATION > samza-core/src/main/java/org/apache/samza/config/ClusterManagerConfig.java > PRE-CREATION > samza-core/src/main/scala/org/apache/samza/coordinator/JobModelReader.scala > PRE-CREATION > > samza-core/src/main/scala/org/apache/samza/metrics/SamzaAppMasterMetrics.scala > PRE-CREATION > samza-shell/src/main/bash/run-am.sh > 9545a96953baaff17ad14962e02bc12aadbb1101 > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnAppState.java > PRE-CREATION > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnContainerManager.java > PRE-CREATION > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnContainerManagerFactory.java > PRE-CREATION > > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnContainerRunner.java > PRE-CREATION > > samza-yarn/src/main/scala/org/apache/samza/job/yarn/YarnAppMasterListener.scala > 6bf3046a1ae4ed8f57500acae763184084ad0e09 > > samza-yarn/src/main/scala/org/apache/samza/job/yarn/refactor/SamzaAppMasterService.scala > PRE-CREATION > > samza-yarn/src/main/scala/org/apache/samza/job/yarn/refactor/SamzaYarnAppMasterLifecycle.scala > PRE-CREATION > > samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMasterLifecycle.scala > 30cf34fe1fd3f74537d16e8a51b467cd50835357 > > samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMasterService.scala > 7f5d9f4af088589d4287c26737bae25567c157d7 > > Diff: https://reviews.apache.org/r/44920/diff/ > > > Testing > ------- > > 1.Tested with running test jobs in Yarn clusters of varying sizes from 1 node > to 36 nodes. > 2.Tested for failures by killing containers, and ensuring they were brought > up again. > > TODO: > 1.Refactor all unit tests, and ensure updates to state variables are > consistent with the current design. > 2.More testing with test jobs on clusters. > > > Thanks, > > Jagadish Venkatraman > >