> On March 16, 2016, 8:48 p.m., Chris Pettitt wrote:
> > samza-core/src/main/java/org/apache/samza/clustermanager/AbstractContainerAllocator.java,
> >  line 46
> > <https://reviews.apache.org/r/44920/diff/1/?file=1301262#file1301262line46>
> >
> >     Discussed with Jagadish offline, but summarizing here:
> >     
> >     I think we could simplify some of the interactions across classes if we 
> > make this a concrete task that takes a container allocator strategy 
> > interface (e.g. standard or host aware). A few nice properties that fall 
> > out:
> >     
> >     1. The allocators don't need to know the inner workings of 
> > AbstractContainerAllocator (ACA). In fact, the implementation of ACA can 
> > change without breaking allocators.
> >     2. It easy to test just the allocator strategy in isolation because it 
> > becomes more functional in nature (e.g. no direct dependency on state in 
> > ACA).
> >     3. Easier to code review due to less cross-class interactions ;)

Great point. Thanks for the insight Chris. I just logged SAMZA-900 for this.


> On March 16, 2016, 8:48 p.m., Chris Pettitt wrote:
> > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaAppState.java,
> >  line 36
> > <https://reviews.apache.org/r/44920/diff/1/?file=1301270#file1301270line36>
> >
> >     Agreed. This is a bit mix of atomic and mutable state. Either make it 
> > thread safe (e.g. lock down mutability of fields), ensure that remaining 
> > mutable state can be updated atomically, or decide it is not intended to be 
> > thread safe and get rid of additional locks / volatile accesses that add 
> > cost with no gain. Former is preferred :).

Great feedback Chris! I could not agree more! :-)

SamzaAppState class is a currently a source of major problems. I did not want 
to touch it (as it was not scoped in this refactoring). Upon digging further, I 
realize the problem of making this thread-safe/private is slightly involved.

1. There is a jobCoordinator object that is exposed publicly as a part of 
SamzaAppState. The jobCoordinator inturn exposes a nested jobModel instance 
directly thorough its accessor. The JobModel embeds a LocalityManager that 
mutates state during some public method calls. Hence, The jobModel instance is 
*not* thread-safe and is shared concurrently across the UI threads, the HTTP 
server threads in the queued thread pool,the SamzaAppMaster thread. (Created 
SAMZA-899 to make the JobModel immutable)

2. There are a bunch of state data structures that are publicly exposed in 
SamzaAppState. These must be made thread-safe into accessors. These public 
global variables could be mutated everywhere in Samza without regard for 
safety/visibility or correctness. For example, there is an integer 
containerCount that is public which is manipulated by both the metrics reporter 
and the callback threads. (I created SAMZA-901 to track this)

I will work on both of these as these ASAP.


- Jagadish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44920/#review123902
-----------------------------------------------------------


On March 16, 2016, 6:23 p.m., Jagadish Venkatraman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44920/
> -----------------------------------------------------------
> 
> (Updated March 16, 2016, 6:23 p.m.)
> 
> 
> Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Yi Pan 
> (Data Infrastructure), Navina Ramesh, and Xinyu Liu.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Samza currently has tight coupling with Yarn. This makes it impossible to 
> integrate with other resource managers like Mesos, or to run standalone 
> without any resource manager. This RB is a step to implementing SAMZA-881.
> 
> Design Doc: 
> https://issues.apache.org/jira/secure/attachment/12790540/SamzaJobCoordinatorRe-designProposal.pdf
> 
> 1.Proposed new APIs for a resource manager to integrate with Samza. 
> (SAMZA-881)
>    - Defined the ContainerProcessManager abstraction, SamzaResource, 
> SamzaResourceRequest. 
>    - Re-wrote the SamzaAppMaster into a ClusterBasedJobCoordinator.
>    - Re-wrote yarn specific request logic by abstracting it into a 
> YarnContainerManager. 
> 2.Defined a ClusterManagerConfig to handle configurations independent of 
> Yarn/Mesos.
> 3.Made Samza's cluster interaction independent of Yarn. This separates Samza 
> specific components into samza-core and Yarn components into samza-yarn.
> 4.Readability improvements to the existing code base.
>    -Added docs for most methods, member variables and classes (including on 
> thread-safety)
>    - Made internal variables final to document intent, visibility across 
> threads. (trivially by adding modifiers, or by changing where they're 
> initialized.)
> 5.Refactored JobCoordinator into a JobModelReader.
> 
> TODO: Can go into the upcoming release. (P0)
> 1.Refactor the UI state variables and tests. Port some method re-orgs from 
> SAMZA-867 into here.
> 2.Revise packaging structure.
> 4.Document new configs.
> 5.Rename run-am.sh to run-coordinator.sh, Delete all files in the 
> non-refactored namespace. (For unit-testing, these files continue to exist)
> 
> TODO: (P1)
> 1.Build Mesos integration for Samza. Should be simpler to integrate with the 
> newer APIs.
>   - I started on this, and I plan to refine and post an RB in one of the 
> hack-days.
> 2.Refactor the SamzaAppState class to provide more accessors and eliminate 
> public variables. (This was
> a consequence of the already existing design which I've tried to be 
> compatible with)
> 
> TODO: I plan to track these with JIRAs so that they can be done later. (P2)
> 1.Get rid of the HTTP Server in the JobCoordinator
> 2.Make YarnJobCoordinator implement the JobCoordinator API as SAMZA-881.
> 
> 
> Diffs
> -----
> 
>   
> samza-core/src/main/java/org/apache/samza/clustermanager/AbstractContainerAllocator.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/clustermanager/ContainerAllocator.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/clustermanager/ContainerManagerFactory.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/clustermanager/ContainerProcessManager.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/clustermanager/ContainerRequestState.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/clustermanager/HostAwareContainerAllocator.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/clustermanager/ResourceFailure.java 
> PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/clustermanager/SamzaAppState.java 
> PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/clustermanager/SamzaContainerLaunchException.java
>  PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResource.java 
> PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResourceRequest.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResourceStatus.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/clustermanager/SamzaTaskManager.java
>  PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/config/ClusterManagerConfig.java 
> PRE-CREATION 
>   samza-core/src/main/scala/org/apache/samza/coordinator/JobModelReader.scala 
> PRE-CREATION 
>   
> samza-core/src/main/scala/org/apache/samza/metrics/SamzaAppMasterMetrics.scala
>  PRE-CREATION 
>   samza-shell/src/main/bash/run-am.sh 
> 9545a96953baaff17ad14962e02bc12aadbb1101 
>   
> samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnAppState.java 
> PRE-CREATION 
>   
> samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnContainerManager.java
>  PRE-CREATION 
>   
> samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnContainerManagerFactory.java
>  PRE-CREATION 
>   
> samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnContainerRunner.java
>  PRE-CREATION 
>   
> samza-yarn/src/main/scala/org/apache/samza/job/yarn/YarnAppMasterListener.scala
>  6bf3046a1ae4ed8f57500acae763184084ad0e09 
>   
> samza-yarn/src/main/scala/org/apache/samza/job/yarn/refactor/SamzaAppMasterService.scala
>  PRE-CREATION 
>   
> samza-yarn/src/main/scala/org/apache/samza/job/yarn/refactor/SamzaYarnAppMasterLifecycle.scala
>  PRE-CREATION 
>   
> samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMasterLifecycle.scala
>  30cf34fe1fd3f74537d16e8a51b467cd50835357 
>   
> samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMasterService.scala
>  7f5d9f4af088589d4287c26737bae25567c157d7 
> 
> Diff: https://reviews.apache.org/r/44920/diff/
> 
> 
> Testing
> -------
> 
> 1.Tested with running test jobs in Yarn clusters of varying sizes from 1 node 
> to 36 nodes.
> 2.Tested for failures by killing containers, and ensuring they were brought 
> up again.
> 
> TODO:
> 1.Refactor all unit tests, and ensure updates to state variables are 
> consistent with the current design.
> 2.More testing with test jobs on clusters.
> 
> 
> Thanks,
> 
> Jagadish Venkatraman
> 
>

Reply via email to