-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48080/#review135892
-----------------------------------------------------------



The code is super clean and well tested! 

Mostly conceptual questions below. 

Also, I think we need an entry in the config table for the new configs:
docs/learn/documentation/versioned/jobs/configuration-table.html


samza-core/src/main/java/org/apache/samza/container/disk/DiskQuotaEnforcer.java 
(line 64)
<https://reviews.apache.org/r/48080/#comment200916>

    This comment suggests the quota is per-task, but the config structure 
"container.disk.quota.bytes" suggests it is per-container. Perhaps the term 
"task" is overloaded and I'm misinterpreting it.



samza-core/src/main/java/org/apache/samza/container/disk/DiskQuotaEnforcer.java 
(line 72)
<https://reviews.apache.org/r/48080/#comment200969>

    Just an observation. This could be implemented as a Segment Tree
    https://en.wikipedia.org/wiki/Segment_tree



samza-core/src/main/java/org/apache/samza/container/disk/DiskQuotaPolicy.java 
(lines 23 - 24)
<https://reviews.apache.org/r/48080/#comment200915>

    Is this the "percentage of available disk space" on the entire disk, or the 
percentage available to the Task?



samza-core/src/main/java/org/apache/samza/container/disk/DiskQuotaPolicy.java 
(line 28)
<https://reviews.apache.org/r/48080/#comment200968>

    I'm not sure I see the vision. It seems like this patch is expecting users 
to configure a set of policies with non overlapping watermarks and the 
DiskQuotaEnforcer relies on this paradigm. 
    
    Maybe because this patch doesn't add a config table update describing how 
to compose a set of policies to achieve a desired behavior, but I found it 
difficult to reason about how this works without drawing it out on my notepad. 
I had to explore the code to find details like:
    1. Whether the watermark values correspond to diskQuotaUtilization or 
diskQuotaRemainingPercentage
    2. Whether the watermarks are allowed to overlap
    3. If the watermarks overlap, is the workFactor additive, multiplicative, 
or precedence?
    
    To me though, the simplest representation of a throttling policy is a 
simple function:
    workfactor = policy.apply(quotaUtilizationPct)
    
    This, as an API, would support both user-defined policy functions or an 
encapsulated implementation of this watermark system.



samza-core/src/main/java/org/apache/samza/container/disk/DiskQuotaPolicy.java 
(lines 34 - 35)
<https://reviews.apache.org/r/48080/#comment200967>

    Should there be any verification that lowWaterMarkPercent <= 
highWaterMarkPercent and that both of them should be values betweeen 0.0 and 
1.0?
    
    Actually, I found the former verification in DiskQuotaEnforcer, but 
expected to find it here. I don't see any verification of the min watermark.



samza-core/src/main/java/org/apache/samza/util/ThrottlingExecutor.java (line 70)
<https://reviews.apache.org/r/48080/#comment200984>

    Won't this continue to add more delay if the workFactor stays constant? So 
it becomes an rapid decay instead of a static throttle value for each policy?
    
    It seems the job would essentially halt after a small number of loops for 
any factor other than an infintesimally small one.
    
    Also, this behavior seems to overlap with the multiple policies. When I 
first saw the multiple policies, I thought, "ok, so to implement a linear 
decay, I'd specify a set of policies, each one with a tighter throttle" I was 
assuming the throttle rate was static. But seeing this additive code, I can't 
imagine using multiple policies because the job will rapidly slow with even 
just one.


- Jake Maes


On May 31, 2016, 5:27 p.m., Chris Pettitt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48080/
> -----------------------------------------------------------
> 
> (Updated May 31, 2016, 5:27 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> This change introduces a ThrottlingExecutor which is used to control the
> rate of execution in the main run loop. The DiskQuotaEnforcer houses the
> rules for switching from one DiskQuotaPolicy to the next as new disk
> usage samples arrive.
> 
> By default, no throttling will occur. New policies can be added using
> the following form:
> 
> ```
> container.disk.quota.bytes=XXX
> container.disk.quota.policy.count=2
> container.disk.quota.policy.0.lowWaterMark=0.4
> container.disk.quota.policy.0.highWaterMark=0.5
> container.disk.quota.policy.0.workFactor=0.5
> container.disk.quota.policy.1.lowWaterMark=0.05
> container.disk.quota.policy.1.highWaterMark=0.1
> container.disk.quota.policy.1.workFactor=0.05
> ```
> 
> See ThrottlingExecutor for details about how the work factor works and
> DiskQuotaPolicy for details about how the low and high water mark
> configuration work.
> 
> 
> Diffs
> -----
> 
>   
> samza-core/src/main/java/org/apache/samza/container/disk/DiskQuotaEnforcer.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/container/disk/DiskQuotaPolicy.java 
> PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/container/disk/DiskSpaceMonitor.java
>  2a565be7858a4d3a6adbc49989b43b71ca3f6721 
>   samza-core/src/main/java/org/apache/samza/util/HighResolutionClock.java 
> PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/util/SystemHighResolutionClock.java 
> PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/util/ThrottlingExecutor.java 
> PRE-CREATION 
>   samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala 
> 3f25eca6e3dffc57360e8bd8c435177c2a9a910a 
>   
> samza-core/src/main/scala/org/apache/samza/container/SameThreadExecutor.scala 
> PRE-CREATION 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> cf3c4c0ab08a59760bc899c6f2027755e933b350 
>   
> samza-core/src/main/scala/org/apache/samza/container/SamzaContainerMetrics.scala
>  9e6641c3628290dc05e1eb5537e86bff9d37f92c 
>   samza-core/src/main/scala/org/apache/samza/util/Util.scala 
> fc3d085d7fff9f7dcec766ba48e550eb0052e99d 
>   
> samza-core/src/test/java/org/apache/samza/container/disk/TestDiskQuotaEnforcer.java
>  PRE-CREATION 
>   samza-core/src/test/java/org/apache/samza/util/TestThrottlingExecutor.java 
> PRE-CREATION 
>   samza-core/src/test/scala/org/apache/samza/util/TestUtil.scala 
> 05b4e5c37578340eefe6d412f5a76f540bec6fa6 
> 
> Diff: https://reviews.apache.org/r/48080/diff/
> 
> 
> Testing
> -------
> 
> - Added new unit tests
> - Ran existing tests with gradle test
> - Verified throttling behavior and instrumentation with local deployment
> - Verified average latency impact of feature to be < 150ns for Linux and OSX
> 
> 
> Thanks,
> 
> Chris Pettitt
> 
>

Reply via email to