Re: Review Request 48459: SAMZA-964 Improve the performance of the continuous OFFSET checkpointing for logged stores

Navina Ramesh Tue, 14 Jun 2016 13:11:54 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48459/#review137418
-----------------------------------------------------------





samza-api/src/main/java/org/apache/samza/system/ExtendedSystemAdmin.java (line 
30)
<https://reviews.apache.org/r/48459/#comment202551>

    Why do we need 2 methods for getSystemStreamPartitionCounts? I think you 
can change the method signature since this interface was introduced in this 
current version. Was that the reason you added a new method? 
    
    I could be missing something very obvious here :D
    Seems like "getSystemStreamPartitionCounts(Set<String> streamNames)" is 
used only in a Mock implementation



samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala 
(line 230)
<https://reviews.apache.org/r/48459/#comment202715>

    nit: Doc can be improved by explaining why this is more efficient. That 
will be more informative. Or remove it altogther.



samza-core/src/main/scala/org/apache/samza/system/StreamMetadataCache.scala 
(line 67)
<https://reviews.apache.org/r/48459/#comment202719>

    Seems like we need a more precise interface for accessing stream's metadata 
cache. 
    An interface that will allow specifying list of ssps and the cacheTTL. 
    
    Note to self: Make changes to systemAdmin interface after the next release 
:)



samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala 
(line 274)
<https://reviews.apache.org/r/48459/#comment202713>

    Documentation is not clear. It says "It does not retry if there is a 
failure".. But you are retrying upto "maxretries".
    
    It will be useful to define what is considered a failure, for example, what 
exceptions are ok for retry and which ones aren't. If there isn't any clear 
distinction between the exceptions, you can mention that as well



samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala 
(line 279)
<https://reviews.apache.org/r/48459/#comment202714>

    nit: more accurate documentation here.  You can move it out of the function 
definition



samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala 
(line 282)
<https://reviews.apache.org/r/48459/#comment202721>

    I don't like what we are doing here - setting the ttl to be max value. 
However, I don't see a better way of doing it. :(



samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala 
(line 322)
<https://reviews.apache.org/r/48459/#comment202722>

    I strongly feel the need to make retries as a part of the 
ExponentialSleepStrategy. It will certainly be used in many places. Considering 
the urgency of this fix and the fact that the runLoop is in Scale, we can table 
it for another day.



samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala (line 191)
<https://reviews.apache.org/r/48459/#comment202723>

    nit: Do we really need these trace lines? If we do, can you change it to 
something like: "Begin flush to store" and "Flush to store complete"



samza-kv/src/main/scala/org/apache/samza/storage/kv/LoggedStore.scala (line 103)
<https://reviews.apache.org/r/48459/#comment202724>

    Same comment as above


- Navina Ramesh


On June 13, 2016, 12:22 a.m., Jake Maes wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48459/
> -----------------------------------------------------------
> 
> (Updated June 13, 2016, 12:22 a.m.)
> 
> 
> Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Navina 
> Ramesh, Jagadish Venkatraman, Xinyu Liu, and Yi Pan (Data Infrastructure).
> 
> 
> Bugs: SAMZA-964
>     https://issues.apache.org/jira/browse/SAMZA-964
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> SAMZA-964 Improve the performance of the continuous OFFSET checkpointing for 
> logged stores
> 
> 1. Cache metadata more aggressively. Only expire metadata if we get Kafka 
> exceptions. This applies for all cases EXCEPT the partition count monitor, 
> which uses the TTL from the StreamMetadataCache
> 2. Reduce excessive Offset fetching. 
> 3. Do not allow unbounded exponential backoff for offset checkpointing, just 
> skip the offset file. Exponential backoff can balloon the commit time and 
> stall the event loop. So we will only retry up to 3 times for a max delay of 
> 400ms
> 4. Add some trace log messages to help track/time KV Store flushes (the other 
> culprit for the slowdown)
> 
> 
> Diffs
> -----
> 
>   samza-api/src/main/java/org/apache/samza/system/ExtendedSystemAdmin.java 
> daa2212cf1d54e90861657fab86b2e780d7e89e2 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java
>  0a6661c423a09944aa211223cad205958d3b1fee 
>   samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala 
> c7b05203a1958a62af9dec04b215d985c4646dc4 
>   samza-core/src/main/scala/org/apache/samza/system/StreamMetadataCache.scala 
> 18b47ec3393978e403cadd8754f3fa5fd68654e9 
>   
> samza-core/src/test/scala/org/apache/samza/coordinator/TestJobCoordinator.scala
>  110c3a910aa0bae77dfe5eebbf82286b56dc4654 
>   
> samza-core/src/test/scala/org/apache/samza/storage/TestTaskStorageManager.scala
>  c8ea64c7c67dd6bf789d2a3445d620ccef1beac0 
>   
> samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala
>  23aa58dff6b5e282bb634d3913cacd73003402ea 
>   
> samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemAdmin.scala
>  6c292234dcdd54eaca05f3e1a3fc401e205d6066 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
>  f0965aec5f3ec2a214dc40c70832c58273623749 
>   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 
> c28f8db8cb59bd5415e78535877acc1e5bee0f67 
>   samza-kv/src/main/scala/org/apache/samza/storage/kv/LoggedStore.scala 
> 7bba6ff37d8266674e7f15c10c7c146f4a41fc91 
>   
> samza-kv/src/main/scala/org/apache/samza/storage/kv/SerializedKeyValueStore.scala
>  8e183efcdec6fd3f921fc2bfe1971c95715930ed 
> 
> Diff: https://reviews.apache.org/r/48459/diff/
> 
> 
> Testing
> -------
> 
> New unit tests. 
> 
> ./gradlew clean build
> bin/check-all.sh on my Mac
> 
> Manual testing with 2 test jobs and the big job that had the performance 
> issue.
> 
> 
> Thanks,
> 
> Jake Maes
> 
>

Re: Review Request 48459: SAMZA-964 Improve the performance of the continuous OFFSET checkpointing for logged stores

Reply via email to