-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48459/
-----------------------------------------------------------

(Updated June 14, 2016, 11:13 p.m.)


Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Navina 
Ramesh, Jagadish Venkatraman, Xinyu Liu, and Yi Pan (Data Infrastructure).


Bugs: SAMZA-964
    https://issues.apache.org/jira/browse/SAMZA-964


Repository: samza


Description
-------

SAMZA-964 Improve the performance of the continuous OFFSET checkpointing for 
logged stores

1. Cache metadata more aggressively. Only expire metadata if we get Kafka 
exceptions. This applies for all cases EXCEPT the partition count monitor, 
which uses the TTL from the StreamMetadataCache
2. Reduce excessive Offset fetching. 
3. Do not allow unbounded exponential backoff for offset checkpointing, just 
skip the offset file. Exponential backoff can balloon the commit time and stall 
the event loop. So we will only retry up to 3 times for a max delay of 400ms
4. Add some trace log messages to help track/time KV Store flushes (the other 
culprit for the slowdown)


Diffs (updated)
-----

  samza-api/src/main/java/org/apache/samza/system/ExtendedSystemAdmin.java 
daa2212cf1d54e90861657fab86b2e780d7e89e2 
  
samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java
 0a6661c423a09944aa211223cad205958d3b1fee 
  samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala 
c7b05203a1958a62af9dec04b215d985c4646dc4 
  samza-core/src/main/scala/org/apache/samza/system/StreamMetadataCache.scala 
18b47ec3393978e403cadd8754f3fa5fd68654e9 
  
samza-core/src/test/scala/org/apache/samza/coordinator/TestJobCoordinator.scala 
110c3a910aa0bae77dfe5eebbf82286b56dc4654 
  
samza-core/src/test/scala/org/apache/samza/storage/TestTaskStorageManager.scala 
c8ea64c7c67dd6bf789d2a3445d620ccef1beac0 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala 
23aa58dff6b5e282bb634d3913cacd73003402ea 
  
samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemAdmin.scala
 6c292234dcdd54eaca05f3e1a3fc401e205d6066 
  
samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
 f0965aec5f3ec2a214dc40c70832c58273623749 
  samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 
c28f8db8cb59bd5415e78535877acc1e5bee0f67 
  samza-kv/src/main/scala/org/apache/samza/storage/kv/LoggedStore.scala 
7bba6ff37d8266674e7f15c10c7c146f4a41fc91 
  
samza-kv/src/main/scala/org/apache/samza/storage/kv/SerializedKeyValueStore.scala
 8e183efcdec6fd3f921fc2bfe1971c95715930ed 

Diff: https://reviews.apache.org/r/48459/diff/


Testing
-------

New unit tests. 

./gradlew clean build
bin/check-all.sh on my Mac

Manual testing with 2 test jobs and the big job that had the performance issue.


Thanks,

Jake Maes

Reply via email to