----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48459/ -----------------------------------------------------------
Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Navina Ramesh, Jagadish Venkatraman, Xinyu Liu, and Yi Pan (Data Infrastructure). Bugs: SAMZA-964 https://issues.apache.org/jira/browse/SAMZA-964 Repository: samza Description ------- SAMZA-964 Improve the performance of the continuous OFFSET checkpointing for logged stores 1. Cache metadata more aggressively. Only expire metadata if we get Kafka exceptions. This applies for all cases EXCEPT the partition count monitor, which uses the TTL from the StreamMetadataCache 2. Reduce excessive Offset fetching. 3. Do not allow unbounded exponential backoff for offset checkpointing, just skip the offset file. Exponential backoff can balloon the commit time and stall the event loop. So we will only retry up to 3 times for a max delay of 400ms 4. Add some trace log messages to help track/time KV Store flushes (the other culprit for the slowdown) Diffs ----- samza-api/src/main/java/org/apache/samza/system/ExtendedSystemAdmin.java daa2212cf1d54e90861657fab86b2e780d7e89e2 samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java 0a6661c423a09944aa211223cad205958d3b1fee samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala c7b05203a1958a62af9dec04b215d985c4646dc4 samza-core/src/main/scala/org/apache/samza/system/StreamMetadataCache.scala 18b47ec3393978e403cadd8754f3fa5fd68654e9 samza-core/src/test/scala/org/apache/samza/coordinator/TestJobCoordinator.scala 110c3a910aa0bae77dfe5eebbf82286b56dc4654 samza-core/src/test/scala/org/apache/samza/storage/TestTaskStorageManager.scala c8ea64c7c67dd6bf789d2a3445d620ccef1beac0 samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala 23aa58dff6b5e282bb634d3913cacd73003402ea samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemAdmin.scala 6c292234dcdd54eaca05f3e1a3fc401e205d6066 samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala f0965aec5f3ec2a214dc40c70832c58273623749 samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala c28f8db8cb59bd5415e78535877acc1e5bee0f67 samza-kv/src/main/scala/org/apache/samza/storage/kv/LoggedStore.scala 7bba6ff37d8266674e7f15c10c7c146f4a41fc91 samza-kv/src/main/scala/org/apache/samza/storage/kv/SerializedKeyValueStore.scala 8e183efcdec6fd3f921fc2bfe1971c95715930ed Diff: https://reviews.apache.org/r/48459/diff/ Testing ------- New unit tests. ./gradlew clean build Manual testing with 2 test jobs and the big job that had the performance issue. Thanks, Jake Maes