----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/43589/ -----------------------------------------------------------
(Updated Feb. 15, 2016, 9:54 p.m.) Review request for samza. Repository: samza Description ------- The class `org.apache.samza.storage.kv.CachedStore` is currently calling `store.flush()` when evicting dirty entries. This in turn causes RocksDB to flush its memtables much more than necessary, causing slowdowns. In a mixed put / get workload, e.g. 2 gets for 1 put with an object cache size of 1000, RocksDB will flush its memtable roughly every 333 calls to `put()`; that is every time the eldest entry from the cache is dirty. In our benchmarks, this leads to a more than 20x drop in throughput. The attached patch fixes the issue as follows: `CachedStore.put()` no longer flushes when evicting dirty entries. It calls `store.putAll()` with all dirty entries and resets the dirty list and count but does not call `store.flush()`. Likewise, `CachedStore.cache.removeEldestEntry()` no longer flushes when evicting dirty entries but calls `store.putAll()` on all dirty entries and resets the dirty list and count. The behaviour of `CachedStore.flush()` is unaffected. Diffs ----- samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 9a5b2d5 samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStoreMetrics.scala df8efae samza-kv/src/test/scala/org/apache/samza/storage/kv/TestCachedStore.scala 198720c Diff: https://reviews.apache.org/r/43589/diff/ Testing (updated) ------- Unit tests adapted. Thanks, Nicolas Maquet