-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43589/
-----------------------------------------------------------

(Updated Feb. 15, 2016, 9:54 p.m.)


Review request for samza.


Repository: samza


Description
-------

The class `org.apache.samza.storage.kv.CachedStore` currently calls
`store.flush()` whenever it evicts dirty entries. This in turn causes RocksDB
to flush its memtables far more often than necessary, which slows the store
down.

In a mixed put/get workload, e.g. 2 gets for every put with an object cache
size of 1000, RocksDB flushes its memtable roughly every 333 calls to `put()`,
that is, every time the eldest entry evicted from the cache is dirty. In our
benchmarks, this leads to a more than 20x drop in throughput.
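The following Scala sketch reproduces the roughly-333 figure. It is a hypothetical model, not Samza code. It assumes the cache is an access-ordered `java.util.LinkedHashMap` whose `removeEldestEntry()` writes out and flushes all dirty entries when the eldest entry is dirty (the pre-patch behaviour described above). It ignores any other write-out trigger in `put()`, and it gives every operation a fresh key, so each get misses and inserts into the cache. `OldEvictionSim` and `CacheEntry` are illustrative names.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}
import scala.collection.mutable.ArrayBuffer

// Hypothetical model of the pre-patch eviction path only. Every operation
// uses a fresh key, so every get() misses and inserts an entry into the cache.
object OldEvictionSim {
  final class CacheEntry(var dirty: Boolean)

  /** Runs `ops` operations (get, get, put, ...) and returns the number of
    * puts issued between consecutive eviction-triggered flushes. */
  def run(cacheSize: Int, ops: Int): Vector[Int] = {
    val dirtyEntries = ArrayBuffer.empty[CacheEntry]
    val intervals = ArrayBuffer.empty[Int]
    var putsSinceFlush = 0

    // Access-ordered LRU map (assumed to mirror the cache inside CachedStore).
    val cache = new JLinkedHashMap[Int, CacheEntry](16, 0.75f, true) {
      override def removeEldestEntry(eldest: JMap.Entry[Int, CacheEntry]): Boolean = {
        val evict = size() > cacheSize
        if (evict && eldest.getValue.dirty) {
          // Old behaviour: putAll() every dirty entry, then store.flush().
          dirtyEntries.foreach(e => e.dirty = false)
          dirtyEntries.clear()
          intervals += putsSinceFlush
          putsSinceFlush = 0
        }
        evict
      }
    }

    for (key <- 0 until ops) {
      val isPut = key % 3 == 2 // two gets for every put
      val entry = new CacheEntry(dirty = isPut)
      if (isPut) { dirtyEntries += entry; putsSinceFlush += 1 }
      cache.put(key, entry)
    }
    intervals.toVector
  }
}

val putsBetweenFlushes = OldEvictionSim.run(cacheSize = 1000, ops = 30000)
println(s"${putsBetweenFlushes.size} flushes, puts between flushes: ${putsBetweenFlushes.distinct}")
```

Under these assumptions the model records 29 eviction-triggered flushes, each separated by exactly 334 puts (1002 operations). Each flush marks every cached entry clean. The next dirty entry is therefore only evicted after it has aged through the whole 1000-entry cache, and about a third of the operations in that window are puts.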

The attached patch fixes the issue as follows:
`CachedStore.put()` no longer flushes when evicting dirty entries. It calls
`store.putAll()` on all dirty entries and resets the dirty list and count, but
does not call `store.flush()`.
Likewise, `CachedStore.cache.removeEldestEntry()` no longer flushes when
evicting dirty entries; it calls `store.putAll()` on all dirty entries and
resets the dirty list and count.
The behaviour of `CachedStore.flush()` is unchanged.
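The patched write-out paths can be sketched as follows. This is a minimal sketch, not the code in the patch. `SketchCachedStore`, `RecordingStore` and `CacheEntry` are hypothetical stand-ins; the real class wraps a Samza key-value store such as RocksDB. The sketch makes three assumptions: the cache is an access-ordered `java.util.LinkedHashMap`, the write-out trigger in `put()` is a write-batch size limit, and the dirty count is simply the size of the dirty list. Both paths call `putAll()` on every dirty entry and reset the dirty list. Only `flush()` still calls `store.flush()`.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}
import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the underlying store; it only records the calls
// the cache makes on it.
final class RecordingStore[K, V] {
  val data = mutable.Map.empty[K, V]
  var putAllCalls = 0
  var flushCalls = 0
  def get(key: K): Option[V] = data.get(key)
  def putAll(entries: Seq[(K, V)]): Unit = { data ++= entries; putAllCalls += 1 }
  def flush(): Unit = flushCalls += 1
}

final class CacheEntry[K, V](val key: K, var value: V, var dirty: Boolean)

// Simplified sketch of the patched write-out logic; not the actual Samza code.
final class SketchCachedStore[K, V](store: RecordingStore[K, V], cacheSize: Int, writeBatchSize: Int) {
  // Dirty entries in write order; the buffer size plays the role of the dirty count.
  private val dirtyEntries = ArrayBuffer.empty[CacheEntry[K, V]]

  // Access-ordered LRU cache (an assumption about how CachedStore is built).
  private val cache: JLinkedHashMap[K, CacheEntry[K, V]] =
    new JLinkedHashMap[K, CacheEntry[K, V]](16, 0.75f, true) {
      override def removeEldestEntry(eldest: JMap.Entry[K, CacheEntry[K, V]]): Boolean = {
        val evict = size() > cacheSize
        // Patched: a dirty eldest entry triggers putAll() of every dirty entry,
        // but no store.flush().
        if (evict && eldest.getValue.dirty) putAllDirty()
        evict
      }
    }

  // Writes all dirty entries with one putAll() and resets the dirty list and count.
  private def putAllDirty(): Unit = if (dirtyEntries.nonEmpty) {
    store.putAll(dirtyEntries.map(e => e.key -> e.value).toSeq)
    dirtyEntries.foreach(e => e.dirty = false)
    dirtyEntries.clear()
  }

  def get(key: K): Option[V] = Option(cache.get(key)) match {
    case Some(entry) => Some(entry.value)
    case None =>
      val fromStore = store.get(key)
      // Cache the value read from the store as a clean entry.
      fromStore.foreach(v => cache.put(key, new CacheEntry(key, v, dirty = false)))
      fromStore
  }

  def put(key: K, value: V): Unit = {
    val existing = cache.get(key) // also marks the entry as most recently used
    if (existing != null) {
      existing.value = value
      if (!existing.dirty) { existing.dirty = true; dirtyEntries += existing }
    } else {
      val entry = new CacheEntry(key, value, dirty = true)
      dirtyEntries += entry
      cache.put(key, entry) // may evict via removeEldestEntry()
    }
    // Patched: a full write batch is written with putAll(), again without flushing.
    if (dirtyEntries.size >= writeBatchSize) putAllDirty()
  }

  // Unchanged by the patch: write out all dirty entries, then flush the store.
  def flush(): Unit = { putAllDirty(); store.flush() }
}

val backing = new RecordingStore[String, Int]
val cached = new SketchCachedStore(backing, cacheSize = 2, writeBatchSize = 100)
Seq("a" -> 1, "b" -> 2, "c" -> 3).foreach { case (k, v) => cached.put(k, v) }
// Inserting "c" evicted the dirty entry "a": one putAll(), no flush.
println(s"putAll=${backing.putAllCalls} flush=${backing.flushCalls}") // putAll=1 flush=0
cached.put("d", 4)
cached.flush() // the only path that still calls store.flush()
```

With a two-entry cache, inserting `"c"` evicts the dirty entry `"a"`. That eviction costs one `putAll()` and no flush. The store is flushed only by the explicit `flush()` call at the end.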


Diffs
-----

  samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 9a5b2d5
  samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStoreMetrics.scala df8efae
  samza-kv/src/test/scala/org/apache/samza/storage/kv/TestCachedStore.scala 198720c

Diff: https://reviews.apache.org/r/43589/diff/


Testing (updated)
-------

Unit tests adapted.


Thanks,

Nicolas Maquet
