Hi Leo,

At LinkedIn, when we switched to using RocksDB for Samza last year, we ran
some tests to see how well RocksDB performs. We used the RocksDB
microbenchmark (
https://github.com/facebook/rocksdb/blob/master/java/benchmark/src/main/java/org/rocksdb/benchmark/DbBenchmark.java)
to conduct several tests. For sequential writes (10-byte keys, 800-byte
values, 1,000,000,000 entries), RocksDB write throughput was around 311 MB/sec
on SSD. You can take a look at the results (
https://issues.apache.org/jira/secure/attachment/12723431/2015-04-06%20RocksDB%20Performance.pdf)
attached to SAMZA-543.
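
To put that number in terms of restore time, here is a rough back-of-the-envelope sketch in Python. It assumes a restore could run at the benchmark's sequential write rate, which is optimistic because a real restore also has to read the changelog over the network. It also assumes 1 MB = 1,000,000 bytes. The function name and the example changelog sizes are made up for illustration.

```python
# Figures taken from the sequential-write benchmark above.
KEY_BYTES = 10
VALUE_BYTES = 800
THROUGHPUT_MB_PER_SEC = 311  # sequential write on SSD

BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def estimate_restore_seconds(changelog_bytes,
                             throughput_mb_per_sec=THROUGHPUT_MB_PER_SEC):
    """Best-case restore time if RocksDB writes were the only cost."""
    return changelog_bytes / (throughput_mb_per_sec * BYTES_PER_MB)


entry_bytes = KEY_BYTES + VALUE_BYTES
entries_per_sec = THROUGHPUT_MB_PER_SEC * BYTES_PER_MB / entry_bytes
print(f"about {entries_per_sec:,.0f} entries/sec at {entry_bytes} bytes each")

# Hypothetical changelog sizes, for illustration only.
for size_gb in (1, 10, 100):
    secs = estimate_restore_seconds(size_gb * 1_000_000_000)
    print(f"{size_gb} GB changelog: at least {secs:.0f} s of write time")
```

Treat the output as a floor rather than a prediction: random-order keys, smaller values, or slower disks will all push the real number up.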

When Samza restores data into RocksDB, it performs a RocksDB put operation
for each entry (RocksDbKeyValueStore->putAll), so reseeding takes time if
your changelog is huge. Hence Samza 0.10 introduced the YARN host-affinity
feature that Jagadish mentioned. This should help with the long RocksDB
restore times in most cases.
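
To illustrate why restore time tracks the changelog length rather than the final store size, here is a toy Python sketch of the replay. A plain dict stands in for RocksDB and a list of (key, value) pairs stands in for the changelog; none of these names are Samza APIs, and modeling a delete as a None value is an assumption of the sketch.

```python
def restore(changelog):
    """Replay every changelog entry, in order, into a fresh store."""
    store = {}
    ops = 0
    for key, value in changelog:
        if value is None:
            store.pop(key, None)  # sketch's convention: None means delete
        else:
            store[key] = value    # one put per changelog entry
        ops += 1
    return store, ops


# Five changelog entries that end up as only two live keys.
changelog = [("a", 1), ("b", 2), ("a", 3), ("b", None), ("c", 4)]
store, ops = restore(changelog)
print(len(store), "keys restored using", ops, "operations")
```

Every entry costs one operation even when a later entry overwrites or deletes it, so a large changelog means a long restore even if the resulting store is small.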

Thanks,
-Tao

On Thu, Feb 18, 2016 at 8:35 AM, Leo Woessner <est...@gmail.com> wrote:

> We are starting to use the key-value store with rocksdb.  We are trying to
> officially add Samza to our stack and functionally everything is great. But,
>
> I am seeing minutes to hours restore time.  Does anyone have any benchmarks
> on data size versus restore time?  My big question is how will this scale.
>
> Thanks in advance
>
> --
> Leo Woessner
>
