Hi Leo,

At LinkedIn, when we switched to RocksDB for Samza last year, we ran some tests to see how well RocksDB performs. We used the RocksDB microbenchmark (https://github.com/facebook/rocksdb/blob/master/java/benchmark/src/main/java/org/rocksdb/benchmark/DbBenchmark.java) for several tests. For sequential writes (10-byte keys, 800-byte values, 1,000,000,000 entries), RocksDB write throughput was around 311 MB/sec on SSD. You can find the results (https://issues.apache.org/jira/secure/attachment/12723431/2015-04-06%20RocksDB%20Performance.pdf) in the SAMZA-543 attachment.
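As a rough illustration of what that throughput means for a restore, here is a back-of-envelope sketch. It assumes restore time is bounded purely by RocksDB write throughput (in practice, reading the changelog from Kafka and per-entry overhead add to it) and that "MB" means 1,000,000 bytes. `estimateSeconds` is a hypothetical helper, not part of Samza or RocksDB.

```java
public class RestoreTimeEstimate {

    // Hypothetical helper: lower-bound time in seconds to write `entries` records,
    // each of (keyBytes + valueBytes) bytes, at `mbPerSec` megabytes per second.
    // Assumption: 1 MB = 1,000,000 bytes.
    static double estimateSeconds(long entries, int keyBytes, int valueBytes, double mbPerSec) {
        double totalBytes = (double) entries * (keyBytes + valueBytes);
        return totalBytes / (mbPerSec * 1_000_000.0);
    }

    public static void main(String[] args) {
        // Parameters taken from the sequential-write benchmark above.
        double seconds = estimateSeconds(1_000_000_000L, 10, 800, 311.0);
        System.out.printf("~%.0f s (~%.1f min)%n", seconds, seconds / 60.0);
    }
}
```

With the benchmark's parameters this prints `~2605 s (~43.4 min)`. Under these assumptions the estimate grows linearly with both the entry count and the entry size, so it can serve as a floor when comparing against observed restore times.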
When Samza restores data into RocksDB, it performs a RocksDB put for each changelog entry (RocksDbKeyValueStore->putAll), so reseeding takes a long time if your changelog is huge. That is why Samza 0.10 introduced the YARN host-affinity feature Jagadish mentioned: it tries to bring a restarted container back up on the same host so it can reuse the local RocksDB state instead of restoring it from the changelog. This should avoid long RocksDB restore times in most cases.

Thanks,
-Tao

On Thu, Feb 18, 2016 at 8:35 AM, Leo Woessner <est...@gmail.com> wrote:
> We are starting to use the key-value store with rocksdb. We are trying to
> officially add Samza to our stack and functionally everything is great. But,
>
> I am seeing minutes to hours restore time. Does anyone have any benchmarks
> on data size versus restore time? My big question is how will this scale.
>
> Thanks in advance
>
> --
> Leo Woessner
>
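On the scaling question, here is a simplified model of the per-entry restore described above. It is a sketch, not Samza's actual restore code: an in-memory `HashMap` stands in for RocksDB, `replay` is a hypothetical helper, and it assumes a null value in the changelog marks a deleted key. The point it illustrates is that restore cost is one write per changelog record, so it scales with the changelog's length rather than with the number of keys left in the store.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChangelogReplay {

    // Replays changelog records in order, issuing one write per record.
    // Assumption: a null value means the key was deleted.
    // Returns the number of write operations performed.
    static long replay(List<? extends Map.Entry<String, String>> changelog,
                       Map<String, String> store) {
        long writes = 0;
        for (Map.Entry<String, String> record : changelog) {
            if (record.getValue() == null) {
                store.remove(record.getKey());
            } else {
                store.put(record.getKey(), record.getValue());
            }
            writes++;
        }
        return writes;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        List<Map.Entry<String, String>> changelog = List.of(
                new SimpleEntry<>("a", "1"),
                new SimpleEntry<>("b", "2"),
                new SimpleEntry<>("a", "3"),   // overwrites "a"
                new SimpleEntry<>("b", null)); // deletes "b"
        long writes = replay(changelog, store);
        System.out.println(writes + " writes, final store " + store);
    }
}
```

This prints `4 writes, final store {a=3}`: four writes to rebuild a single key. If the changelog topic is compacted, superseded records for the same key are eventually dropped, which shortens the replay.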