Hi Jack,

You are right. If the state *doesn't need to be recovered upon failure*, you can use an in-memory store instead of RocksDB. The only limitation is the size of your dataset: if it grows beyond the allocated memory, there are no checks to guard against it. Also, if you have a "TTL" or cached dataset, you may have to time out the entries manually.
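To make the "manually time out the entries" point concrete, here is a minimal sketch of an in-memory map with a fixed TTL. Everything in it is an assumption for illustration: `ExpiringMapStore`, its methods, and the injected clock are hypothetical names and are not part of Samza.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical helper (not a Samza class): an in-memory map whose entries expire after a fixed TTL. */
class ExpiringMapStore<K, V> {
    /** A stored value together with the time at which it stops being valid. */
    private static final class Entry<T> {
        final T value;
        final long expiresAtMs;

        Entry(T value, long expiresAtMs) {
            this.value = value;
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMs;
    private final LongSupplier clock; // injected so expiry can be tested without sleeping

    ExpiringMapStore(long ttlMs, LongSupplier clock) {
        this.ttlMs = ttlMs;
        this.clock = clock;
    }

    /** Stores the value and (re)starts its TTL from the current time. */
    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMs));
    }

    /** Returns the value, or null if the key is absent or its TTL has elapsed. */
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMs) {
            entries.remove(key); // lazily drop the expired entry on read
            return null;
        }
        return entry.value;
    }

    /** Removes every expired entry and returns how many were removed. */
    int evictExpired() {
        long now = clock.getAsLong();
        int removed = 0;
        Iterator<Entry<V>> it = entries.values().iterator();
        while (it.hasNext()) {
            if (now >= it.next().expiresAtMs) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return entries.size();
    }
}
```

In a real job you would probably pass `System::currentTimeMillis` as the clock and call `evictExpired()` periodically, for example from a `WindowableTask.window()` callback, so that keys which are never read again do not keep occupying memory. Note that this sketch does nothing to bound the total size of the map.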

If you do use RocksDB with a changelog, the data is persisted not only to disk but also to the changelog topic. So, if the container fails and restarts on another host, it can recover the state from the changelog. Additionally, if you enable the host-affinity feature for your Samza application, a failed container may get allocated on the same host. In that case, it will reuse the state already persisted on disk, after making sure that it is caught up with the changelog stream.

HTH,
Navina

On Tue, Aug 2, 2016 at 11:51 AM, Jack Huang <jackhu...@mz.com> wrote:

> Hi all,
>
> Is there any reason to use RocksDB without associating it with a changelog in
> Kafka? My understanding is that even though Rocks persists data to disk,
> when a container fails the partition might be restarted on a different
> machine, where there is no persisted data on disk. In that case, wouldn't
> it make sense to use an in-memory store (e.g. java.util.Map) instead, since
> Rocks effectively does not provide persistence under Samza's architecture?
>
> Thanks,
> Jack
>

--
Navina R.