Hi Jack,
You are right. In the situation where state *doesn't need to be recovered
upon failure*, you could use in-memory store, instead of RocksDB.
The only limitation there is the size of your dataset. If it exceeds beyond
the allocated memory, you won't have any checks around it. Also, if you
have a "TTL" or cached dataset, you may have to manually time-out the
entries.

If you do use RocksDB with a changelog, the data not only gets persisted to
the disk, but also to the changelog topic. So, if the container fails and
restarts on another host, it can recover the state from the changelog.

Additionally, if you enable host-affinity feature for your Samza
application, a failed container may get allocated on the same host. In this
case, it will re-use the already persisted state on disk, after making sure
that it is caught up with the changelog stream.

HTH,
Navina

On Tue, Aug 2, 2016 at 11:51 AM, Jack Huang <jackhu...@mz.com> wrote:

> Hi all,
>
> Is there any reason to use RocksDB without associating it to changelog in
> Kafka? My understanding is that even though Rocks persists data to disk,
> when container fails the partition might be restarted on a different
> machine, where there is no persisted data on disk. In that case, wouldn't
> it make sense to use a in-memory store (e.g. java.util.Map) instead since
> Rocks effectively does not provide persistence under Samza's architecture?
>
> Thanks,
> Jack
>



-- 
Navina R.

Reply via email to