Chris + Samza Devs,

I was wondering whether Samza could support re-processing as described by the Kappa architecture or Liquid (http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper25u.pdf).
It seems that a changelog is not sufficient to restore state backward in time. Kafka compaction guarantees that local state can be restored from where it left off, but I don't see how it can restore past state.

Imagine a stream job that has a lot of state in its local store but has not updated any keys in a long time.

Time t1: All of the data is in the tail of the Kafka log (past the cleaner point).
Time t2: The job updates some keys. Now the next compaction will blow away the old values for those keys.
Time t3: Compaction occurs and the old values are discarded.

Say we want to launch a re-processing job that begins from t1. If we launch that job before t3, it will correctly restore its state. However, if we launch it after t3, it will be missing the old values for the keys updated at t2, right?

Unless I'm misunderstanding something, the only way around this is to keep snapshots in addition to the changelog. Has there been any discussion of adding an option to Samza to take RocksDB snapshots and persist them to an object store or HDFS?

Thanks,
Roger
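To make the t1/t2/t3 scenario concrete, here is a minimal Python sketch under simplifying assumptions: the changelog is an in-memory list of (offset, key, value) tuples, compaction is applied to the whole log (real Kafka leaves the head past the cleaner point uncompacted and also handles tombstones, both ignored here), and a "snapshot" is just a copy of the store plus the offset it corresponds to. All function names are hypothetical and are not Samza or Kafka APIs.

```python
from typing import Dict, List, Tuple

# Hypothetical model of a changelog entry: (offset, key, value).
Entry = Tuple[int, str, str]
Snapshot = Tuple[int, Dict[str, str]]


def compact(log: List[Entry]) -> List[Entry]:
    """Keep only the latest entry per key, preserving original offsets and order.

    Simplification: compacts the entire log, unlike Kafka's cleaner point.
    """
    latest: Dict[str, int] = {}
    for i, (_, key, _) in enumerate(log):
        latest[key] = i
    return [entry for i, entry in enumerate(log) if latest[entry[1]] == i]


def restore(log: List[Entry], up_to_offset: int) -> Dict[str, str]:
    """Rebuild the store by replaying entries with offset <= up_to_offset."""
    state: Dict[str, str] = {}
    for offset, key, value in log:
        if offset > up_to_offset:
            break  # offsets are increasing, so nothing later can qualify
        state[key] = value
    return state


def restore_from_snapshot(snapshot: Snapshot, log: List[Entry],
                          up_to_offset: int) -> Dict[str, str]:
    """Start from a snapshot, then replay only the changelog entries after it."""
    snap_offset, snap_state = snapshot
    state = dict(snap_state)
    for offset, key, value in log:
        if snap_offset < offset <= up_to_offset:
            state[key] = value
    return state


# t1: keys a, b, c written long ago; offset 2 marks t1.
log: List[Entry] = [(0, "a", "1"), (1, "b", "1"), (2, "c", "1")]
T1 = 2
snapshot: Snapshot = (T1, restore(log, T1))  # persisted copy of the store at t1

# t2: the job updates a and b.
log += [(3, "a", "2"), (4, "b", "2")]

# t3: compaction discards the t1 values of a and b.
compacted = compact(log)

print(restore(log, T1))        # before t3: {'a': '1', 'b': '1', 'c': '1'}
print(restore(compacted, T1))  # after t3:  {'c': '1'}  (a and b are lost)
print(restore_from_snapshot(snapshot, compacted, T1))  # full t1 state again
```

Restoring to the latest offset works from the compacted log alone, which matches the "restore from where it left off" guarantee; it is only restoring to an earlier point like t1 that needs the snapshot.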