Hi,

We have a Flink streaming pipeline (1.4.2) which reads from Kafka, uses
mapWithState with RocksDB and writes the updated states to Cassandra.
We would also like to reprocess the ingested records from HDFS. For this we
are considering computing the latest state of the records over the whole
dataset in a batch manner, instead of replaying them record by record.
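
To give a picture of the streaming side, here is a minimal sketch of the
job (the topic name, record layout and the running-sum update function are
placeholders, not our actual logic):

import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011

object StateUpdaterJob {

  case class Record(key: String, value: Long)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"))

    val props = new Properties()
    props.setProperty("bootstrap.servers", "kafka:9092")
    props.setProperty("group.id", "state-updater")

    // Records arrive as "key,value" strings; the layout is made up.
    val records = env
      .addSource(new FlinkKafkaConsumer011[String]("records", new SimpleStringSchema(), props))
      .map { line =>
        val Array(k, v) = line.split(",")
        Record(k, v.toLong)
      }

    // Keep a running sum per key as the state; our real update function
    // is more involved, this is just the shape of it.
    val updated = records
      .keyBy(_.key)
      .mapWithState[Record, Long] { (r, state: Option[Long]) =>
        val next = state.getOrElse(0L) + r.value
        (r.copy(value = next), Some(next))
      }

    // The updated states then go to the Cassandra sink (config elided).
    updated.print()

    env.execute("kafka-to-cassandra state updater")
  }
}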

What are the options (or best practices) for bringing batch and streaming
together (FLINK-2320 is open at the moment)? Is it possible to build the
RocksDB state "offline" and share it with the streaming job?
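
For concreteness, the batch computation we have in mind would look roughly
like the following DataSet job (the HDFS paths and the CSV layout are made
up); the open part is how to get its output into the streaming job's
RocksDB state:

import org.apache.flink.api.scala._

object LatestStateBatchJob {

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Full history on HDFS, same hypothetical "key,value" layout
    // as the streaming sketch above.
    val records = env.readCsvFile[(String, Long)]("hdfs:///data/records")

    // Applying the same update function over the whole dataset collapses
    // each key to its final state in one batch pass, instead of replaying
    // the records one by one.
    val finalStates = records
      .groupBy(0)
      .sum(1)

    finalStates.writeAsCsv("hdfs:///data/final-states")
    env.execute("latest state over the full dataset")
  }
}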

Ideally we would have a single job which switches from batch to streaming
once all records have been read from HDFS.


Thanks,
Peter
