Hi Ken,

you can certainly have partitioned sources and sinks. You can control the parallelism by calling the .setParallelism() method. If you need a partitioned sink, you can call .keyBy() to hash-partition the stream before the sink.
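To make this concrete, here is a minimal sketch (MyPartitionedSource, MySink, and Record are hypothetical placeholders for your own classes, not Flink types, and the parallelism of 4 is arbitrary):

    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class PartitionedJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            // run the source with 4 parallel subtasks, e.g. one per DB partition
            DataStream<Record> records = env
                .addSource(new MyPartitionedSource())
                .setParallelism(4);

            // hash-partition by key so that each sink subtask receives a
            // disjoint subset of the key space
            records
                .keyBy(new KeySelector<Record, String>() {
                    @Override
                    public String getKey(Record r) {
                        return r.getPartitionKey(); // hypothetical key accessor
                    }
                })
                .addSink(new MySink())
                .setParallelism(4);

            env.execute("partitioned source and sink");
        }
    }

With this setup, every sink instance only sees the keys that hash to it, which effectively gives you a partitioned sink.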
I did not completely understand the requirements of your program. Could you maybe provide pseudo code for how the program should look?

Some general comments:

- Flink's fault tolerance mechanism does not work with iterative data flows yet. This is work in progress, see FLINK-3257 [1].
- Flink's fault tolerance mechanism only works if you expose *all* internal operator state. So you would need to put your Java DB's state into Flink state to have a recoverable job.
- Is the DB essential in your application? Could you use Flink's key-partitioned state interface [2] instead? That would help to make your job fault-tolerant. (A minimal sketch follows after the quoted message below.)

Best, Fabian

[1] https://issues.apache.org/jira/browse/FLINK-3257
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/state.html#using-the-keyvalue-state-interface

2016-09-29 1:15 GMT+02:00 Ken Krugler <kkrugler_li...@transpac.com>:

> Hi all,
>
> I’ve got a very specialized DB (runs in the JVM) that I need to use to
> both keep track of state and generate new records to be processed by my
> Flink streaming workflow. Some of the workflow results are updates to be
> applied to the DB.
>
> And the DB needs to be partitioned.
>
> My initial approach is to wrap it in a regular operator, and have
> subsequent streams be inputs for updating state. So now I’ve got an
> IterativeDataStream, which should work.
>
> But I imagine I could also wrap this DB in a source and a sink, yes?
> Though I’m not sure how I could partition it as a source, in that case.
>
> If it is feasible to have a partitioned source/sink, are there general
> pros/cons to either approach?
>
> Thanks,
>
> — Ken
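Here is the sketch of the key-partitioned state interface from [2] (Flink 1.1 API; the Tuple2<String, Long> input and the per-key running sum are just stand-ins for whatever per-key state you currently keep in your DB):

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    public class RunningSum
            extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>> {

        // one state instance per key, managed and checkpointed by Flink
        private transient ValueState<Long> sum;

        @Override
        public void open(Configuration conf) {
            ValueStateDescriptor<Long> descriptor =
                new ValueStateDescriptor<>("sum", Long.class, 0L);
            sum = getRuntimeContext().getState(descriptor);
        }

        @Override
        public void flatMap(Tuple2<String, Long> in,
                            Collector<Tuple2<String, Long>> out) throws Exception {
            long updated = sum.value() + in.f1;
            sum.update(updated);
            out.collect(Tuple2.of(in.f0, updated));
        }
    }

You would use it on a keyed stream, e.g. stream.keyBy(0).flatMap(new RunningSum()). Because the state is registered with Flink, it is included in checkpoints and restored on failure, which is what makes the job fault-tolerant without an external DB.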