I have a Dataflow pipeline that reads data from JDBC and Pub/Sub. Ideally,
the pipeline backfills its state and output from historical data via the
JDBC input, and then continues processing new elements as they arrive via
Pub/Sub. Conceptually, this seems easy to do by filtering each source to
one side of a specific cutoff instant.
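
For concreteness, here is a minimal sketch of the topology I have in mind
(Event, getTimestamp(), readFromJdbc(), parseEvent(), the cutoff value, and
the subscription path are all illustrative stand-ins, not the real code):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;
import org.joda.time.Instant;

public class BackfillThenStream {
  // Illustrative cutoff separating historical (JDBC) from live (Pub/Sub) data.
  private static final Instant CUTOFF = Instant.parse("2021-01-01T00:00:00Z");

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Historical side: keep only elements at or before the cutoff.
    // readFromJdbc() stands in for the real JdbcIO.read() configuration.
    PCollection<Event> historical =
        p.apply("ReadJdbc", readFromJdbc())
         .apply("KeepHistorical",
             Filter.by((Event e) -> !e.getTimestamp().isAfter(CUTOFF)));

    // Live side: keep only elements after the cutoff. parseEvent() is a
    // hypothetical SimpleFunction<String, Event>.
    PCollection<Event> live =
        p.apply("ReadPubsub", PubsubIO.readStrings()
             .fromSubscription("projects/my-project/subscriptions/my-sub"))
         .apply("Parse", MapElements.via(parseEvent()))
         .apply("KeepLive",
             Filter.by((Event e) -> e.getTimestamp().isAfter(CUTOFF)));

    // Merge both sides ahead of the shared downstream transforms
    // (including the looping-timer stage mentioned below).
    PCollectionList.of(historical).and(live)
        .apply("MergeSources", Flatten.pCollections());

    p.run();
  }
}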

However, as soon as I add the Pub/Sub source, the pipeline runs in
streaming mode and no longer produces the expected results: all of the
output that should be produced by the looping timers is missing.
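
By "looping timers" I mean the usual stateful-DoFn pattern where an
event-time timer re-arms itself on every firing; roughly this simplified
sketch (key handling pared down for illustration):

import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

// Emits a value once per interval per key, even when no new input arrives,
// by re-arming an event-time timer from its own @OnTimer callback.
class LoopingTimerFn extends DoFn<KV<String, Long>, KV<String, Long>> {
  private static final Duration INTERVAL = Duration.standardMinutes(1);

  @TimerId("loop")
  private final TimerSpec loopSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

  @ProcessElement
  public void processElement(ProcessContext c, @TimerId("loop") Timer loop) {
    // Arm the timer one interval past the element's timestamp. (A real
    // implementation would use ValueState to arm it only once per key.)
    loop.set(c.timestamp().plus(INTERVAL));
    c.output(c.element());
  }

  @OnTimer("loop")
  public void onTimer(OnTimerContext c, @TimerId("loop") Timer loop) {
    // Output for the otherwise-empty interval, then re-arm the timer so it
    // keeps firing as the watermark advances.
    c.output(KV.of("placeholder-key", 0L));  // key recovery elided for brevity
    loop.set(c.timestamp().plus(INTERVAL));
  }
}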

I initially suspected the Flatten that merges the two inputs, but I've
since taken Pub/Sub out of the equation entirely and run the exact same
JDBC-based pipeline in both batch and streaming mode. The JDBC-only
pipeline produces the same partial results whenever it runs in streaming
mode.
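
For reference, the shape of that comparison, assuming the standard
StreamingOptions flag is the mode toggle and eliding the actual transforms:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;

public class JdbcOnlyComparison {
  public static void main(String[] args) {
    // Run once with --streaming=false (batch) and once with
    // --streaming=true; the graph itself is identical in both runs.
    StreamingOptions options = PipelineOptionsFactory
        .fromArgs(args).withValidation().as(StreamingOptions.class);

    Pipeline p = Pipeline.create(options);
    // ... same JDBC read, cutoff filter, and looping-timer stages as above ...
    p.run().waitUntilFinish();
  }
}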

What could be happening?

Regards,
Raman
