A runner is free to process things in streaming mode, batch mode, or
even alternate between the two. Generally there are certain
efficiencies/simplifications that only work (well) in batch mode; on
the other hand, the presence of an unbounded source means one cannot
wait for a PCollection to be entirely produced before scheduling
downstream work. This is why runners often have distinct modes and
execute an entire pipeline in one or the other. That said, being able
to transition (e.g. occasionally running over unbounded PCollections
in batch mode when a streaming pipeline has a large backlog or loose
latency constraints) is consistent with the model, and it's a
desirable feature I've heard discussed before.
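To make the idea concrete, here is a toy sketch (not Beam's actual runner
API; the Stage type, fields, and thresholds are all invented for
illustration) of how a runner might pick an execution mode per stage based
on input boundedness, backlog, and latency constraints:

```python
# Illustrative sketch only -- not real Beam runner internals.
# A hypothetical per-stage mode chooser: bounded inputs always run in
# batch; unbounded inputs stream, unless a large backlog plus a loose
# latency constraint makes a temporary batch catch-up worthwhile.
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    bounded: bool               # True if every input PCollection is bounded
    backlog_secs: float = 0.0   # estimated event-time backlog
    latency_slo_secs: float = 60.0  # how much latency the user tolerates


def choose_mode(stage: Stage, backlog_threshold: float = 3600.0) -> str:
    """Return 'batch' or 'streaming' for this stage."""
    if stage.bounded:
        return "batch"
    if (stage.backlog_secs > backlog_threshold
            and stage.latency_slo_secs > 300.0):
        # Catch up over the backlog in batch mode, then revert to streaming.
        return "batch"
    return "streaming"


print(choose_mode(Stage("replay-history", bounded=True)))
print(choose_mode(Stage("live-events", bounded=False)))
print(choose_mode(Stage("backlogged", bounded=False,
                        backlog_secs=7200.0, latency_slo_secs=600.0)))
```

A real runner would of course base this on richer signals (watermark lag,
source metadata, user pipeline options), but the decision structure is the
same: boundedness alone need not pin the whole pipeline to one mode.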

On Fri, Jan 11, 2019 at 10:43 AM Alex Van Boxel <a...@vanboxel.be> wrote:
>
> A question for the runner implementers:
>
> The Beam model is stream vs batch agnostic. But I have use cases where we 
> replay history (from BigTable or BigQuery) but then transition into streaming.
>
> Now with Splittable DoFns it's easier to create inputs that start in batch, 
> then go streaming. But I have the impression that the runners work in either 
> streaming or batch mode. The model for the runners doesn't support going 
> from massive batch processing into streaming mode, right?
>
> So if you have an unbounded input anywhere, the runner will work in streaming 
> mode, even when processing the batch workload?
>
> Is it something the community is thinking about?
>
>  _/
> _/ Alex Van Boxel
