Re: Possible to use GlobalWindows for writing unbounded input to files?

2019-01-11 Thread Jeff Klukas
It is indeed well documented that numShards is required for unbounded input. And I do believe that a helpful error is thrown in the case of unbounded input and runner-determined sharding. I do believe there's still a bug here; it's just wandered quite a bit from the original title of the thread. T

Re: Possible to use GlobalWindows for writing unbounded input to files?

2019-01-11 Thread Chamikara Jayalath
Actually, this is a documented known issue.
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java#L152
On Fri, Jan 11, 2019 at 9:23 AM Jeff Klukas wrote:
> Indeed, I was wrong about the ValueProvider distinction. I updated that in
> the JIRA.

Re: Possible to use GlobalWindows for writing unbounded input to files?

2019-01-11 Thread Jeff Klukas
Indeed, I was wrong about the ValueProvider distinction. I updated that in the JIRA. It's when numShards is 0 (so runner-provided sharding) vs. an explicit number. Things work fine for explicit sharding. It's the runner-provided sharding mode that encounters the Flatten of PCollections with confli

Re: Beam Runners: What about Batch to Streaming transition

2019-01-11 Thread Robert Bradshaw
A runner is free to process things in streaming mode, batch mode, or even alternate between the two. Generally there are certain efficiencies/simplifications that only work (well) in batch mode, and on the other hand the presence of an unbounded source means one cannot wait for a PCollection to be

Beam Runners: What about Batch to Streaming transition

2019-01-11 Thread Alex Van Boxel
A question for the runner implementers: The Beam model is stream vs batch agnostic. But I have use cases where we replay history (from BigTable or BigQuery) but then transition into streaming. Now with Splittable DoFns it's easier to create inputs that start batch, then go streaming. But I have