You are not obliged to return exactly numSplits splits from split(). If you
have fewer queues (partitions), you can simply return fewer splits (and
thereby make sure that every split has an associated queue). On the other
hand, if your source returns
watermark of BoundedWindo
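
To illustrate the splitting advice above, here is a minimal sketch in Java. QueueBackedSource, queueNames, and the single-queue constructor are hypothetical names, and the rest of the UnboundedSource contract is elided; it only shows split() returning fewer splits than desiredNumSplits when there are fewer queues.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.beam.sdk.options.PipelineOptions;

// Hypothetical source backed by a fixed set of queues; only split() is shown,
// the rest of the UnboundedSource contract is elided.
class QueueBackedSource /* extends UnboundedSource<...> */ {

  private final List<String> queueNames;

  QueueBackedSource(List<String> queueNames) {
    this.queueNames = queueNames;
  }

  // Return at most one split per queue. If there are fewer queues than
  // desiredNumSplits, simply return fewer splits, so every split has an
  // associated queue. (Packing several queues into one split, for the
  // opposite case, is not shown here.)
  public List<QueueBackedSource> split(int desiredNumSplits, PipelineOptions options) {
    List<QueueBackedSource> splits = new ArrayList<>();
    for (String queue : queueNames) {
      splits.add(new QueueBackedSource(Collections.singletonList(queue)));
    }
    return splits;
  }
}
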
Hi,
I want to iterate multiple times over the Iterable (the output of the
GroupByKey transformation).
When my Runner is SparkRunner, I get an exception:
Caused by: java.lang.IllegalStateException: ValueIterator can't be iterated
more than once,otherwise there could be data lost
at
org
Hi Gershi,
could you please outline the pipeline you are trying to execute?
Basically, you cannot iterate the Iterable multiple times in a single
ParDo. It should be possible, though, to apply multiple ParDos to the
output from GroupByKey.
Jan
On 9/26/19 3:32 PM, Gershi, Noam wrote:
Hi,
I want
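
A minimal sketch in Java of the workaround Jan describes above: instead of iterating the grouped Iterable twice inside one ParDo, apply two separate ParDos to the same GroupByKey output. SumFn and MaxFn are hypothetical DoFns, each of which consumes the Iterable only once.

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

// Each DoFn makes exactly one pass over the grouped Iterable.
class SumFn extends DoFn<KV<String, Iterable<Long>>, KV<String, Long>> {
  @ProcessElement
  public void process(@Element KV<String, Iterable<Long>> e,
                      OutputReceiver<KV<String, Long>> out) {
    long sum = 0;
    for (long v : e.getValue()) {
      sum += v;
    }
    out.output(KV.of(e.getKey(), sum));
  }
}

class MaxFn extends DoFn<KV<String, Iterable<Long>>, KV<String, Long>> {
  @ProcessElement
  public void process(@Element KV<String, Iterable<Long>> e,
                      OutputReceiver<KV<String, Long>> out) {
    long max = Long.MIN_VALUE;
    for (long v : e.getValue()) {
      max = Math.max(max, v);
    }
    out.output(KV.of(e.getKey(), max));
  }
}

// Wiring, assuming an existing PCollection<KV<String, Long>> named input:
//   PCollection<KV<String, Iterable<Long>>> grouped = input.apply(GroupByKey.create());
//   grouped.apply("Sum", ParDo.of(new SumFn()));  // first pass over the grouped values
//   grouped.apply("Max", ParDo.of(new MaxFn()));  // second, independent pass
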
Did you try moving the imports from the process function to the top of
main.py?
Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
On Wed, Sep 25, 2019 at 11:27 PM Yu Watanabe wrote:
> Hello.
>
> I would like to ask for help with resolving dependency issue for imported
>
Jan, in Beam, users expect to be able to iterate the GBK output multiple
times, even from within the same ParDo.
Is this something that Beam on the Spark runner never supported?
On Thu, Sep 26, 2019 at 6:50 AM Jan Lukavský wrote:
> Hi Gershi,
>
> could you please outline the pipeline you are trying to
https://issues.apache.org/jira/browse/BEAM-8312 should help with this. In
summary, I would say that portable Beam-on-Flink is basically feature
complete, but we're still smoothing out some of the ease-of-use issues. And
feedback like this really helps, so thanks!
On Tue, Sep 24, 2019 at 8:10 PM Yu