Hi Kenn

Thanks for your guidance. I understand that batch mode waits for the previous 
stage to finish, but that is not the only issue in this particular case.

The Dataflow runner automatically adds a step, 
"BatchStatefulParDoOverrides.GbkBeforeStatefulParDo", which not only waits for 
the previous stage but also waits for a very long time itself. Is there a way 
to give the Dataflow runner a hint not to add this step? Functionally, I do 
not need it in my case.
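
To make the behaviour concrete, here is a rough model in plain Python (not 
Beam code; the function names are made up for illustration and the real 
override does more than this). It assumes the inserted step acts like a 
group-by-key in front of the stateful step, so no key can be processed until 
the whole input has been read:

```python
from collections import defaultdict


def group_by_key(records):
    """Hypothetical stand-in for the grouping step: drains all input first."""
    groups = defaultdict(list)
    for key, value in records:  # nothing is emitted until this loop ends
        groups[key].append(value)
    return dict(groups)


def stateful_running_sum(groups):
    """Hypothetical stand-in for a stateful step: one running total per key."""
    for key, values in groups.items():
        total = 0  # per-key "state"
        for value in values:
            total += value
            yield key, total


def run_batch(records):
    # The stateful step only starts after group_by_key has returned.
    return list(stateful_running_sum(group_by_key(records)))
```

In this model the stateful step sees each key's values together, but only 
after every record has been consumed, which is the wait I am seeing.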

Thanks for your suggestion. I will start another thread to understand the BQ options.

Thanks
Aniruddh

On 2020/04/23 03:51:31, Kenneth Knowles <k...@apache.org> wrote: 
> The definition of batch mode for Dataflow is this: completely compute the
> result of one stage of computation before starting the next stage. There is
> no way around this. It does not have to do with using state and timers.
> 
> If you are working with state & timers & triggers, and you are hoping for
> output before the pipeline is completely terminated, then you most likely
> want streaming mode. Perhaps it is best to investigate the BQ read
> performance issue.
> 
> Kenn
> 
> On Wed, Apr 22, 2020 at 4:04 PM Aniruddh Sharma <asharma...@gmail.com>
> wrote:
> 
> > Hi
> >
> > I am reading a bounded collection from BQ.
> >
> > I have to use a Stateful & Timely operation.
> >
> > a) I am invoking the job in batch mode. The Dataflow runner adds a step
> > "BatchStatefulParDoOverrides.GbkBeforeStatefulParDo" which has a partitionBy.
> > This partitionBy waits for all the data to arrive and becomes a bottleneck.
> > From its documentation, it seems this step is meant to be added
> > when there are no windows.
> >
> > I tried adding windows and triggering them before the stateful step, but
> > everything still arrives at this partitionBy step and waits until all the
> > data is there.
> >
> > Is there a way to write the code differently (with windows, for example) or
> > to give Dataflow a hint not to add this step?
> >
> > b) I don't want to run this job in streaming mode. When I run it in
> > streaming mode, the Dataflow runner does not add this step, but the
> > BQ read becomes a bottleneck.
> >
> > So I either have to work out how to read from BQ faster when running in
> > streaming mode, or how to bypass this partitionBy from
> > "BatchStatefulParDoOverrides.GbkBeforeStatefulParDo" when running in
> > batch mode.
> >
> > Thanks
> > Aniruddh
> >
> >
> >
> >
> 
