Different runners decide it differently.
For example, for the Dataflow runner: in batch mode, bundles are usually quite
large, e.g. chunks of files several dozen MB in size, or fairly big
key ranges of something like Bigtable or GroupByKey output. The bundle
sizes are not known in advance (e.g. whe
Same here - shame on me. Congratulations on the graduation, Gris; very happy
to have you back!
On Tue, 22 May 2018 at 09:19 Ismaël Mejía wrote:
> I somehow missed this email thread.
> Congratulations, Gris, and welcome back!
>
> On Fri, May 18, 2018 at 5:34 AM Jesse Anderson
> wrote:
>
> > Congrat
I somehow missed this email thread.
Congratulations, Gris, and welcome back!
On Fri, May 18, 2018 at 5:34 AM Jesse Anderson
wrote:
> Congrats!
> On Thu, May 17, 2018, 6:44 PM Robert Burke wrote:
>> Congrats & welcome back!
>> On Thu, May 17, 2018, 5:44 PM Huygaa Batsaikhan
>> wrote:
>>> Welcome
Hi Eugene!
I had gone through that link before sending an email here. It does a decent job
explaining when to use which method and what kind of optimisations we are
looking at, but it didn't really answer the question I had, i.e. what controls the
granularity of elements of a PCollection in a bundle. K
Thanks for the insight, Kenneth. It would surprise me if the decision made
by the runner about latency vs. amortized cost were non-deterministic. Are there
any benchmarking results showing where bundling kicks in?
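To make the latency vs. amortized-cost trade-off concrete, here is a minimal, Beam-free Python sketch. It assumes a hypothetical `BufferingFn` class and a hypothetical `run` helper (neither is part of Beam). Its method names mirror the `start_bundle`/`process`/`finish_bundle` hooks of a Beam Python `DoFn`, but the code does not import or call Beam. The point it illustrates: the user can cap the batch size inside a bundle, but the runner decides where bundles end, so a partial batch must be flushed at `finish_bundle`. One large bundle (similar to the batch-mode behaviour described above) amortizes work into few flushes, while many tiny bundles force one flush per element.

```python
from typing import Callable, Iterable, List


class BufferingFn:
    """Hypothetical stand-in for a DoFn that batches elements (no Beam involved)."""

    def __init__(self, max_batch_size: int, flush: Callable[[List[int]], None]):
        self.max_batch_size = max_batch_size
        self.flush = flush  # hypothetical sink, e.g. one RPC per batch
        self.buffer: List[int] = []

    def start_bundle(self) -> None:
        self.buffer = []

    def process(self, element: int) -> None:
        self.buffer.append(element)
        if len(self.buffer) >= self.max_batch_size:
            self._flush()

    def finish_bundle(self) -> None:
        # The (simulated) runner decides where a bundle ends, so any partial
        # batch has to be flushed here rather than waiting for more input.
        if self.buffer:
            self._flush()

    def _flush(self) -> None:
        self.flush(self.buffer)
        self.buffer = []  # rebind, so the flushed list is never mutated again


def run(bundles: Iterable[List[int]], max_batch_size: int) -> List[List[int]]:
    """Feed pre-split bundles through BufferingFn and return the emitted batches."""
    batches: List[List[int]] = []
    fn = BufferingFn(max_batch_size, batches.append)
    for bundle in bundles:
        fn.start_bundle()
        for element in bundle:
            fn.process(element)
        fn.finish_bundle()
    return batches


if __name__ == "__main__":
    elements = list(range(10))
    # One large bundle: 3 flushes.
    print(run([elements], max_batch_size=4))  # -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
    # Ten single-element bundles: 10 flushes, regardless of max_batch_size.
    print(run([[e] for e in elements], max_batch_size=4))
```

The bundle splits passed to `run` are chosen by hand here; in a real pipeline they come from the runner, which is exactly why per-bundle buffering alone cannot pin down batch granularity.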
> On May 21, 2018, at 8:52 PM, Kenneth Knowles wrote:
>
> Hi Abdul,
>