Definitely seems like the formatting got lost in translation, sorry about
that :)

I guess both methods create splits, which are essentially a list of
bounded/unbounded source instances, each responsible for reading a certain
segment (physical or otherwise) of the data.
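
To make that concrete, here is a minimal sketch in plain Java of what I mean
by "a list of source instances, each responsible for a segment". RangeSource
and its split method are hypothetical names made up for this illustration;
they are not the Beam BoundedSource/UnboundedSource API, and the real methods
take different arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a source; NOT the Beam API.
// It covers the half-open range [start, end) of some data set.
final class RangeSource {
    final long start; // inclusive
    final long end;   // exclusive

    RangeSource(long start, long end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("need 0 <= start <= end");
        }
        this.start = start;
        this.end = end;
    }

    // Similar in spirit to splitIntoBundles / generateInitialSplits:
    // returns sub-sources that together cover exactly [start, end),
    // each at most desiredSize long.
    List<RangeSource> split(long desiredSize) {
        if (desiredSize <= 0) {
            throw new IllegalArgumentException("desiredSize must be positive");
        }
        List<RangeSource> parts = new ArrayList<>();
        long s = start;
        while (s < end) {
            long len = Math.min(desiredSize, end - s);
            parts.add(new RangeSource(s, s + len));
            s += len;
        }
        return parts;
    }
}
```

With this sketch, new RangeSource(0, 10).split(4) gives three sub-sources
covering [0, 4), [4, 8) and [8, 10), and each one could then be read
independently.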

On Mon, Jan 9, 2017 at 11:51 PM Stephen Sisk <s...@google.com.invalid>
wrote:

> hi!
>
> I think your strikethrough got lost due to this being a text-only email
> list. To make sure, I think you're asking the following:
> "would it be reasonable to think of splitIntoBundles as generateSplits?"
> (i.e., you struck through "Initial")
>
> They are very similar and I definitely also think of them as occupying the
> same niche. I'll let someone else who was around for naming discuss whether
> it was intentional or not. Conceptually, the way that bounded vs streaming
> are handled means that they are doing slightly different things: a bounded
> source is really kind of creating physical chunks of the data, whereas the
> streaming source is creating conceptual divisions of the data that will be
> used later. I'm not sure that's worth the confusion caused by the
> differences.
>
> One thing to clarify - splitIntoBundles does have an "Initial" aspect to
> it. I don't believe there is a publicly documented order in which the
> Source and Reader methods are called, but a runner aiming for efficiency
> could call splitIntoBundles during job startup to divide the work before
> creating readers, rather than creating readers first and then waiting to
> use splitAtFraction.
>
> S
>
> On Sun, Jan 8, 2017 at 6:06 AM Stas Levin <stasle...@gmail.com> wrote:
>
> > Hi,
> >
> > A short terminology question regarding "bundle", and
> > particularly splitIntoBundles vs. generateInitialSplits.
> >
> > In *BoundedSource* we have:
> > List<? extends BoundedSource<T>> *splitIntoBundles*(...)
> >
> > In *UnboundedSource* we have:
> > List<? extends UnboundedSource<OutputT, CheckpointMarkT>>
> > *generateInitialSplits*(...)
> >
> > I was wondering if the names were intentionally made different, i.e.
> > "into bundles" vs "into splits"?
> > In a way these two methods carry out a very similar task, would it be
> > reasonable to think of *splitIntoBundles *as *generate*Initial*Splits? *
> > (strikethrough due to "initial" not being applicable in the case of
> > bounded sources)
> >
> > Regards,
> > Stas
> >
>
