Definitely seems like the formatting got lost in translation, sorry about that :)
I guess both cases (methods) create splits, which are essentially a list of bounded/unbounded source instances, each responsible for reading certain segments (physical or otherwise) of the data.

On Mon, Jan 9, 2017 at 11:51 PM Stephen Sisk <s...@google.com.invalid> wrote:

> hi!
>
> I think your strikethrough got lost due to this being a text-only email
> list. To make sure, I think you're asking the following:
> " would it be reasonable to think of splitIntoBundles as generateSplits? "
> (ie, you strikethrough'd Initial)
>
> They are very similar and I definitely also think of them as occupying the
> same niche. I'll let someone else who was around for naming discuss whether
> it was intentional or not. Conceptually, the way that bounded vs streaming
> are handled means that they are doing slightly different things: a bounded
> source is really kind of creating physical chunks of the data, whereas the
> streaming source is creating conceptual divisions of the data that will be
> used later. I'm not sure that's worth the confusion caused by the
> differences.
>
> One thing to clarify - splitIntoBundles does have an "Initial" aspect to
> it. I don't believe there is a publicly defined/written down order the
> Sources & Reader methods are called in, but a runner trying to get
> efficiency would be able to use splitIntoBundles during job startup to be
> able to split up the work before creating readers rather than after
> creating readers and waiting to use splitAtFraction.
>
> S
>
> On Sun, Jan 8, 2017 at 6:06 AM Stas Levin <stasle...@gmail.com> wrote:
>
> > Hi,
> >
> > A short terminology question regarding "bundle", and
> > particularly splitIntoBundles vs. generateInitialSplits.
> >
> > In *BoundedSource* we have:
> > List<? extends BoundedSource<T>> *splitIntoBundles*(...)
> >
> > In *UnboundedSource* we have:
> > List<? extends UnboundedSource<OutputT, CheckpointMarkT>>
> > *generateInitialSplits*(...)
> >
> > I was wondering if the names were intentionally made different, i.e.
> > "into bundles" vs "into splits"?
> > In a way these two methods carry out a very similar task, would it be
> > reasonable to think of *splitIntoBundles *as *generate*Initial*Splits? *
> > (strikethrough due to "initial" not being applicable in the case of
> > bounded sources)
> >
> > Regards,
> > Stas
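
P.S. To make the "physical chunks" vs. "conceptual divisions" distinction from this thread a bit more concrete, here is a rough sketch in plain Java. It does not use Beam at all: RangeSource, PartitionedStreamSource, and the desiredBundleSize / desiredNumSplits parameters are hypothetical stand-ins I made up for illustration (the real parameter lists are elided as "(...)" above), so please read it as a toy model rather than how the SDK actually does it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only. These are NOT Beam classes, and the method parameters
// are assumptions made for illustration.
class SplitSketch {

    // Bounded case: a finite range [start, end) cut into physical chunks.
    static final class RangeSource {
        final long start;
        final long end;

        RangeSource(long start, long end) {
            this.start = start;
            this.end = end;
        }

        // Analogue of splitIntoBundles: every returned source owns a
        // contiguous piece of the data, at most desiredBundleSize long.
        List<RangeSource> splitIntoBundles(long desiredBundleSize) {
            if (desiredBundleSize <= 0) {
                throw new IllegalArgumentException("desiredBundleSize must be positive");
            }
            List<RangeSource> bundles = new ArrayList<>();
            for (long s = start; s < end; s += desiredBundleSize) {
                bundles.add(new RangeSource(s, Math.min(s + desiredBundleSize, end)));
            }
            return bundles;
        }
    }

    // Unbounded case: an endless stream exposed as a set of partition ids.
    static final class PartitionedStreamSource {
        final List<Integer> partitions;

        PartitionedStreamSource(List<Integer> partitions) {
            this.partitions = partitions;
        }

        // Analogue of generateInitialSplits: no data is chunked here; each
        // returned source is only told which partitions it will read later.
        List<PartitionedStreamSource> generateInitialSplits(int desiredNumSplits) {
            if (desiredNumSplits <= 0) {
                throw new IllegalArgumentException("desiredNumSplits must be positive");
            }
            int n = Math.min(desiredNumSplits, partitions.size());
            List<List<Integer>> assigned = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                assigned.add(new ArrayList<>());
            }
            // Round-robin assignment of partitions to splits.
            for (int i = 0; i < partitions.size(); i++) {
                assigned.get(i % n).add(partitions.get(i));
            }
            List<PartitionedStreamSource> splits = new ArrayList<>();
            for (List<Integer> a : assigned) {
                splits.add(new PartitionedStreamSource(a));
            }
            return splits;
        }
    }

    public static void main(String[] args) {
        List<RangeSource> bundles = new RangeSource(0, 10).splitIntoBundles(4);
        System.out.println("bounded bundles: " + bundles.size());

        List<PartitionedStreamSource> splits =
            new PartitionedStreamSource(Arrays.asList(0, 1, 2, 3, 4)).generateInitialSplits(2);
        System.out.println("unbounded splits: " + splits.size());
    }
}
```

In both cases the result is a list of smaller source instances, which is why the two names feel interchangeable; the difference is only in what each instance is handed (a slice of the data vs. an assignment of what to read later).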