Hey Saliya, I recommend using DataSetUtils.zipWithIndex [1] for this task. It is part of flink-java.
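To make the idea concrete, here is a minimal plain-Java model with no Flink dependency. It is a sketch of how I understand zipWithIndex combined with Marton's join, not Flink code. Each data set is modeled as a list of partitions. Every partition's elements are counted, each partition gets a start offset equal to the sum of the counts before it, and its elements are then numbered consecutively from that offset. Joining the two indexed sets on the index gives you the i-th element of b next to the i-th element of a. The names ZipWithIndexModel, Indexed and joinOnIndex are hypothetical and exist only for illustration. In Flink itself you would call DataSetUtils.zipWithIndex on each DataSet, which returns a DataSet<Tuple2<Long, T>>, and join the two results with where(0).equalTo(0). One assumption to keep in mind: the indices only line up if a and b are split and ordered the same way, which matches your "same decomposition".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

public class ZipWithIndexModel {

    /** Hypothetical (index, value) pair standing in for Flink's Tuple2<Long, T>. */
    public record Indexed<T>(long index, T value) {}

    /** Numbers all elements consecutively across partitions, using per-partition counts as offsets. */
    public static <T> List<List<Indexed<T>>> zipWithIndex(List<List<T>> partitions) {
        List<List<Indexed<T>>> result = new ArrayList<>();
        long start = 0; // offset of the current partition = sum of earlier partition sizes
        for (List<T> partition : partitions) {
            List<Indexed<T>> out = new ArrayList<>();
            long next = start;
            for (T value : partition) {
                out.add(new Indexed<>(next++, value));
            }
            result.add(out);
            start += partition.size();
        }
        return result;
    }

    /** Inner join on the index, then applies op to the matching i-th elements of a and b. */
    public static <T, R> Map<Long, R> joinOnIndex(List<List<Indexed<T>>> a,
                                                  List<List<Indexed<T>>> b,
                                                  BiFunction<T, T, R> op) {
        Map<Long, T> bByIndex = new HashMap<>();
        for (List<Indexed<T>> partition : b) {
            for (Indexed<T> e : partition) {
                bByIndex.put(e.index(), e.value());
            }
        }
        Map<Long, R> out = new TreeMap<>(); // sorted by index for readable output
        for (List<Indexed<T>> partition : a) {
            for (Indexed<T> e : partition) {
                T match = bByIndex.get(e.index());
                if (match != null) {
                    out.put(e.index(), op.apply(e.value(), match));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two data sets with the same type and the same decomposition (two partitions each).
        List<List<Integer>> a = List.of(List.of(1, 2), List.of(3, 4, 5));
        List<List<Integer>> b = List.of(List.of(10, 20), List.of(30, 40, 50));
        Map<Long, Integer> sums = joinOnIndex(zipWithIndex(a), zipWithIndex(b), Integer::sum);
        System.out.println(sums); // prints {0=11, 1=22, 2=33, 3=44, 4=55}
    }
}
```

Here the "OP" is just addition; in your job it would be whatever the map needs to do with the pair.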
[1] https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java#L77

On Thu, Feb 25, 2016 at 5:52 PM, Saliya Ekanayake <esal...@gmail.com> wrote:

> Thank you, Marton. That seems doable.
>
> However, is there a way I can create a dummy indexed data set? That is, a way
> to partition the index range, without any data, across parallel tasks. For
> example, if I could have something like,
>
> DataSet<IndexedSet> ds = ...
>
> then I could implement a custom method to load the required data for a split
> within a map operation, which would be less expensive than a join in my
> case.
>
> Thank you,
> Saliya
>
> On Thu, Feb 25, 2016 at 11:45 AM, Márton Balassi <balassi.mar...@gmail.com> wrote:
>
>> Hey Saliya,
>>
>> I would add a unique ID to both DataSets, the variable you referred to
>> as 'i'. Then you can join the two DataSets on the field containing 'i' and
>> do the mapping on the joined result.
>>
>> Hope this helps,
>>
>> Marton
>>
>> On Thu, Feb 25, 2016 at 5:38 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have two data sets like,
>>>
>>> DataSet<T> a = ...
>>> DataSet<T> b = ...
>>>
>>> They have the same type and the same decomposition. I want to apply a map
>>> operator that needs both *a* and *b*. For example,
>>>
>>> a.map( i -> OP)
>>>
>>> Within this OP I need the corresponding (*i*th) element of *b* as
>>> well. Is there a way to do this?
>>>
>>> Thank you,
>>> Saliya
>>>
>>> --
>>> Saliya Ekanayake
>>> Ph.D. Candidate | Research Assistant
>>> School of Informatics and Computing | Digital Science Center
>>> Indiana University, Bloomington
>>> Cell 812-391-4914
>>> http://saliya.org