Hi, you can do it like this:
1) Split each label record of the main dataset into separate records:

(0,List(a, b, c, d, e, f, g)) -> (0, a), (0, b), (0, c), ..., (0, g)
(1,List(b, c, f, a, g)) -> (1, b), (1, c), ..., (1, g)

2) Join the word index dataset with the split main dataset on the word field:

DataSet<Tuple2<Integer, String>> splittedMain = ...
DataSet<Tuple2<Long, String>> wordIdx = ...
DataSet<Tuple2<Integer, Long>> joined = splittedMain.join(wordIdx).where(1).equalTo(1).with(...)

3) Group by label:

DataSet<Tuple2<Integer, Long[]>> labelsWithIdx = joined.groupBy(0).reduceGroup(...) // collect all indexes in a list / array

Best, Fabian

2016-10-10 23:49 GMT+02:00 Kürşat Kurt <kur...@kursatkurt.com>:

> Hi;
>
> I have MainDataset (Label,WordList) :
>
> (0,List(a, b, c, d, e, f, g))
> (1,List(b, c, f, a, g))
>
> ..and, wordIndex dataset(created with .zipWithIndex) :
>
> wordIndex> (0,a)
> wordIndex> (1,b)
> wordIndex> (2,c)
> wordIndex> (3,d)
> wordIndex> (4,e)
> wordIndex> (5,f)
> wordIndex> (6,g)
>
> How can i convert mainDataset to indexed wordList dataset like this:
>
> (0,List(1,2,3,4,5,6))
> (1,List(2,3,5,0,6)
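
P.S. To make the three steps in the reply (split, join, group) concrete, here is a minimal sketch in plain Java on in-memory collections. It is not Flink code: the class name LabelIndexSketch and the helper indexWords are hypothetical, and ordinary maps and lists stand in for the DataSets. It only illustrates the shape of the data after each step, assuming the word index maps each word to the position assigned by zipWithIndex.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; plain collections stand in for Flink DataSets.
class LabelIndexSketch {

    // Replaces every word in each labelled list by its index from wordIdx.
    public static Map<Integer, List<Long>> indexWords(Map<Integer, List<String>> mainData,
                                                      Map<String, Long> wordIdx) {
        // Step 1: split each (label, words) record into (label, word) pairs.
        List<Map.Entry<Integer, String>> split = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> rec : mainData.entrySet()) {
            for (String word : rec.getValue()) {
                split.add(new SimpleEntry<>(rec.getKey(), word));
            }
        }

        // Step 2: join on the word to get (label, index) pairs.
        // Words missing from the index are dropped, as in an inner join.
        List<Map.Entry<Integer, Long>> joined = new ArrayList<>();
        for (Map.Entry<Integer, String> pair : split) {
            Long idx = wordIdx.get(pair.getValue());
            if (idx != null) {
                joined.add(new SimpleEntry<>(pair.getKey(), idx));
            }
        }

        // Step 3: group by label and collect all indexes into one list.
        Map<Integer, List<Long>> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Long> pair : joined) {
            result.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> mainData = new LinkedHashMap<>();
        mainData.put(0, Arrays.asList("a", "b", "c", "d", "e", "f", "g"));
        mainData.put(1, Arrays.asList("b", "c", "f", "a", "g"));

        // Word index as in the question: a -> 0, b -> 1, ..., g -> 6.
        Map<String, Long> wordIdx = new HashMap<>();
        String[] words = {"a", "b", "c", "d", "e", "f", "g"};
        for (int i = 0; i < words.length; i++) {
            wordIdx.put(words[i], (long) i);
        }

        // prints {0=[0, 1, 2, 3, 4, 5, 6], 1=[1, 2, 5, 0, 6]}
        System.out.println(indexWords(mainData, wordIdx));
    }
}
```

Note that this sketch keeps the original word order only because it runs sequentially. In a distributed join followed by groupBy(0).reduceGroup(...), the order of indexes within a group is generally not guaranteed, so if the order matters you would probably need to carry the word position along and sort on it within each group.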