Hi Mustafa, I'm afraid, this is not possible. Although you can annotate DataSources with partitioning information, this is not enough to avoid repartitioning for a CoGroup. The reason for that is that CoGroup requires co-partitioning of both inputs, i.e., both inputs must be equally partitioned (same number of partitions, same partitioning function, same location of partitions). Since Flink is dynamically assigning tasks to execution slots, it is not possible to co-locate data that was read from a data source and data coming from the result of another computation.
If you just need the result of the first co-group on disk, you could also build a single program that does both co-groups and additional writes the result of the first co-group to disk (Flink supports multiple data sinks). Best, Fabian 2015-05-18 15:43 GMT+02:00 Mustafa Elbehery <elbeherymust...@gmail.com>: > Hi, > > I am writing a flink job, in which I have three datasets. I have > partitionedByHash the first two before coGrouping them. > > My plan is to spill the result of coGrouping to disk, and then re-read it > again before coGrouping with the third dataset. > > My question is, is there anyway to inform flink that the first coGroup > result is already partitioned ?! I know I can re-partition again before > coGrouping but I would like to know if there is anyway to avoid a step > which was already executed, > > Regards. > > -- > Mustafa Elbehery > EIT ICT Labs Master School <http://www.masterschool.eitictlabs.eu/home/> > +49(0)15750363097 > skype: mustafaelbehery87 > >