Hi Till Thanks for the reply. If you think about it however, having several diverging computational paths from an intermediate point will probably require re-computation anyway, in case the number of these paths is even higher than the slots available. Could that be an argument against a possible implementation? Making the output of the non-deterministic step persistent seems costly however. Is there any way to ensure that the data source is partitioned across the different slots in exactly the same way every-time? For example, I am using a {{generateSequence}} call, and the internal iterator, namely the NumberSequenceIterator seems deterministic in its operation, at least as far as how the elements are grouped together. But surprisingly, I observed different splits now and then.
Regards Sachin -- Sachin Goel Computer Science, IIT Delhi m. +91-9871457685 On Wed, Aug 12, 2015 at 12:58 PM, Till Rohrmann <trohrm...@apache.org> wrote: > At the moment, Flink does not support the calculation of intermediate > results from which you can continue your computation. When you execute jobs > which share parts of its job graph, then they are recomputed. When your job > contains operators with non-deterministic output, then there is no > guarantee that the shared job graph parts produce the same results. > > What you can do is to execute the two jobs in parallel so that they share > the input of the non-deterministic operator. Alternatively, you can persist > the data set after your non-deterministic operator by writing it manually > to disc and reading it from there. > > Cheers, > Till > > On Wed, Aug 12, 2015 at 1:34 AM, Sachin Goel <sachingoel0...@gmail.com> > wrote: > > > I'm writing a utility to split a data set randomly into several parts and > > return an Array of data sets. However, whenever I operate on any of > > these *subsets, > > *the program basically start from the original data set, and the split is > > performed again. > > > > To ensure that these subsets are mutually exclusive, we need to generate > > the exact same sequence of random numbers, but also to ensure that the > > elements arrive in a filter job in exactly the same order. How do I > achieve > > this? > > Here's the code I've written: > > https://github.com/apache/flink/pull/921/files > > > > Regards > > Sachin > > > > -- Sachin Goel > > Computer Science, IIT Delhi > > m. +91-9871457685 > > >