At the moment, Flink does not support computing intermediate results once
and continuing your computation from them. When you execute several jobs
which share parts of their job graph, the shared parts are recomputed for
each job. If your job contains operators with non-deterministic output,
there is no guarantee that the shared job graph parts produce the same
results on each execution.

What you can do is execute the two jobs in parallel so that they share
the input of the non-deterministic operator. Alternatively, you can persist
the data set after your non-deterministic operator by writing it manually
to disk and reading it back from there.
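
To illustrate the idea, here is a minimal sketch in plain Java. It does not
use the Flink API; the names SplitOnce, Tagged, tagOnce and subset are
made up for this example. The point is only that the random split index is
drawn once per element and kept, and that every subset is then derived from
the stored pairs with a deterministic filter, so recomputing a subset can
never change it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SplitOnce {
    // Hypothetical helper type: an element plus the split index drawn for it.
    static final class Tagged {
        final int value;
        final int split;
        Tagged(int value, int split) { this.value = value; this.split = split; }
    }

    // The non-deterministic step. Run it exactly once and keep the result
    // (in a real job: write the tagged data out and read it back in).
    static List<Tagged> tagOnce(List<Integer> data, int numSplits, Random rnd) {
        List<Tagged> tagged = new ArrayList<>();
        for (int v : data) {
            tagged.add(new Tagged(v, rnd.nextInt(numSplits)));
        }
        return tagged;
    }

    // A deterministic filter on the stored index. It is safe to evaluate
    // this as often as needed, because no random numbers are drawn here.
    static List<Integer> subset(List<Tagged> tagged, int split) {
        List<Integer> out = new ArrayList<>();
        for (Tagged t : tagged) {
            if (t.split == split) {
                out.add(t.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            data.add(i);
        }
        List<Tagged> tagged = tagOnce(data, 3, new Random());

        // Every element carries exactly one index, so the subsets are
        // mutually exclusive and together cover the whole input.
        int total = 0;
        for (int s = 0; s < 3; s++) {
            total += subset(tagged, s).size();
        }
        System.out.println(total == data.size()); // prints true
    }
}
```

In a Flink program the tagged pairs would be the data set you write out
after the non-deterministic operator; the filters then run on the data
that was read back in.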

Cheers,
Till

On Wed, Aug 12, 2015 at 1:34 AM, Sachin Goel <sachingoel0...@gmail.com>
wrote:

> I'm writing a utility to split a data set randomly into several parts and
> return an Array of data sets. However, whenever I operate on any of
> these *subsets*, the program basically starts from the original data set,
> and the split is performed again.
>
> To ensure that these subsets are mutually exclusive, we need not only to
> generate the exact same sequence of random numbers, but also to ensure
> that the elements arrive in a filter job in exactly the same order. How do
> I achieve this?
> Here's the code I've written:
> https://github.com/apache/flink/pull/921/files
>
> Regards
> Sachin
>
> -- Sachin Goel
> Computer Science, IIT Delhi
> m. +91-9871457685
>
