I am not sure I understand the question completely, but I would create a new DataSet by filtering on the condition and then persist that DataSet.
So:

    DataSet ds2 = DataSet1.filter(Condition);
    ds2.output(...);

On Mon, May 9, 2016 at 11:09 AM, Ufuk Celebi <u...@apache.org> wrote:
> Flink has support for spillable intermediate results. Currently they
> are only set if necessary to avoid pipeline deadlocks.
>
> You can force this via
>
> env.getConfig().setExecutionMode(ExecutionMode.BATCH);
>
> This will write shuffles to disk, but you don't get the fine-grained
> control you probably need for your use case.
>
> – Ufuk
>
> On Thu, May 5, 2016 at 3:29 PM, Paschek, Robert
> <robert.pasc...@tu-berlin.de> wrote:
> > Hi Mailing List,
> >
> > I want to write and read intermediates to/from disk.
> >
> > The following foo- codesnippet may illustrate my intention:
> >
> > public void mapPartition(Iterable<T> tuples, Collector<T> out) {
> >
> >     for (T tuple : tuples) {
> >         if (Condition)
> >             out.collect(tuple);
> >         else
> >             writeTupleToDisk
> >     }
> >
> >     While (‘TupleOnDisk’)
> >         out.collect(‘ReadNextTupleFromDisk’);
> > }
> >
> > I'am wondering if flink provides an integrated class for this purpose. I
> > also have to precise identify the files with the intermediates due
> > parallelism of mapPartition.
> >
> > Thank you in advance!
> >
> > Robert
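
For reference, here is a minimal plain-Java sketch of the spill-and-replay logic from Robert's pseudocode, written without any Flink classes. Everything in it is an assumption for illustration: the class name SpillSketch, the method emitMatchingThenSpilled, String records stored one per line (so records must not contain line breaks), and a List standing in for Flink's Collector. Files.createTempFile picks a unique file name on every call, which is one possible way to keep parallel mapPartition instances from colliding on the same file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of Flink.
class SpillSketch {

    // Emits records that satisfy the condition immediately, spills the rest
    // to a temp file, then replays the spilled records in their original order.
    static List<String> emitMatchingThenSpilled(Iterable<String> records,
                                                Predicate<String> condition) {
        List<String> out = new ArrayList<>(); // stands in for Flink's Collector
        Path spill = null;
        try {
            // Unique name per call, so parallel instances use separate files.
            spill = Files.createTempFile("spill-", ".txt");

            try (BufferedWriter writer =
                     Files.newBufferedWriter(spill, StandardCharsets.UTF_8)) {
                for (String record : records) {
                    if (condition.test(record)) {
                        out.add(record);          // emit right away
                    } else {
                        writer.write(record);     // spill to disk
                        writer.newLine();
                    }
                }
            }

            // Replay everything that was spilled.
            try (BufferedReader reader =
                     Files.newBufferedReader(spill, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    out.add(line);
                }
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (spill != null) {
                try {
                    Files.deleteIfExists(spill); // clean up the temp file
                } catch (IOException ignored) {
                    // best-effort cleanup only
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> result = emitMatchingThenSpilled(
                Arrays.asList("a1", "b2", "a3", "b4"), r -> r.startsWith("a"));
        System.out.println(result); // prints [a1, a3, b2, b4]
    }
}
```

Inside a real mapPartition the same body would call out.collect(...) instead of adding to a list. Serialization of non-String tuples is left out of this sketch.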