Re: Writing Intermediates to disk

Vikram Saxena Mon, 09 May 2016 03:17:22 -0700

I do not know if I understand completely, but I would  create a new DataSet
based on filtering the condition and then persist this DataSet.


So :

DataSet ds2 = DataSet1.filter(Condition)

2ds.output(...)




On Mon, May 9, 2016 at 11:09 AM, Ufuk Celebi <u...@apache.org> wrote:

> Flink has support for spillable intermediate results. Currently they
> are only set if necessary to avoid pipeline deadlocks.
>
> You can force this via
>
> env.getConfig().setExecutionMode(ExecutionMode.BATCH);
>
> This will write shuffles to disk, but you don't get the fine-grained
> control you probably need for your use case.
>
> – Ufuk
>
> On Thu, May 5, 2016 at 3:29 PM, Paschek, Robert
> <robert.pasc...@tu-berlin.de> wrote:
> > Hi Mailing List,
> >
> >
> >
> > I want to write and read intermediates to/from disk.
> >
> > The following foo- codesnippet may illustrate my intention:
> >
> >
> >
> > public void mapPartition(Iterable<T> tuples, Collector<T> out) {
> >
> >
> >
> >                 for (T tuple : tuples) {
> >
> >
> >
> >                                if (Condition)
> >
> >                                                out.collect(tuple);
> >
> >                                else
> >
> >                                                writeTupleToDisk
> >
> >                 }
> >
> >
> >
> >                 While (‘TupleOnDisk’)
> >
> >                                out.collect(‘ReadNextTupleFromDisk’);
> >
> > }
> >
> >
> >
> > I'am wondering if flink provides an integrated class for this purpose. I
> > also have to precise identify the files with the intermediates due
> > parallelism of mapPartition.
> >
> >
> >
> >
> >
> > Thank you in advance!
> >
> > Robert
>

Re: Writing Intermediates to disk

Reply via email to