I discussed a bit via Skype with Gyula and Paris.

We thought about the following way to do it:

 - We add a dedicated streaming mode for now. The streaming mode is a superset
of the batch mode, so it can run both types of programs.

 - The streaming mode sets the memory manager to "lazy allocation".
    -> So long as it runs pure streaming jobs, the full heap will be
available to window buffers and UDFs.
    -> Batch programs can still run, so mixed workloads are not prevented.
Batch programs are a bit less robust there, because the memory manager does
not pre-allocate memory. UDFs can eat into Flink's memory portion.

 - The streaming mode starts the configured components/services that are
necessary for state backups.
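
To make the list above concrete, here is a minimal sketch of how the mode could drive the two startup decisions (memory allocation and state backup services). All names in it (ExecutionModeSketch, ExecutionMode, ClusterSettings, settingsFor) are made up for illustration and are not existing Flink classes. It also assumes that batch mode keeps pre-allocating memory and does not start the backup services.

```java
// Hypothetical sketch: none of these names are taken from the Flink code base.
public class ExecutionModeSketch {

    /** The two cluster modes discussed in this thread. */
    enum ExecutionMode { BATCH, STREAMING }

    /** Startup decisions that depend on the mode. */
    static final class ClusterSettings {
        final boolean preallocateManagedMemory;
        final boolean startStateBackupServices;

        ClusterSettings(boolean preallocateManagedMemory, boolean startStateBackupServices) {
            this.preallocateManagedMemory = preallocateManagedMemory;
            this.startStateBackupServices = startStateBackupServices;
        }
    }

    /**
     * Streaming mode: lazy allocation, state backup services started.
     * Batch mode (assumption): memory pre-allocated, no backup services.
     */
    static ClusterSettings settingsFor(ExecutionMode mode) {
        boolean streaming = (mode == ExecutionMode.STREAMING);
        return new ClusterSettings(!streaming, streaming);
    }

    public static void main(String[] args) {
        for (ExecutionMode mode : ExecutionMode.values()) {
            ClusterSettings s = settingsFor(mode);
            System.out.println(mode
                    + ": preallocate=" + s.preallocateManagedMemory
                    + ", stateBackup=" + s.startStateBackupServices);
        }
    }
}
```

Keeping both decisions behind one function should make it easy to drop the mode later, once window buffers use managed memory and the state backend starts on demand.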



Over the next versions, we want to bring these things together:
  - use the managed memory for window buffers
  - on-demand starting of the state backend

Then we deprecate the streaming mode and let both modes start the cluster in
the same way.





On Thu, May 21, 2015 at 4:01 PM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> Would it not be possible to start the snapshot service once the user
> starts the first streaming job? About 2) with checkpointing coming up,
> would it not make sense to shift to managed memory rather sooner than
> later. Then this point would become moot.
>
> On Thu, May 21, 2015 at 3:47 PM, Matthias J. Sax
> <mj...@informatik.hu-berlin.de> wrote:
> > What would be the consequences on "mixed" programs? (If there is any
> > plan to support those?)
> >
> > Would it be necessary to have a third mode? Or would those programs
> > simply run in streaming mode?
> >
> > -Matthias
> >
> > On 05/21/2015 03:12 PM, Stephan Ewen wrote:
> >> Hi all!
> >>
> >> We discussed a while back about introducing a dedicated streaming mode for
> >> Flink. I would like to take a go at this and implement the changes, but
> >> discuss them before.
> >>
> >>
> >> Here is a brief summary why we wanted to introduce the dedicated streaming
> >> mode:
> >> Even though both batch and streaming are executed by the same execution
> >> engine,
> >> a streaming setup of Flink varies a bit from a batch setup:
> >>
> >> 1) The streaming cluster starts an additional service to store the
> >> distributed state snapshots.
> >>
> >> 2) Streaming mode uses memory a bit differently, so we should configure the
> >> memory manager differently. This difference may eventually go away.
> >>
> >>
> >>
> >> Concretely, to implement this, I was thinking about introducing the
> >> following externally visible changes
> >>
> >>  - Additional scripts "start-streaming-cluster.sh" and
> >> "start-streaming-local.sh"
> >>
> >>  - An execution mode parameter for the TaskManager ("batch / streaming")
> >>
> >>  - An execution mode parameter for the JobManager ("batch /
> >> streaming")
> >>
> >>  - All local executors and mini clusters need a flag that specifies whether
> >>    they will start a streaming cluster, or a pure batch cluster.
> >>
> >>
> >> Anything else that comes to your minds?
> >>
> >>
> >> Greetings,
> >> Stephan
> >>
> >
>
