This sounds very reasonable.
On May 21, 2015 9:34 PM, "Stephan Ewen" <se...@apache.org> wrote:

> I discussed a bit via Skype with Gyula and Paris.
>
>
> We thought about the following way to do it:
>
>  - We add a dedicated streaming mode for now. The streaming mode supersedes
> the batch mode, so it can run both types of programs.
>
>  - The streaming mode sets the memory manager to "lazy allocation".
>     -> So long as it runs pure streaming jobs, the full heap will be
> available to window buffers and UDFs.
>     -> Batch programs can still run, so mixed workloads are not prevented.
> Batch programs are a bit less robust there, because the memory manager does
> not pre-allocate memory. UDFs can eat into Flink's memory portion.
>
>  - The streaming mode starts the necessary configured components/services
> for state backups
>
>
>
> Over the next versions, we want to bring these things together:
>   - use the managed memory for window buffers
>   - on-demand starting of the state backend
>
> Then, we deprecate the streaming mode, let both modes start the cluster in
> the same way.
>
>
>
>
>
> On Thu, May 21, 2015 at 4:01 PM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
> > Would it not be possible to start the snapshot service once the user
> > starts the first streaming job? About 2): with checkpointing coming up,
> > would it not make sense to shift to managed memory sooner rather than
> > later? Then this point would become moot.
> >
> > On Thu, May 21, 2015 at 3:47 PM, Matthias J. Sax
> > <mj...@informatik.hu-berlin.de> wrote:
> > > What would be the consequences on "mixed" programs? (If there is any
> > > plan to support those?)
> > >
> > > Would it be necessary to have a third mode? Or would those programs
> > > simply run in streaming mode?
> > >
> > > -Matthias
> > >
> > > On 05/21/2015 03:12 PM, Stephan Ewen wrote:
> > >> Hi all!
> > >>
> > >> A while back, we discussed introducing a dedicated streaming mode for
> > >> Flink. I would like to have a go at this and implement the changes, but
> > >> discuss them first.
> > >>
> > >>
> > >> Here is a brief summary of why we wanted to introduce the dedicated
> > >> streaming mode:
> > >> Even though both batch and streaming are executed by the same execution
> > >> engine, a streaming setup of Flink varies a bit from a batch setup:
> > >>
> > >> 1) The streaming cluster starts an additional service to store the
> > >> distributed state snapshots.
> > >>
> > >> 2) Streaming mode uses memory a bit differently, so we should configure
> > >> the memory manager differently. This difference may eventually go away.
> > >>
> > >>
> > >>
> > >> Concretely, to implement this, I was thinking about introducing the
> > >> following externally visible changes
> > >>
> > >>  - Additional scripts "start-streaming-cluster.sh" and
> > >> "start-streaming-local.sh"
> > >>
> > >>  - An execution mode parameter for the TaskManager ("batch / streaming")
> > >>
> > >>  - An execution mode parameter for the JobManager ("batch / streaming")
> > >>
> > >>  - All local executors and mini clusters need a flag that specifies
> > >>    whether they will start a streaming cluster or a pure batch cluster.
> > >>
> > >>
> > >> Anything else that comes to your minds?
> > >>
> > >>
> > >> Greetings,
> > >> Stephan
> > >>
> > >
> >
>
