I discussed a bit via Skype with Gyula and Paris.
We thought about the following way to do it: - We add a dedicated streaming mode for now. The streaming mode supersedes the batch mode, so it can run both type of programs. - The streaming mode sets the memory manager to "lazy allocation". -> So long as it runs pure streaming jobs, the full heap will be available to window buffers and UDFs. -> Batch programs can still run, so mixed workloads are not prevented. Batch programs are a bit less robust there, because the memory manager does not pre-allocate memory. UDFs can eat into Flink's memory portion. - The streaming mode starts the necessary configured components/services for state backups Over the next versions, we want to bring these things together: - use the managed memory for window buffers - on-demand starting of the state backend Then, we deprecate the streaming mode, let both modes start the cluster in the same way. On Thu, May 21, 2015 at 4:01 PM, Aljoscha Krettek <aljos...@apache.org> wrote: > Would it not be possible to start the snapshot service once the user > starts the first streaming job? About 2) with checkpointing coming up, > would it not make sense to shift to managed memory rather sooner than > later. Then this point would become moot. > > On Thu, May 21, 2015 at 3:47 PM, Matthias J. Sax > <mj...@informatik.hu-berlin.de> wrote: > > What would be the consequences on "mixed" programs? (If there is any > > plan to support those?) > > > > Would it be necessary to have a third mode? Or would those programs > > simple run in streaming mode? > > > > -Matthias > > > > On 05/21/2015 03:12 PM, Stephan Ewen wrote: > >> Hi all! > >> > >> We discussed a while back about introducing a dedicated streaming mode > for > >> Flink. I would like to take a go at this and implement the changes, but > >> discuss them before. > >> > >> > >> Here is a brief summary why we wanted to introduce the dedicated > streaming > >> mode: > >> Even though both batch and streaming are executed by the same execution > >> engine, > >> a streaming setup of Flink varies a bit from a batch setup: > >> > >> 1) The streaming cluster starts an additional service to store the > >> distributed state snapshots. > >> > >> 2) Streaming mode uses memory a bit different, so we should configure > the > >> memory manager differently. This difference may eventually go away. > >> > >> > >> > >> Concretely, to implement this, I was thinking about introducing the > >> following externally visible changes > >> > >> - Additional scripts "start-streaming-cluster.sh" and > >> "start-streaming-local.sh" > >> > >> - An execution mode parameter for the TaskManager ("batch / streaming") > >> > >> - An execution mode parameter for the JobManager TaskManager ("batch / > >> streaming") > >> > >> - All local executors and mini clusters need a flag that specifies > whether > >> they will start > >> a streaming cluster, or a pure batch cluster. > >> > >> > >> Anything else that comes to your minds? > >> > >> > >> Greetings, > >> Stephan > >> > > >