This sounds very reasonable.

On May 21, 2015 9:34 PM, "Stephan Ewen" <se...@apache.org> wrote:
> I discussed a bit via Skype with Gyula and Paris.
>
> We thought about the following way to do it:
>
>  - We add a dedicated streaming mode for now. The streaming mode supersedes
>    the batch mode, so it can run both types of programs.
>
>  - The streaming mode sets the memory manager to "lazy allocation".
>     -> So long as it runs pure streaming jobs, the full heap will be
>        available to window buffers and UDFs.
>     -> Batch programs can still run, so mixed workloads are not prevented.
>        Batch programs are a bit less robust there, because the memory manager
>        does not pre-allocate memory. UDFs can eat into Flink's memory portion.
>
>  - The streaming mode starts the necessary configured components/services
>    for state backups
>
> Over the next versions, we want to bring these things together:
>  - use the managed memory for window buffers
>  - on-demand starting of the state backend
>
> Then, we deprecate the streaming mode, let both modes start the cluster in
> the same way.
>
> On Thu, May 21, 2015 at 4:01 PM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
> > Would it not be possible to start the snapshot service once the user
> > starts the first streaming job? About 2) with checkpointing coming up,
> > would it not make sense to shift to managed memory rather sooner than
> > later. Then this point would become moot.
> >
> > On Thu, May 21, 2015 at 3:47 PM, Matthias J. Sax
> > <mj...@informatik.hu-berlin.de> wrote:
> > > What would be the consequences on "mixed" programs? (If there is any
> > > plan to support those?)
> > >
> > > Would it be necessary to have a third mode? Or would those programs
> > > simply run in streaming mode?
> > >
> > > -Matthias
> > >
> > > On 05/21/2015 03:12 PM, Stephan Ewen wrote:
> > >> Hi all!
> > >>
> > >> We discussed a while back about introducing a dedicated streaming mode
> > >> for Flink. I would like to take a go at this and implement the changes,
> > >> but discuss them before.
> > >>
> > >> Here is a brief summary why we wanted to introduce the dedicated
> > >> streaming mode:
> > >> Even though both batch and streaming are executed by the same execution
> > >> engine, a streaming setup of Flink varies a bit from a batch setup:
> > >>
> > >> 1) The streaming cluster starts an additional service to store the
> > >>    distributed state snapshots.
> > >>
> > >> 2) Streaming mode uses memory a bit differently, so we should configure
> > >>    the memory manager differently. This difference may eventually go away.
> > >>
> > >> Concretely, to implement this, I was thinking about introducing the
> > >> following externally visible changes
> > >>
> > >>  - Additional scripts "start-streaming-cluster.sh" and
> > >>    "start-streaming-local.sh"
> > >>
> > >>  - An execution mode parameter for the TaskManager ("batch / streaming")
> > >>
> > >>  - An execution mode parameter for the JobManager TaskManager ("batch /
> > >>    streaming")
> > >>
> > >>  - All local executors and mini clusters need a flag that specifies
> > >>    whether they will start a streaming cluster, or a pure batch cluster.
> > >>
> > >> Anything else that comes to your minds?
> > >>
> > >> Greetings,
> > >> Stephan