Huge +1 from my side :) Sorry for the late response.
On Thu, May 21, 2015 at 9:54 PM, Aljoscha Krettek <aljos...@apache.org> wrote:

> This sounds very reasonable.
>
> On May 21, 2015 9:34 PM, "Stephan Ewen" <se...@apache.org> wrote:
>
> > I discussed a bit via Skype with Gyula and Paris.
> >
> > We thought about the following way to do it:
> >
> > - We add a dedicated streaming mode for now. The streaming mode supersedes
> >   the batch mode, so it can run both types of programs.
> >
> > - The streaming mode sets the memory manager to "lazy allocation".
> >   -> So long as it runs pure streaming jobs, the full heap will be
> >      available to window buffers and UDFs.
> >   -> Batch programs can still run, so mixed workloads are not prevented.
> >      Batch programs are a bit less robust there, because the memory
> >      manager does not pre-allocate memory. UDFs can eat into Flink's
> >      memory portion.
> >
> > - The streaming mode starts the necessary configured components/services
> >   for state backups
> >
> > Over the next versions, we want to bring these things together:
> > - use the managed memory for window buffers
> > - on-demand starting of the state backend
> >
> > Then, we deprecate the streaming mode and let both modes start the
> > cluster in the same way.
> >
> > On Thu, May 21, 2015 at 4:01 PM, Aljoscha Krettek <aljos...@apache.org>
> > wrote:
> >
> > > Would it not be possible to start the snapshot service once the user
> > > starts the first streaming job? About 2) with checkpointing coming up,
> > > would it not make sense to shift to managed memory rather sooner than
> > > later? Then this point would become moot.
> > >
> > > On Thu, May 21, 2015 at 3:47 PM, Matthias J. Sax
> > > <mj...@informatik.hu-berlin.de> wrote:
> > > > What would be the consequences on "mixed" programs? (If there is any
> > > > plan to support those?)
> > > >
> > > > Would it be necessary to have a third mode? Or would those programs
> > > > simply run in streaming mode?
> > > >
> > > > -Matthias
> > > >
> > > > On 05/21/2015 03:12 PM, Stephan Ewen wrote:
> > > >> Hi all!
> > > >>
> > > >> We discussed a while back about introducing a dedicated streaming
> > > >> mode for Flink. I would like to take a go at this and implement the
> > > >> changes, but discuss them before.
> > > >>
> > > >> Here is a brief summary why we wanted to introduce the dedicated
> > > >> streaming mode:
> > > >> Even though both batch and streaming are executed by the same
> > > >> execution engine, a streaming setup of Flink varies a bit from a
> > > >> batch setup:
> > > >>
> > > >> 1) The streaming cluster starts an additional service to store the
> > > >>    distributed state snapshots.
> > > >>
> > > >> 2) Streaming mode uses memory a bit differently, so we should
> > > >>    configure the memory manager differently. This difference may
> > > >>    eventually go away.
> > > >>
> > > >> Concretely, to implement this, I was thinking about introducing the
> > > >> following externally visible changes
> > > >>
> > > >> - Additional scripts "start-streaming-cluster.sh" and
> > > >>   "start-streaming-local.sh"
> > > >>
> > > >> - An execution mode parameter for the TaskManager ("batch /
> > > >>   streaming")
> > > >>
> > > >> - An execution mode parameter for the JobManager TaskManager ("batch /
> > > >>   streaming")
> > > >>
> > > >> - All local executors and mini clusters need a flag that specifies
> > > >>   whether they will start a streaming cluster, or a pure batch
> > > >>   cluster.
> > > >>
> > > >> Anything else that comes to your minds?
> > > >>
> > > >> Greetings,
> > > >> Stephan
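
For illustration only, the mode semantics proposed in the quoted thread could be sketched roughly like this. It is a minimal sketch under stated assumptions, not Flink code: the names `ExecutionMode`, `ClusterSettings`, `settings_for`, and `can_run` are hypothetical, and the assumption that a pure batch cluster pre-allocates memory and starts no state backup service is inferred from the discussion rather than stated in it.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutionMode(Enum):
    """Hypothetical stand-in for the proposed "batch / streaming" parameter."""
    BATCH = "batch"
    STREAMING = "streaming"


@dataclass(frozen=True)
class ClusterSettings:
    """Hypothetical settings derived from the mode; not Flink's real config."""
    lazy_memory_allocation: bool      # memory manager does not pre-allocate
    start_state_backup_service: bool  # service that stores state snapshots


def settings_for(mode: ExecutionMode) -> ClusterSettings:
    # Assumption: only the streaming mode switches to lazy allocation and
    # starts the state backup service; batch mode keeps pre-allocation.
    streaming = mode is ExecutionMode.STREAMING
    return ClusterSettings(
        lazy_memory_allocation=streaming,
        start_state_backup_service=streaming,
    )


def can_run(cluster_mode: ExecutionMode, job_mode: ExecutionMode) -> bool:
    # The streaming mode supersedes the batch mode, so a streaming cluster
    # accepts both job types; a pure batch cluster accepts only batch jobs.
    return cluster_mode is ExecutionMode.STREAMING or job_mode is ExecutionMode.BATCH
```

Under this reading, a "mixed" workload would simply be submitted to a cluster started in streaming mode, so no third mode would be needed.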