The streaming mode runs batch jobs as well :-) There should be slightly reduced predictability in the memory management in the streaming mode, but otherwise there should not be a problem.
So if you want to run mixed workloads, you start the streaming mode. (Note: Currently, the batch mode runs streaming jobs as well, but gives them very little memory. I am thinking of prohibiting that (separate discussion), to prevent people from not noticing that and running a highly sub-optimal Flink setup.) On Tue, May 26, 2015 at 8:26 PM, Henry Saputra <henry.sapu...@gmail.com> wrote: > One immediate concern I have is the deployment topology. With > streaming has its own cluster deployment, this means that in > standalone mode, if ops would like to deploy Flink it has to know what > mode it needs to deploy Flink as, either batch or Streaming. So, if > the use case was to support both batch and streaming, would that mean > the deployment need to separate 2 clusters to support different > applications to run on Flink? > > I think this would be ok if Flink is deployed in YARN or other > resource management platforms like Mesos or Apache Myriad. Maybe > someone, like Robert, could confirm this is the case. > > - Henry > > On Tue, May 26, 2015 at 1:51 AM, Maximilian Michels <m...@apache.org> > wrote: > > +1 great changes coming up! I like the idea that, ultimately, Flink will > > handle streaming and batch programs equally well independently of the > > chosen cluster startup mode. > > > > What is the time frame for these changes? > > > > On Tue, May 26, 2015 at 7:34 AM, Henry Saputra <henry.sapu...@gmail.com> > > wrote: > > > >> Thanks Aljoscha and Stephan, this helps > >> > >> - Henry > >> > >> On Fri, May 22, 2015 at 4:37 AM, Stephan Ewen <se...@apache.org> wrote: > >> > Aljoscha is right. There are plans to migrate the streaming state to > the > >> > MemoryManager as well, but streaming state is not managed at this > point. > >> > > >> > What is managed in streaming jobs is the data buffered and cached in > the > >> > network stack. But that is a different memory pool than the memory > >> manager. > >> > We keep those pools separate because the network stack is currently > more > >> > advanced in terms of dynamically rebalancing memory, compared to the > >> memory > >> > manager. > >> > > >> > On Fri, May 22, 2015 at 12:25 PM, Aljoscha Krettek < > aljos...@apache.org> > >> > wrote: > >> > > >> >> Hi, > >> >> streaming currently does not use any memory manager. All state is > kept > >> >> in Java Objects on the Java Heap, for example an ArrayList<> for the > >> >> window buffer. > >> >> > >> >> On Thu, May 21, 2015 at 11:56 PM, Henry Saputra < > >> henry.sapu...@gmail.com> > >> >> wrote: > >> >> > Hi Stephan, Gyula, Paris, > >> >> > > >> >> > How does streaming currently different in term of memory > management? > >> >> > Currently we only have one MemoryManager which is used by both > modes I > >> >> > believe. > >> >> > > >> >> > - Henry > >> >> > > >> >> > On Thu, May 21, 2015 at 12:34 PM, Stephan Ewen <se...@apache.org> > >> wrote: > >> >> >> I discussed a bit via Skype with Gyula and Paris. > >> >> >> > >> >> >> > >> >> >> We thought about the following way to do it: > >> >> >> > >> >> >> - We add a dedicated streaming mode for now. The streaming mode > >> >> supersedes > >> >> >> the batch mode, so it can run both type of programs. > >> >> >> > >> >> >> - The streaming mode sets the memory manager to "lazy > allocation". > >> >> >> -> So long as it runs pure streaming jobs, the full heap will > be > >> >> >> available to window buffers and UDFs. > >> >> >> -> Batch programs can still run, so mixed workloads are not > >> >> prevented. > >> >> >> Batch programs are a bit less robust there, because the memory > >> manager > >> >> does > >> >> >> not pre-allocate memory. UDFs can eat into Flink's memory portion. > >> >> >> > >> >> >> - The streaming mode starts the necessary configured > >> >> components/services > >> >> >> for state backups > >> >> >> > >> >> >> > >> >> >> > >> >> >> Over the next versions, we want to bring these things together: > >> >> >> - use the managed memory for window buffers > >> >> >> - on-demand starting of the state backend > >> >> >> > >> >> >> Then, we deprecate the streaming mode, let both modes start the > >> cluster > >> >> in > >> >> >> the same way. > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> On Thu, May 21, 2015 at 4:01 PM, Aljoscha Krettek < > >> aljos...@apache.org> > >> >> >> wrote: > >> >> >> > >> >> >>> Would it not be possible to start the snapshot service once the > user > >> >> >>> starts the first streaming job? About 2) with checkpointing > coming > >> up, > >> >> >>> would it not make sense to shift to managed memory rather sooner > >> than > >> >> >>> later. Then this point would become moot. > >> >> >>> > >> >> >>> On Thu, May 21, 2015 at 3:47 PM, Matthias J. Sax > >> >> >>> <mj...@informatik.hu-berlin.de> wrote: > >> >> >>> > What would be the consequences on "mixed" programs? (If there > is > >> any > >> >> >>> > plan to support those?) > >> >> >>> > > >> >> >>> > Would it be necessary to have a third mode? Or would those > >> programs > >> >> >>> > simple run in streaming mode? > >> >> >>> > > >> >> >>> > -Matthias > >> >> >>> > > >> >> >>> > On 05/21/2015 03:12 PM, Stephan Ewen wrote: > >> >> >>> >> Hi all! > >> >> >>> >> > >> >> >>> >> We discussed a while back about introducing a dedicated > streaming > >> >> mode > >> >> >>> for > >> >> >>> >> Flink. I would like to take a go at this and implement the > >> changes, > >> >> but > >> >> >>> >> discuss them before. > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> Here is a brief summary why we wanted to introduce the > dedicated > >> >> >>> streaming > >> >> >>> >> mode: > >> >> >>> >> Even though both batch and streaming are executed by the same > >> >> execution > >> >> >>> >> engine, > >> >> >>> >> a streaming setup of Flink varies a bit from a batch setup: > >> >> >>> >> > >> >> >>> >> 1) The streaming cluster starts an additional service to store > >> the > >> >> >>> >> distributed state snapshots. > >> >> >>> >> > >> >> >>> >> 2) Streaming mode uses memory a bit different, so we should > >> >> configure > >> >> >>> the > >> >> >>> >> memory manager differently. This difference may eventually go > >> away. > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> Concretely, to implement this, I was thinking about > introducing > >> the > >> >> >>> >> following externally visible changes > >> >> >>> >> > >> >> >>> >> - Additional scripts "start-streaming-cluster.sh" and > >> >> >>> >> "start-streaming-local.sh" > >> >> >>> >> > >> >> >>> >> - An execution mode parameter for the TaskManager ("batch / > >> >> streaming") > >> >> >>> >> > >> >> >>> >> - An execution mode parameter for the JobManager TaskManager > >> >> ("batch / > >> >> >>> >> streaming") > >> >> >>> >> > >> >> >>> >> - All local executors and mini clusters need a flag that > >> specifies > >> >> >>> whether > >> >> >>> >> they will start > >> >> >>> >> a streaming cluster, or a pure batch cluster. > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> Anything else that comes to your minds? > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> Greetings, > >> >> >>> >> Stephan > >> >> >>> >> > >> >> >>> > > >> >> >>> > >> >> > >> >