Aljoscha is right. There are plans to migrate the streaming state to the MemoryManager as well, but streaming state is not managed at this point.
What is managed in streaming jobs is the data buffered and cached in the network stack. But that is a different memory pool than the memory manager. We keep those pools separate because the network stack is currently more advanced in terms of dynamically rebalancing memory, compared to the memory manager. On Fri, May 22, 2015 at 12:25 PM, Aljoscha Krettek <aljos...@apache.org> wrote: > Hi, > streaming currently does not use any memory manager. All state is kept > in Java Objects on the Java Heap, for example an ArrayList<> for the > window buffer. > > On Thu, May 21, 2015 at 11:56 PM, Henry Saputra <henry.sapu...@gmail.com> > wrote: > > Hi Stephan, Gyula, Paris, > > > > How does streaming currently different in term of memory management? > > Currently we only have one MemoryManager which is used by both modes I > > believe. > > > > - Henry > > > > On Thu, May 21, 2015 at 12:34 PM, Stephan Ewen <se...@apache.org> wrote: > >> I discussed a bit via Skype with Gyula and Paris. > >> > >> > >> We thought about the following way to do it: > >> > >> - We add a dedicated streaming mode for now. The streaming mode > supersedes > >> the batch mode, so it can run both type of programs. > >> > >> - The streaming mode sets the memory manager to "lazy allocation". > >> -> So long as it runs pure streaming jobs, the full heap will be > >> available to window buffers and UDFs. > >> -> Batch programs can still run, so mixed workloads are not > prevented. > >> Batch programs are a bit less robust there, because the memory manager > does > >> not pre-allocate memory. UDFs can eat into Flink's memory portion. > >> > >> - The streaming mode starts the necessary configured > components/services > >> for state backups > >> > >> > >> > >> Over the next versions, we want to bring these things together: > >> - use the managed memory for window buffers > >> - on-demand starting of the state backend > >> > >> Then, we deprecate the streaming mode, let both modes start the cluster > in > >> the same way. > >> > >> > >> > >> > >> > >> On Thu, May 21, 2015 at 4:01 PM, Aljoscha Krettek <aljos...@apache.org> > >> wrote: > >> > >>> Would it not be possible to start the snapshot service once the user > >>> starts the first streaming job? About 2) with checkpointing coming up, > >>> would it not make sense to shift to managed memory rather sooner than > >>> later. Then this point would become moot. > >>> > >>> On Thu, May 21, 2015 at 3:47 PM, Matthias J. Sax > >>> <mj...@informatik.hu-berlin.de> wrote: > >>> > What would be the consequences on "mixed" programs? (If there is any > >>> > plan to support those?) > >>> > > >>> > Would it be necessary to have a third mode? Or would those programs > >>> > simple run in streaming mode? > >>> > > >>> > -Matthias > >>> > > >>> > On 05/21/2015 03:12 PM, Stephan Ewen wrote: > >>> >> Hi all! > >>> >> > >>> >> We discussed a while back about introducing a dedicated streaming > mode > >>> for > >>> >> Flink. I would like to take a go at this and implement the changes, > but > >>> >> discuss them before. > >>> >> > >>> >> > >>> >> Here is a brief summary why we wanted to introduce the dedicated > >>> streaming > >>> >> mode: > >>> >> Even though both batch and streaming are executed by the same > execution > >>> >> engine, > >>> >> a streaming setup of Flink varies a bit from a batch setup: > >>> >> > >>> >> 1) The streaming cluster starts an additional service to store the > >>> >> distributed state snapshots. > >>> >> > >>> >> 2) Streaming mode uses memory a bit different, so we should > configure > >>> the > >>> >> memory manager differently. This difference may eventually go away. > >>> >> > >>> >> > >>> >> > >>> >> Concretely, to implement this, I was thinking about introducing the > >>> >> following externally visible changes > >>> >> > >>> >> - Additional scripts "start-streaming-cluster.sh" and > >>> >> "start-streaming-local.sh" > >>> >> > >>> >> - An execution mode parameter for the TaskManager ("batch / > streaming") > >>> >> > >>> >> - An execution mode parameter for the JobManager TaskManager > ("batch / > >>> >> streaming") > >>> >> > >>> >> - All local executors and mini clusters need a flag that specifies > >>> whether > >>> >> they will start > >>> >> a streaming cluster, or a pure batch cluster. > >>> >> > >>> >> > >>> >> Anything else that comes to your minds? > >>> >> > >>> >> > >>> >> Greetings, > >>> >> Stephan > >>> >> > >>> > > >>> >