Thanks Aljoscha and Stephan, this helps - Henry
On Fri, May 22, 2015 at 4:37 AM, Stephan Ewen <se...@apache.org> wrote: > Aljoscha is right. There are plans to migrate the streaming state to the > MemoryManager as well, but streaming state is not managed at this point. > > What is managed in streaming jobs is the data buffered and cached in the > network stack. But that is a different memory pool than the memory manager. > We keep those pools separate because the network stack is currently more > advanced in terms of dynamically rebalancing memory, compared to the memory > manager. > > On Fri, May 22, 2015 at 12:25 PM, Aljoscha Krettek <aljos...@apache.org> > wrote: > >> Hi, >> streaming currently does not use any memory manager. All state is kept >> in Java Objects on the Java Heap, for example an ArrayList<> for the >> window buffer. >> >> On Thu, May 21, 2015 at 11:56 PM, Henry Saputra <henry.sapu...@gmail.com> >> wrote: >> > Hi Stephan, Gyula, Paris, >> > >> > How does streaming currently different in term of memory management? >> > Currently we only have one MemoryManager which is used by both modes I >> > believe. >> > >> > - Henry >> > >> > On Thu, May 21, 2015 at 12:34 PM, Stephan Ewen <se...@apache.org> wrote: >> >> I discussed a bit via Skype with Gyula and Paris. >> >> >> >> >> >> We thought about the following way to do it: >> >> >> >> - We add a dedicated streaming mode for now. The streaming mode >> supersedes >> >> the batch mode, so it can run both type of programs. >> >> >> >> - The streaming mode sets the memory manager to "lazy allocation". >> >> -> So long as it runs pure streaming jobs, the full heap will be >> >> available to window buffers and UDFs. >> >> -> Batch programs can still run, so mixed workloads are not >> prevented. >> >> Batch programs are a bit less robust there, because the memory manager >> does >> >> not pre-allocate memory. UDFs can eat into Flink's memory portion. >> >> >> >> - The streaming mode starts the necessary configured >> components/services >> >> for state backups >> >> >> >> >> >> >> >> Over the next versions, we want to bring these things together: >> >> - use the managed memory for window buffers >> >> - on-demand starting of the state backend >> >> >> >> Then, we deprecate the streaming mode, let both modes start the cluster >> in >> >> the same way. >> >> >> >> >> >> >> >> >> >> >> >> On Thu, May 21, 2015 at 4:01 PM, Aljoscha Krettek <aljos...@apache.org> >> >> wrote: >> >> >> >>> Would it not be possible to start the snapshot service once the user >> >>> starts the first streaming job? About 2) with checkpointing coming up, >> >>> would it not make sense to shift to managed memory rather sooner than >> >>> later. Then this point would become moot. >> >>> >> >>> On Thu, May 21, 2015 at 3:47 PM, Matthias J. Sax >> >>> <mj...@informatik.hu-berlin.de> wrote: >> >>> > What would be the consequences on "mixed" programs? (If there is any >> >>> > plan to support those?) >> >>> > >> >>> > Would it be necessary to have a third mode? Or would those programs >> >>> > simple run in streaming mode? >> >>> > >> >>> > -Matthias >> >>> > >> >>> > On 05/21/2015 03:12 PM, Stephan Ewen wrote: >> >>> >> Hi all! >> >>> >> >> >>> >> We discussed a while back about introducing a dedicated streaming >> mode >> >>> for >> >>> >> Flink. I would like to take a go at this and implement the changes, >> but >> >>> >> discuss them before. >> >>> >> >> >>> >> >> >>> >> Here is a brief summary why we wanted to introduce the dedicated >> >>> streaming >> >>> >> mode: >> >>> >> Even though both batch and streaming are executed by the same >> execution >> >>> >> engine, >> >>> >> a streaming setup of Flink varies a bit from a batch setup: >> >>> >> >> >>> >> 1) The streaming cluster starts an additional service to store the >> >>> >> distributed state snapshots. >> >>> >> >> >>> >> 2) Streaming mode uses memory a bit different, so we should >> configure >> >>> the >> >>> >> memory manager differently. This difference may eventually go away. >> >>> >> >> >>> >> >> >>> >> >> >>> >> Concretely, to implement this, I was thinking about introducing the >> >>> >> following externally visible changes >> >>> >> >> >>> >> - Additional scripts "start-streaming-cluster.sh" and >> >>> >> "start-streaming-local.sh" >> >>> >> >> >>> >> - An execution mode parameter for the TaskManager ("batch / >> streaming") >> >>> >> >> >>> >> - An execution mode parameter for the JobManager TaskManager >> ("batch / >> >>> >> streaming") >> >>> >> >> >>> >> - All local executors and mini clusters need a flag that specifies >> >>> whether >> >>> >> they will start >> >>> >> a streaming cluster, or a pure batch cluster. >> >>> >> >> >>> >> >> >>> >> Anything else that comes to your minds? >> >>> >> >> >>> >> >> >>> >> Greetings, >> >>> >> Stephan >> >>> >> >> >>> > >> >>> >>