The streaming mode runs batch jobs as well :-)

Memory management is slightly less predictable in the streaming mode,
but otherwise there should not be a problem.

So if you want to run mixed workloads, you start the streaming mode.


(Note: Currently, the batch mode runs streaming jobs as well, but gives
them very little memory. I am thinking of prohibiting that (separate
discussion), so that people do not unknowingly end up running a highly
sub-optimal Flink setup.)
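
To picture the difference between the two memory manager settings discussed
in this thread (pre-allocation in batch mode, "lazy allocation" in streaming
mode), here is a toy model. It is illustrative Python and not Flink code: the
ManagedMemory class, its methods, and the segment counts are all made up for
this sketch. It only shows that a pre-allocating pool holds its whole budget
from startup, while a lazy pool holds nothing until a job asks for memory,
which leaves the heap free for window buffers and UDFs.

```python
class ManagedMemory:
    """Toy model of a fixed budget of memory segments (hypothetical, not Flink's MemoryManager)."""

    def __init__(self, num_segments, segment_size, lazy):
        self.num_segments = num_segments
        self.segment_size = segment_size
        self.lazy = lazy
        self.handed_out = 0
        # Pre-allocation reserves every segment up front; lazy allocation
        # reserves nothing until a job asks for memory.
        self.free = [] if lazy else [bytearray(segment_size) for _ in range(num_segments)]

    def reserved_bytes(self):
        """Bytes currently held by the pool itself or by jobs that borrowed from it."""
        return (len(self.free) + self.handed_out) * self.segment_size

    def allocate(self):
        """Hand out one segment, failing once the budget is used up."""
        if self.handed_out >= self.num_segments:
            raise MemoryError("managed memory budget exhausted")
        segment = self.free.pop() if self.free else bytearray(self.segment_size)
        self.handed_out += 1
        return segment

    def release(self, segment):
        """Take a segment back; only the pre-allocating pool keeps it for reuse."""
        self.handed_out -= 1
        if not self.lazy:
            self.free.append(segment)


batch = ManagedMemory(num_segments=4, segment_size=1024, lazy=False)
streaming = ManagedMemory(num_segments=4, segment_size=1024, lazy=True)
print(batch.reserved_bytes())      # 4096
print(streaming.reserved_bytes())  # 0
```

In the lazy pool, whatever has not been handed out is available to everything
else on the heap, which is also why batch programs are less robust there: user
code can use up memory that the pool would later need.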


On Tue, May 26, 2015 at 8:26 PM, Henry Saputra <henry.sapu...@gmail.com>
wrote:

> One immediate concern I have is the deployment topology. With
> streaming having its own cluster deployment, this means that in
> standalone mode, if ops would like to deploy Flink, they have to know
> which mode to deploy Flink in, either batch or streaming. So, if
> the use case is to support both batch and streaming, would that mean
> the deployment needs two separate clusters to support the different
> applications running on Flink?
>
> I think this would be ok if Flink is deployed in YARN or other
> resource management platforms like Mesos or Apache Myriad. Maybe
> someone, like Robert, could confirm this is the case.
>
> - Henry
>
> On Tue, May 26, 2015 at 1:51 AM, Maximilian Michels <m...@apache.org>
> wrote:
> > +1 great changes coming up! I like the idea that, ultimately, Flink will
> > handle streaming and batch programs equally well independently of the
> > chosen cluster startup mode.
> >
> > What is the time frame for these changes?
> >
> > On Tue, May 26, 2015 at 7:34 AM, Henry Saputra <henry.sapu...@gmail.com>
> > wrote:
> >
> >> Thanks Aljoscha and Stephan, this helps
> >>
> >> - Henry
> >>
> >> On Fri, May 22, 2015 at 4:37 AM, Stephan Ewen <se...@apache.org> wrote:
> >> > Aljoscha is right. There are plans to migrate the streaming state to the
> >> > MemoryManager as well, but streaming state is not managed at this point.
> >> >
> >> > What is managed in streaming jobs is the data buffered and cached in the
> >> > network stack. But that is a different memory pool than the memory manager.
> >> > We keep those pools separate because the network stack is currently more
> >> > advanced in terms of dynamically rebalancing memory, compared to the memory
> >> > manager.
> >> >
> >> > On Fri, May 22, 2015 at 12:25 PM, Aljoscha Krettek <aljos...@apache.org>
> >> > wrote:
> >> >
> >> >> Hi,
> >> >> streaming currently does not use any memory manager. All state is kept
> >> >> in Java Objects on the Java Heap, for example an ArrayList<> for the
> >> >> window buffer.
> >> >>
> >> >> On Thu, May 21, 2015 at 11:56 PM, Henry Saputra <henry.sapu...@gmail.com>
> >> >> wrote:
> >> >> > Hi Stephan, Gyula, Paris,
> >> >> >
> >> >> > How does streaming currently differ in terms of memory management?
> >> >> > Currently we only have one MemoryManager, which is used by both modes, I
> >> >> > believe.
> >> >> >
> >> >> > - Henry
> >> >> >
> >> >> > On Thu, May 21, 2015 at 12:34 PM, Stephan Ewen <se...@apache.org> wrote:
> >> >> >> I discussed a bit via Skype with Gyula and Paris.
> >> >> >>
> >> >> >>
> >> >> >> We thought about the following way to do it:
> >> >> >>
> >> >> >>  - We add a dedicated streaming mode for now. The streaming mode supersedes
> >> >> >> the batch mode, so it can run both types of programs.
> >> >> >>
> >> >> >>  - The streaming mode sets the memory manager to "lazy allocation".
> >> >> >>     -> So long as it runs pure streaming jobs, the full heap will be
> >> >> >> available to window buffers and UDFs.
> >> >> >>     -> Batch programs can still run, so mixed workloads are not prevented.
> >> >> >> Batch programs are a bit less robust there, because the memory manager does
> >> >> >> not pre-allocate memory. UDFs can eat into Flink's memory portion.
> >> >> >>
> >> >> >>  - The streaming mode starts the necessary configured components/services
> >> >> >> for state backups
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Over the next versions, we want to bring these things together:
> >> >> >>   - use the managed memory for window buffers
> >> >> >>   - on-demand starting of the state backend
> >> >> >>
> >> >> >> Then, we deprecate the streaming mode and let both modes start the cluster
> >> >> >> in the same way.
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> On Thu, May 21, 2015 at 4:01 PM, Aljoscha Krettek <aljos...@apache.org>
> >> >> >> wrote:
> >> >> >>
> >> >> >>> Would it not be possible to start the snapshot service once the user
> >> >> >>> starts the first streaming job? About 2): with checkpointing coming up,
> >> >> >>> would it not make sense to shift to managed memory sooner rather than
> >> >> >>> later? Then this point would become moot.
> >> >> >>>
> >> >> >>> On Thu, May 21, 2015 at 3:47 PM, Matthias J. Sax
> >> >> >>> <mj...@informatik.hu-berlin.de> wrote:
> >> >> >>> > What would be the consequences for "mixed" programs? (If there is any
> >> >> >>> > plan to support those?)
> >> >> >>> >
> >> >> >>> > Would it be necessary to have a third mode? Or would those programs
> >> >> >>> > simply run in streaming mode?
> >> >> >>> >
> >> >> >>> > -Matthias
> >> >> >>> >
> >> >> >>> > On 05/21/2015 03:12 PM, Stephan Ewen wrote:
> >> >> >>> >> Hi all!
> >> >> >>> >>
> >> >> >>> >> We discussed a while back introducing a dedicated streaming mode for
> >> >> >>> >> Flink. I would like to have a go at this and implement the changes, but
> >> >> >>> >> discuss them first.
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> Here is a brief summary of why we wanted to introduce the dedicated
> >> >> >>> >> streaming mode:
> >> >> >>> >> Even though both batch and streaming are executed by the same execution
> >> >> >>> >> engine,
> >> >> >>> >> a streaming setup of Flink varies a bit from a batch setup:
> >> >> >>> >>
> >> >> >>> >> 1) The streaming cluster starts an additional service to store the
> >> >> >>> >> distributed state snapshots.
> >> >> >>> >>
> >> >> >>> >> 2) Streaming mode uses memory a bit differently, so we should configure the
> >> >> >>> >> memory manager differently. This difference may eventually go away.
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> Concretely, to implement this, I was thinking about introducing the
> >> >> >>> >> following externally visible changes:
> >> >> >>> >>
> >> >> >>> >>  - Additional scripts "start-streaming-cluster.sh" and
> >> >> >>> >> "start-streaming-local.sh"
> >> >> >>> >>
> >> >> >>> >>  - An execution mode parameter for the TaskManager ("batch / streaming")
> >> >> >>> >>
> >> >> >>> >>  - An execution mode parameter for the JobManager ("batch / streaming")
> >> >> >>> >>
> >> >> >>> >>  - All local executors and mini clusters need a flag that specifies whether
> >> >> >>> >> they will start a streaming cluster or a pure batch cluster.
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> Anything else that comes to your minds?
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> Greetings,
> >> >> >>> >> Stephan
> >> >> >>> >>
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
>
