Re: Running streaming job on every node of cluster

Nico Kruber Mon, 27 Feb 2017 09:50:44 -0800

this may also be a good read:

https://ci.apache.org/projects/flink/flink-docs-release-1.2/concepts/
runtime.html#task-slots-and-resources


On Monday, 27 February 2017 18:40:48 CET Nico Kruber wrote:
> What about setting the parallelism[1] to the total number of slots in your
> cluster?
> By default, all parts of your program are put into the same slot sharing
> group[2] and by setting the parallelism you would have this slot (with your
> whole program) in each parallel slot as well (minus/plus operators that have
> lower/higher parallelism), if I understand it correctly.
> 
> 
> Nico
> 
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/
> parallel.html
> [2] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/
> datastream_api.html#task-chaining-and-resource-groups
> 
> On Monday, 27 February 2017 18:17:53 CET Evgeny Kincharov wrote:
> > Thanks for your answer.
> > The problem is that both slots are seized in the one node. Of course if
> > this node has enough free slots. Another nodes idle. I want to utilize
> > cluster resource little bit more. May be the other deployment modes allow
> > it.
> > 
> > BR, Evgeny.
> > 
> > От: Nico Kruber<mailto:n...@data-artisans.com>
> > Отправлено: 27 февраля 2017 г. в 20:07
> > Кому: user@flink.apache.org<mailto:user@flink.apache.org>
> > Копия: Evgeny Kincharov<mailto:evgeny_kincha...@epam.com>
> > Тема: Re: Running streaming job on every node of cluster
> > 
> > Hi Evgeny,
> > I tried to reproduce your example with the following code, having another
> > console listening with "nc -l 12345"
> > 
> > env.setParallelism(2);
> > env.addSource(new SocketTextStreamFunction("localhost", 12345, " ", 3))
> > 
> >                                .map(new MapFunction<String, String>() {
> >                                
> >                                                @Override
> >                                                public String map(final
> > 
> > String s) throws Exception { return s; }
> 
>  })
> 
> >                                .addSink(new DiscardingSink<String>());
> > 
> > This way, I do get a source with parallelism 1 and map & sink with
> > parallelism
> 
>  2 and the whole program accompanying 2 slots as expected. You
> 
> > can check in the web interface of your cluster how many slots are taken
> > after executing one instance of your program.
> > 
> > How do you set your parallelism?
> > 
> > 
> > Nico
> > 
> > On Monday, 27 February 2017 14:04:21 CET Evgeny Kincharov wrote:
> > > Hi,
> > > 
> > > 
> > > 
> > > I have the simplest streaming job, and I want to distribute my job on
> > > every
> 
>  node of my Flink cluster.
> 
> > > Job is simple:
> > > 
> > > 
> > > 
> > > source (SocketTextStream) -> map -> sink (AsyncHtttpSink).
> > > 
> > > 
> > > 
> > > When I increase parallelism of my job when deploying or directly in
> > > code,
> > > no
> 
>  effect because source is can't work in parallel. Now I reduce "Tasks
> 
> > > Slots" to 1 on ever nodes and deploy my job as many times as nodes in
> > > the
> > > cluster. It works when I have only one job. If I want deploy another in
> > > parallel there is no free slots. I hope more convenient way to do that
> > > is
> > > exists. Thanks.
> > > 
> > > 
> > > 
> > > BR,
> > > Evgeny

signature.asc
Description: This is a digitally signed message part.

Re: Running streaming job on every node of cluster

Reply via email to