What about setting the parallelism [1] to the total number of slots in your cluster? By default, all parts of your program are put into the same slot sharing group [2], so by setting the parallelism to the total slot count, each slot would hold one parallel instance of your whole pipeline (minus/plus any operators with lower/higher parallelism), if I understand it correctly.
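The reasoning above follows from how Flink counts slots: a job needs as many slots as the sum, over all slot sharing groups, of the highest operator parallelism in each group (with the default single group, that is simply the job's maximum parallelism). A minimal plain-Java sketch of that calculation, with hypothetical group names and parallelism values for the source/map/sink pipeline discussed below:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SlotCalc {

    // Required slots = sum over slot sharing groups of the maximum
    // operator parallelism within each group.
    static int requiredSlots(Map<String, List<Integer>> groups) {
        int total = 0;
        for (List<Integer> parallelisms : groups.values()) {
            total += Collections.max(parallelisms);
        }
        return total;
    }

    public static void main(String[] args) {
        // Default case: source (p=1), map (p=2), sink (p=2) all share one group,
        // so the job occupies max(1, 2, 2) = 2 slots.
        Map<String, List<Integer>> oneGroup =
                Map.of("default", List.of(1, 2, 2));
        System.out.println(requiredSlots(oneGroup));
    }
}
```

So a job with parallelism equal to the cluster's total slot count will, by default, claim every slot, which is the effect suggested above.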
Nico

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/parallel.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html#task-chaining-and-resource-groups

On Monday, 27 February 2017 18:17:53 CET Evgeny Kincharov wrote:
> Thanks for your answer.
> The problem is that both slots are seized on the one node, provided that
> node has enough free slots; the other nodes idle. I want to utilize the
> cluster resources a little bit more. Maybe the other deployment modes
> allow it.
>
> BR, Evgeny.
>
> From: Nico Kruber<mailto:n...@data-artisans.com>
> Sent: 27 February 2017, 20:07
> To: user@flink.apache.org<mailto:user@flink.apache.org>
> Cc: Evgeny Kincharov<mailto:evgeny_kincha...@epam.com>
> Subject: Re: Running streaming job on every node of cluster
>
> Hi Evgeny,
> I tried to reproduce your example with the following code, having another
> console listening with "nc -l 12345":
>
> env.setParallelism(2);
> env.addSource(new SocketTextStreamFunction("localhost", 12345, " ", 3))
>     .map(new MapFunction<String, String>() {
>         @Override
>         public String map(final String s) throws Exception { return s; }
>     })
>     .addSink(new DiscardingSink<String>());
>
> This way, I do get a source with parallelism 1 and map & sink with
> parallelism 2, and the whole program occupying 2 slots as expected. You
> can check in the web interface of your cluster how many slots are taken
> after executing one instance of your program.
>
> How do you set your parallelism?
>
>
> Nico
>
> On Monday, 27 February 2017 14:04:21 CET Evgeny Kincharov wrote:
> > Hi,
> >
> > I have the simplest streaming job, and I want to distribute my job on
> > every node of my Flink cluster.
> >
> > The job is simple:
> >
> > source (SocketTextStream) -> map -> sink (AsyncHtttpSink).
> >
> > When I increase the parallelism of my job when deploying or directly in
> > code, there is no effect because the source can't work in parallel.
> > Now I reduce "Task Slots" to 1 on every node and deploy my job as many
> > times as there are nodes in the cluster. It works when I have only one
> > job; if I want to deploy another in parallel, there are no free slots.
> > I hope a more convenient way to do that exists. Thanks.
> >
> > BR,
> > Evgeny
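For reference, the per-node workaround Evgeny describes (one slot per TaskManager) corresponds to setting the following in flink-conf.yaml on each TaskManager; the value shown is illustrative:

```yaml
# One task slot per TaskManager, so each deployed job instance
# lands on a different node (assuming one TaskManager per node).
taskmanager.numberOfTaskSlots: 1
```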