Agreed. But how did Flink decide that it should allot exactly 1 subtask? Why
not 2 or 3?
I am trying to understand the implications of using setMaxParallelism vs.
setParallelism.
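
A minimal sketch of how the two settings appear to interact, going by Nico's
explanation quoted below. This is a simplified model in plain Java, not
Flink's actual code: the class, method, and constant names are made up for
illustration, and it assumes the configured default parallelism is 1 when
nothing else is set.

```java
// Simplified model, NOT Flink's implementation: all names here are hypothetical.
final class ParallelismSketch {

    /** Marker for "the user did not set this value" (illustrative only). */
    static final int UNSET = -1;

    /**
     * Picks the number of subtasks for an operator.
     * Order assumed here: operator-level setting, then job-level setting
     * (set in the program or via `flink run -p`), then the configured default.
     * The max parallelism never raises the result; it only acts as an upper bound.
     */
    static int resolveParallelism(int operatorParallelism,
                                  int jobParallelism,
                                  int configuredDefault,
                                  int maxParallelism) {
        int parallelism;
        if (operatorParallelism != UNSET) {
            parallelism = operatorParallelism;
        } else if (jobParallelism != UNSET) {
            parallelism = jobParallelism;
        } else {
            parallelism = configuredDefault;
        }
        if (maxParallelism != UNSET && parallelism > maxParallelism) {
            throw new IllegalArgumentException(
                "parallelism " + parallelism + " exceeds max parallelism " + maxParallelism);
        }
        return parallelism;
    }

    public static void main(String[] args) {
        int configuredDefault = 1; // assumption: default parallelism left at 1

        // Only the max parallelism is set to 3: the default of 1 is used.
        System.out.println(resolveParallelism(UNSET, UNSET, configuredDefault, 3)); // prints 1

        // The parallelism is set to 3: three subtasks are created.
        System.out.println(resolveParallelism(UNSET, 3, configuredDefault, UNSET)); // prints 3
    }
}
```

Under these assumptions, setMaxParallelism(3) alone leaves the job at the
default of 1 subtask, while setParallelism(3) gives 3 subtasks, which matches
what I observed.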

On Wed, Mar 28, 2018 at 2:58 PM, Nico Kruber <n...@data-artisans.com> wrote:

> Hi James,
> the number of subtasks being used is defined by the parallelism. The max
> parallelism, however, "... determines the maximum parallelism to which
> you can scale operators" [1]. That is, once set, you cannot ever (even
> after restarting your program from a savepoint) increase the operator's
> parallelism above this value. The actual parallelism can be set per job
> in your program but also in the Flink client:
> flink run -p <parallelism> <jar-file> <arguments>
>
>
> Nico
>
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/ops/production_ready.html#set-maximum-parallelism-for-operators-explicitly
>
> On 28/03/18 09:25, Data Engineer wrote:
> > I have a sample application that reads around 2 GB of CSV files,
> > converts each record into an Avro object and sends it to Kafka.
> > I use a custom FileReader that reads the files in a directory.
> > I have set taskmanager.numberOfTaskSlots to 4.
> > I see that if I use setParallelism(3), 3 subtasks are created. But if I
> > use setMaxParallelism(3), only 1 subtask is created.
> >
> > On Wed, Mar 28, 2018 at 12:29 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> >
> >     What was the input format, the size, and the program that you tried
> >     to execute?
> >
> >     On 28. Mar 2018, at 08:18, Data Engineer <dataenginee...@gmail.com> wrote:
> >
> >>     I went through the explanation on MaxParallelism in the official
> >>     docs here:
> >>     https://ci.apache.org/projects/flink/flink-docs-master/ops/production_ready.html#set-maximum-parallelism-for-operators-explicitly
> >>
> >>     However, I am not able to figure out how Flink decides the
> >>     parallelism value.
> >>     For instance, if I call setMaxParallelism(3), I see that for my job
> >>     only 1 subtask is created. How did Flink decide that
> >>     1 subtask was enough?
> >>
> >>     Regards,
> >>     James
> >
> >
>
> --
> Nico Kruber | Software Engineer
> data Artisans
