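Hi James,

the number of subtasks being used is defined by the parallelism. The max parallelism, however, "... determines the maximum parallelism to which you can scale operators" [1]. That is, once it is set, you can never (not even after restarting your program from a savepoint) increase the operator's parallelism above this value. Setting only the max parallelism therefore does not raise the number of subtasks. Your job keeps running with its default parallelism, which is typically 1 (the default value of parallelism.default in flink-conf.yaml). That is why you see a single subtask with setMaxParallelism(3) but three subtasks with setParallelism(3).

The actual parallelism can be set per job (or per operator) in your program, but also in the Flink client:

    flink run -p <parallelism> <jar-file> <arguments>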
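To make the difference concrete, here is a small plain-Java model of how the number of subtasks comes about. It is a hypothetical sketch, not Flink code: the class and method names are made up, and it assumes the precedence order from Flink's "Parallel Execution" docs (operator level, then execution environment level, then client level `-p`, then the system-level parallelism.default). It also assumes parallelism.default is left at 1; local IDE environments usually default to the number of CPU cores instead. Max parallelism only acts as an upper bound and never chooses the subtask count.

```java
import java.util.OptionalInt;

// Hypothetical model of how an operator ends up with N subtasks; NOT Flink's implementation.
final class ParallelismSketch {

    // Assumption: parallelism.default left at its configured default of 1.
    static final int SYSTEM_DEFAULT = 1;

    // Resolves the effective parallelism using the assumed precedence:
    // operator.setParallelism > env.setParallelism > flink run -p > parallelism.default.
    static int effectiveParallelism(OptionalInt operatorLevel,
                                    OptionalInt environmentLevel,
                                    OptionalInt clientLevel) {
        if (operatorLevel.isPresent()) return operatorLevel.getAsInt();
        if (environmentLevel.isPresent()) return environmentLevel.getAsInt();
        if (clientLevel.isPresent()) return clientLevel.getAsInt();
        return SYSTEM_DEFAULT;
    }

    // Max parallelism is only a cap: it rejects a too-high parallelism
    // but never increases a lower one. (Where exactly Flink fails is not modeled.)
    static int checkedParallelism(int parallelism, int maxParallelism) {
        if (parallelism > maxParallelism) {
            throw new IllegalArgumentException(
                "parallelism " + parallelism + " exceeds max parallelism " + maxParallelism);
        }
        return parallelism;
    }

    public static void main(String[] args) {
        // Only setMaxParallelism(3): nothing sets the parallelism, so the default applies.
        int onlyMax = checkedParallelism(
            effectiveParallelism(OptionalInt.empty(), OptionalInt.empty(), OptionalInt.empty()), 3);
        System.out.println("setMaxParallelism(3) only -> " + onlyMax + " subtask(s)");

        // setParallelism(3) on the operator (max parallelism 3): three subtasks.
        int explicit = checkedParallelism(
            effectiveParallelism(OptionalInt.of(3), OptionalInt.empty(), OptionalInt.empty()), 3);
        System.out.println("setParallelism(3) -> " + explicit + " subtask(s)");

        // flink run -p 4 while max parallelism is 3: the cap rejects it.
        try {
            checkedParallelism(
                effectiveParallelism(OptionalInt.empty(), OptionalInt.empty(), OptionalInt.of(4)), 3);
        } catch (IllegalArgumentException e) {
            System.out.println("-p 4 with max 3 -> rejected: " + e.getMessage());
        }
    }
}
```

The point of splitting the model into two methods is that they answer different questions: `effectiveParallelism` decides how many subtasks run, while `checkedParallelism` only validates that choice against the max parallelism.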
Nico

[1] https://ci.apache.org/projects/flink/flink-docs-master/ops/production_ready.html#set-maximum-parallelism-for-operators-explicitly

On 28/03/18 09:25, Data Engineer wrote:
> I have a sample application that reads around 2 GB of CSV files,
> converts each record into an Avro object and sends it to Kafka.
> I use a custom FileReader that reads the files in a directory.
> I have set taskmanager.numberOfTaskSlots to 4.
> I see that if I use setParallelism(3), 3 subtasks are created. But if I
> use setMaxParallelism(3), only 1 subtask is created.
>
> On Wed, Mar 28, 2018 at 12:29 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> What was the input format, the size, and the program that you tried
>> to execute?
>>
>> On 28. Mar 2018, at 08:18, Data Engineer <dataenginee...@gmail.com> wrote:
>>
>>> I went through the explanation of MaxParallelism in the official
>>> docs here:
>>>
>>> https://ci.apache.org/projects/flink/flink-docs-master/ops/production_ready.html#set-maximum-parallelism-for-operators-explicitly
>>>
>>> However, I am not able to figure out how Flink decides the
>>> parallelism value.
>>> For instance, if I setMaxParallelism to 3, I see that for my job,
>>> there is only 1 subtask that is created. How did Flink decide that
>>> 1 subtask was enough?
>>>
>>> Regards,
>>> James

--
Nico Kruber | Software Engineer
data Artisans