Hi Jins George,

Every TM brings additional overhead, e.g., more heartbeat messages. However,
a cluster with 28 TMs would not be considered big, as there are users running
Flink applications on thousands of cores [1][2].
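For illustration, the layout you describe below (7 single-slot TMs per 8-core
machine) could be started along these lines; this is only a sketch, and the
-tm value is a placeholder that should be sized for your YARN containers:

    yarn-session.sh -d -jm 1024 -tm 1024 -s 1

With -s 1, a job submitted with parallelism 28 would be spread over 28
single-slot TMs, although YARN still decides on which nodes the containers
are placed.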
Best,
Gary

[1] https://flink.apache.org/flink-architecture.html#run-applications-at-any-scale
[2] https://de.slideshare.net/FlinkForward/flink-forward-sf-2017-stephan-ewen-experiences-running-flink-at-very-large-scale

On Thu, Feb 14, 2019 at 6:59 PM Jins George <jins.geo...@aeris.net> wrote:

> Thanks Gary. Understood the behavior.
>
> I am leaning towards running 7 TMs on each machine (8 cores). With 4 nodes,
> that will end up as 28 TaskManagers and 1 JobManager. I was wondering if
> this could put an additional burden on the JobManager? Is it recommended?
>
> Thanks,
> Jins George
>
> On 2/14/19 8:49 AM, Gary Yao wrote:
>
> Hi Jins George,
>
> This has been asked before [1]. The bottom line is that you currently
> cannot pre-allocate TMs and distribute your tasks evenly. You might be
> able to achieve a better distribution across hosts by configuring fewer
> slots in your TMs.
>
> Best,
> Gary
>
> [1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-1-5-job-distribution-over-cluster-nodes-td21588.html
>
> On Wed, Feb 13, 2019 at 6:20 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
>
>> Hi,
>>
>> I'm forwarding this question to Gary (CC'ed), who most likely would have
>> an answer to your question here.
>>
>> Cheers,
>> Gordon
>>
>> On Wed, Feb 13, 2019 at 8:33 AM Jins George <jins.geo...@aeris.net> wrote:
>>
>>> Hello community,
>>>
>>> I am trying to upgrade a Flink YARN session cluster running BEAM
>>> pipelines from version 1.2.0 to 1.6.3.
>>>
>>> Here is my session start command:
>>>
>>>     yarn-session.sh -d -n 4 -jm 1024 -tm 3072 -s 7
>>>
>>> Because of the dynamic resource allocation, no TaskManager gets created
>>> initially. Once I submit a job with parallelism 5, I see that one
>>> TaskManager gets created and all 5 parallel instances are scheduled on
>>> the same TaskManager (because I have 7 slots). This can create a hot
>>> spot, as only one physical node (out of 4 in my case) is utilized for
>>> processing.
>>>
>>> I noticed the legacy mode, which would provision all TaskManagers at
>>> cluster creation, but since legacy mode is expected to go away soon, I
>>> didn't want to try that route.
>>>
>>> Is there a way I can configure multiple jobs, or parallel instances of
>>> the same job, to spread across all the available YARN nodes while
>>> continuing to use the 'new' mode?
>>>
>>> Thanks,
>>> Jins George
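For reference, the "fewer slots" suggestion in the quoted thread corresponds
to taskmanager.numberOfTaskSlots in flink-conf.yaml (the -s flag sets the
same value); a minimal sketch, with the heap size again only a placeholder:

    taskmanager.numberOfTaskSlots: 1
    taskmanager.heap.mb: 1024

With one slot per TM, the parallelism-5 job described above would request
five TM containers instead of occupying five slots on a single 7-slot TM,
though the placement of those containers is still up to YARN.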