Hi All, I have set up a flink standalone cluster, with one master and two slave , all RedHat-7 machines. In the master Dashboard http://flink-master:8081/ I can see 2 Task Manager and 8 task slot as I have set taskmanager.numberOfTaskSlots: 4 in flink-conf.yaml in all of the slaves.
Now when I first ran my program with no parallelism mention, I got exception :: java.io.IOException: Could not connect to BlobServer at address /master-node-ip:49313. So I unblocked port 49313 from the firewall and then my program ran successfully using a single task slot. I have couple of questions regarding this: 1. How does flink uses port 49313, Is this port number arbitrary? How would I know which port to unblock before running my program? 2. I wanted my job to utilize all my task slots, so I ran my program using ./bin/flink run -p 8 myjars/flinkstream-flinkcluster.jar. I again got the exception java.io.IOException: Connecting the channel failed: Connecting to remote task manager + 'slave1-url.com/slave1-ip:45835' has failed. This might indicate that the remote task manager has been lost. And java.io.IOException: Connecting the channel failed: Connecting to remote task manager + 'slave2-url.com/slave2-ip:45086' has failed. This might indicate that the remote task manager has been lost. Again after unblocking these ports on the respective machines my program ran successfully utilizing all the 8 task slots. What I want to know is how Flink is behaving now as all these 8 tasks are reading from the same Kafka topic. Will each task gets the same data from the Kafka topic or each task will be receiving separate data from others? What I want is to distribute the events from the same Kafka topic evenly to all the available task slots. Is this a proper way to do so? Thanks & Regards Zeeshan Alam