Regarding the size of Flink cluster

Jessy Ping Fri, 10 Dec 2021 02:46:55 -0800

Hi All,


I have the following questions regarding the sizing of a Flink cluster
doing stateful computation using the DataStream API. I would appreciate it
if the community could answer the questions below.



Suppose we have a pipeline as follows,


*Kafka real-time events (source 1) & Kafka rules (source 2) ->
KeyedBroadcastProcessFunction -> Kafka Sink*


As you can see, we will be processing the real-time events from the Kafka
events source using the rules broadcast from the rules source, with the help
of a KeyedBroadcastProcessFunction.


*Questions*



   - I have a machine with 16 CPUs and 32 GB of RAM. Which configuration is
   more efficient for achieving a target parallelism of 16?


   1. A single task manager with 16 task slots
   2. 16 task managers with 1 task slot and 1 CPU each
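
As a rough sketch, the two options correspond to flink-conf.yaml settings
along these lines. The memory sizes are illustrative assumptions chosen to
fit within 32 GB, not values from an actual setup or a recommendation.

```yaml
# Option 1: one task manager process hosting all 16 slots
taskmanager.numberOfTaskSlots: 16
parallelism.default: 16
taskmanager.memory.process.size: 28g     # assumed; leaves headroom for the job manager and OS

# Option 2: 16 task manager processes, each started with this configuration
# taskmanager.numberOfTaskSlots: 1
# parallelism.default: 16
# taskmanager.memory.process.size: 1792m # assumed; 16 x 1792m = 28g in total
```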




   - Suppose my pipeline has broadcast state and I run a single task
   manager with 16 task slots to achieve the target parallelism of 16. Does
   Flink keep 16 copies of the broadcast state in that task manager, or is
   there a single copy on the heap shared by all task slots?



   - Does a parallelism of n mean that I can process only n events per
   second (if the latency of the pipeline is 1 s)? How many events can a
   single task slot (containing a single task) process at a time?
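
The reasoning behind this question can be written out as simple arithmetic.
The sketch below assumes that each parallel subtask handles exactly one event
at a time and that the per-event latency is constant. Both are assumptions
taken from the question, not confirmed Flink behavior, and the class and
method names are hypothetical.

```java
public class ThroughputEstimate {

    /**
     * Upper bound on events per second, ASSUMING each parallel subtask
     * processes one event at a time with a fixed per-event latency.
     */
    static double maxEventsPerSecond(int parallelism, double latencySeconds) {
        if (parallelism <= 0 || latencySeconds <= 0) {
            throw new IllegalArgumentException("parallelism and latency must be positive");
        }
        return parallelism / latencySeconds;
    }

    public static void main(String[] args) {
        // Parallelism 16 with 1 s per event: the case described in the question.
        System.out.println(maxEventsPerSecond(16, 1.0));
        // Same parallelism with 0.5 s per event.
        System.out.println(maxEventsPerSecond(16, 0.5));
    }
}
```

Under these assumptions the bound scales linearly with parallelism and
inversely with latency; whether a real pipeline behaves this way is exactly
what the question asks.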



   - Can Flink process multiple events from the same key at the same time?



   - I have found the following blog post regarding Flink cluster sizing:
   https://www.ververica.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines
   Are there other blog posts, testimonials, or books describing sample
   production setups/configurations of a Flink cluster for achieving
   different ranges of throughput?



   - Are there any blog posts reporting the results of load testing Flink?


Thanks

Jessy
