Dude don't put a lot of details about operative media and names.

________________________________
From: Manjeet Duhan <mdu...@operative.com>
Sent: Monday, July 29, 2019 11:28 PM
To: dev@kafka.apache.org <dev@kafka.apache.org>
Cc: Praveen Manvi <pma...@operative.com>
Subject: Kafka connect task assignment Improvement ( New Feature )

Hi,

This is Manjeet, working at Operative Media. I have been working with Confluent Kafka for almost 4 years and have made many customized changes to Kafka Connect sink and source connectors. I have also made changes to the Kafka code base for our requirements. There is one feature I added recently, after discussing it with our architect Praveen Manvi, which I would like to discuss with you for wider community use.

Background :-
We are running more than 30 connectors at Operative, but each connector requires a different machine specification. E.g. the Kafka Connect S3 connector requires more memory, and some of our in-house connectors require more network bandwidth (IO) and processing power (CPU). We were getting out-of-memory errors in a worker due to one connector. This affected all processes, and we had to pause this connector.

Issue :-
We want each connector to run on a specific type of machine (in this case, we want 3 types of machines: memory, CPU and IO).

Existing Solution :-
We can start 3 clusters and have a specific type of machine in each cluster, but this is difficult to manage.

Pain points :-
1. We have to take care to join the right cluster every time we start a machine, otherwise it can start in a different cluster.
2. We have to change the offset storage topic for each cluster, otherwise we will be able to see connectors across clusters.

Proposed Solution :-
We specify the type of machine in the distributed properties of each worker machine, so that when we specify a target machine type at connector start, the tasks are started only on machines of exactly that type. In this case we don’t have to take care of the above pain points. Different types of machines will be part of the same cluster.

Example :-
I have 4 workers with types memory (worker 1), cpu (worker 2) and IO (worker 3 and worker 4).
a) We started connector 1 with 2 tasks and specified the target machine type as IO. It will distribute the tasks equally across worker 3 and worker 4.
b) We started connector 2 with 2 tasks and target machine type as memory. It will start both tasks on worker 1.

I have made the changes for this feature and it is working fine; we are pushing it to our production cluster in a few days. Please tell me if it can be helpful for the larger community.

Thanks,
Manjeet Duhan
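
For illustration only, here is a minimal Python sketch of the assignment idea described in the example. It is not the actual change made to Kafka Connect, and it does not use any Kafka Connect API. The function name `assign_tasks`, the worker and connector names, and the type labels are all hypothetical. It assumes connector 1 targets the IO workers (worker 3 and worker 4) and connector 2 targets the memory worker (worker 1).

```python
from collections import defaultdict
from itertools import cycle


def assign_tasks(workers, connectors):
    """Assign connector tasks only to workers of the requested machine type.

    workers:    dict of worker name -> machine type (hypothetical labels).
    connectors: dict of connector name -> (target machine type, task count).
    Returns a dict of worker name -> list of task ids.
    """
    # Group workers by their declared machine type, keeping input order.
    by_type = defaultdict(list)
    for worker, machine_type in workers.items():
        by_type[machine_type].append(worker)

    assignment = {worker: [] for worker in workers}
    for connector, (target_type, task_count) in connectors.items():
        candidates = by_type.get(target_type)
        if not candidates:
            raise ValueError(f"no worker of type {target_type!r} for {connector}")
        # Round-robin this connector's tasks over the matching workers only.
        targets = cycle(candidates)
        for task_index in range(task_count):
            assignment[next(targets)].append(f"{connector}-{task_index}")
    return assignment


# The 4-worker example from the mail (names and labels are illustrative).
workers = {"worker1": "memory", "worker2": "cpu", "worker3": "io", "worker4": "io"}
connectors = {"connector1": ("io", 2), "connector2": ("memory", 2)}
result = assign_tasks(workers, connectors)
```

With these inputs, connector 1's two tasks land one each on worker 3 and worker 4, both of connector 2's tasks land on worker 1, and worker 2 stays idle. The sketch restarts the round-robin for every connector, so it does not balance load across connectors that share a machine type.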