Team,
*Use-case:*
    *IMAP*. I have an application in which an org has users who use IMAP
to send mails, and the mail contents are produced to Kafka.

The scaling factors are:

   1. The number of orgs can grow from 1 to a million.
   2. The number of users can grow from 1 to a million.

For this use-case, I need to calculate the producer rate and broker
response rate for a single machine.

So far, we have identified the following factors that affect the
producer rate:

   1. Message size
   2. Request size
   3. Request rate overhead
   4. Request latency
   5. Round Trip Time
   6. Number of Sender Threads
   7. Number of Processor Threads at Broker
   8. Replication factor
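
One rough way some of these factors might combine is sketched below in Python. This is only an assumption to make the question concrete, not a validated model: it takes the rate of a single producer connection as the smaller of a bandwidth bound and a pipelining bound (requests in flight divided by request latency). The function and parameter names are hypothetical; `max_in_flight` is meant to play the role of the producer setting max.in.flight.requests.per.connection. Sender threads, broker processor threads, and the replication factor are not modelled separately; any replication wait is assumed to be folded into the request latency.

```python
def estimate_producer_rate(message_size_bytes, request_size_bytes,
                           request_latency_s, max_in_flight,
                           link_bandwidth_bytes_per_s):
    """Rough upper bound on messages/second for one producer connection.

    Assumptions (illustrative only, not taken from Kafka documentation):
      - every request is full, so it carries
        request_size_bytes // message_size_bytes messages
      - at most max_in_flight requests are outstanding at any time
      - request_latency_s already includes the round trip time and
        any time spent waiting for replication
    """
    if message_size_bytes <= 0 or request_size_bytes <= 0 or max_in_flight <= 0:
        raise ValueError("sizes and max_in_flight must be positive")
    if request_latency_s <= 0 or link_bandwidth_bytes_per_s <= 0:
        raise ValueError("latency and bandwidth must be positive")

    messages_per_request = request_size_bytes // message_size_bytes
    if messages_per_request == 0:
        raise ValueError("a message does not fit into one request")

    # Pipelining bound: with N requests in flight and latency L,
    # at most N / L requests complete per second.
    latency_bound = max_in_flight / request_latency_s

    # Bandwidth bound: the link cannot carry more than this many
    # full requests per second.
    bandwidth_bound = link_bandwidth_bytes_per_s / request_size_bytes

    return min(latency_bound, bandwidth_bound) * messages_per_request


if __name__ == "__main__":
    # Hypothetical numbers: 1 KB messages, 16 KB requests, 10 ms latency,
    # 5 requests in flight, 1 Gbit/s link (125,000,000 bytes/s).
    rate = estimate_producer_rate(1000, 16000, 0.01, 5, 125_000_000)
    print(f"estimated upper bound: {rate:.0f} messages/s")
```

With these example numbers the pipelining bound is the limiting one, which is why the kernel and NIC variables below only start to matter once enough requests are in flight to approach the link bandwidth.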

Variables identified at the network layer, kernel, and NIC:

   1. sysctl_wmem
   2. Tx queues
   3. Ring Buffer
   4. Driver Queue
   5. NAPI Polling

Observations made so far:

   1. SocketChannel is the entry point for sending data at the
   application level.
   2. The sendfile() system call is used to transfer the data.

*Questions*:

   1. How is data transferred from SocketChannel to the NIC? That is, what
   is the data flow in terms of the network (protocol) layer, kernel,
   network device drivers, and NIC?
   2. Since each KafkaProducer instance creates a SocketChannel, what is
   the maximum number of producer instances a machine can have while still
   utilising the network efficiently?
   3. In addition to the variables listed above:
      1. Which variables are involved in sending data in the network
      layer?
      2. Which variables are involved in sending data in the kernel?
      3. Which variables are involved in sending data to the NIC?
   4. How can the producer rate be framed in terms of the variables
   identified in each layer?
   5. *For given machine hardware, how can the producer rate be framed
   precisely in a single formula in terms of hardware-level and
   software-level variables?*


Could anyone please help me identify the variables and incorporate
them into a single formula that frames the producer rate for a
machine in terms of producer instances?

Thanks in advance.

PS: I have already come across the following documents:

   - https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster/
   - https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing
   - https://www.slideshare.net/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600


Regards,
Girija A.
