Regarding framing producer rate in terms of software as well as hardware configurations

girija arumugam Mon, 30 Nov 2020 01:37:24 -0800

Team,
*Use case:*
    *IMAP*. I have an application in which an org has users who use IMAP
to send mail, and the mail contents are produced to Kafka.

Here the scaling factors are:

1. The number of orgs can grow from 1 to a million.
2. The number of users can grow from 1 to a million.

For this use-case, I need to calculate the producer rate and broker
response rate for a single machine.

So far, we have identified the following factors that affect the
producer rate:

1. Message size
2. Request size
3. Request rate overhead
4. Request latency
5. Round Trip Time
6. Number of Sender Threads
7. Number of Processor Threads at Broker
8. Replication factor
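
As a rough illustration of how a few of these factors could combine, the sketch below bounds the producer byte rate by the amount of data in flight per unit of request latency, capped by the NIC bandwidth. This is only a back-of-the-envelope assumption, not a measured model: the function and parameter names are hypothetical, the numbers are illustrative, and it ignores replication factor, acks, compression, and broker-side thread limits.

```python
def estimate_producer_rate(request_size_bytes, request_latency_ms,
                           in_flight_per_connection, connections,
                           nic_bandwidth_bytes_per_s):
    """Return a rough upper bound on producer throughput in bytes/second.

    Assumption: each connection keeps a fixed number of requests in flight,
    and every request completes in request_latency_ms.
    """
    # Bytes that can be outstanding at once across all connections.
    bytes_in_flight = connections * in_flight_per_connection * request_size_bytes
    # Little's-law style bound: outstanding bytes / time per request.
    pipeline_rate = bytes_in_flight * 1000 / request_latency_ms
    # The NIC caps the rate no matter how many requests are in flight.
    return min(pipeline_rate, nic_bandwidth_bytes_per_s)


def messages_per_second(byte_rate, message_size_bytes):
    """Convert a byte rate into a message rate for a fixed message size."""
    return byte_rate / message_size_bytes


if __name__ == "__main__":
    # Illustrative values only: 16 KiB requests, 4 ms latency,
    # 5 requests in flight, 1 connection, 10 Gbit/s NIC.
    rate = estimate_producer_rate(16384, 4, 5, 1, 1_250_000_000)
    print(rate, messages_per_second(rate, 1024))
```

With more connections the latency-bound term grows linearly until the NIC cap takes over, which is one way to reason about how many producer instances are useful on one machine.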

Variables identified at the network layer, kernel, and NIC:

1. sysctl_wmem
2. Tx queues
3. Ring Buffer
4. Driver Queue
5. NAPI Polling
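
To make the first variable concrete, here is a minimal sketch of reading the TCP send-buffer limits on Linux. It assumes that "sysctl_wmem" above refers to the `net.ipv4.tcp_wmem` setting, which holds three values (minimum, default, and maximum send-buffer size in bytes); the helper names are hypothetical.

```python
from pathlib import Path


def parse_tcp_wmem(text):
    """Parse the three whitespace-separated values of net.ipv4.tcp_wmem."""
    minimum, default, maximum = (int(value) for value in text.split())
    return {"min": minimum, "default": default, "max": maximum}


def read_tcp_wmem(path="/proc/sys/net/ipv4/tcp_wmem"):
    """Return the parsed setting, or None if the file is not available
    (for example, on a non-Linux machine)."""
    proc_file = Path(path)
    if not proc_file.exists():
        return None
    return parse_tcp_wmem(proc_file.read_text())


if __name__ == "__main__":
    print(read_tcp_wmem())
```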

Observations made so far:

1. SocketChannel is the entry point for sending data at the
application level.
2. The sendfile() system call is used to transfer the data.

*Questions:*

1. How is data transferred from the SocketChannel to the NIC? That is,
what is the data flow in terms of the network (protocol) layer, kernel,
network device drivers, and NIC?
2. Each KafkaProducer instance creates a SocketChannel. What is the
maximum number of producer instances a machine can have while still
using the network efficiently?
3. In addition to the variables listed above:
   1. Which variables are involved in sending data at the network layer?
   2. Which variables are involved in sending data in the kernel?
   3. Which variables are involved in sending data to the NIC?
4. How can the producer rate be framed in terms of the variables
identified in each layer?
5. *With the given machine hardware, how can the producer rate be framed
precisely in a single formula in terms of hardware-level and
software-level variables?*

Could anyone please help me identify the variables and incorporate them
into a single formula that frames the producer rate for a machine in
terms of producer instances?

Thanks in advance.

PS: I have already come across the following documents:

- https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster/
- https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing
- https://www.slideshare.net/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600

Regards,
Girija A.
