Adding a few application-related configurations that can affect the producer rate:
- linger.ms
- batch.size
- buffer.memory
- acks
- compression
- num.io.threads
- num.network.threads

On Mon, Nov 30, 2020 at 3:07 PM girija arumugam <girijaarumuga...@gmail.com> wrote:

> Team,
> *Use-case :*
> *IMAP* . I have an application in which an org has users , who use
> IMAP to send mails, where the mail contents are produced to kafka.
>
> Here the scaling factors are
>
> 1. org can grow from 1 to million
> 2. users can grow from 1 to million.
>
> For this use-case, I need to calculate the producer rate and broker
> response rate for a single machine.
>
> So far we have identified, the factors that will be involved in
> producer-rate are :
>
> 1. Message size
> 2. Request size
> 3. Request rate overhead
> 4. Request latency
> 5. Round Trip Time
> 6. Number of Sender Threads
> 7. Number of Processor Threads at Broker
> 8. Replication factor
>
> Variables identified at Network layer, Kernel, NIC :
>
> 1. sysctl_wmem
> 2. Tx queues
> 3. Ring Buffer
> 4. Driver Queue
> 5. NAPI Polling
>
> Observations made so far :
>
> 1. SocketChannel is the one who is the entry point of sending data at
> the application level.
> 2. sendfile() system call used to transfer the data.
>
> *Questions* :
>
> 1. How data is transferred from SocketChannel to NIC ? (ie) The
> data-flow in-terms of network(protocol) layer, kernel, network device
> drivers, NIC .
> 2. Since, each KafkaProducer instance will create an
> SocketChannel.What is the maximum number of producer instances , a machine
> can have to utilise the network in an efficient manner ?
> 3. In-addition to the above listed variables,
>    1. What are the list of variables involved in sending data in the
>    network layer ?
>    2. What are the list of variables involved in sending data in the
>    kernel ?
>    3. What are the list of variables involved in sending data to NIC ?
> 4. How to frame the producer rate in-terms of the variables identified
> in each layer ?
> 5. *With the given machine hardware, how to precisely frame the
> producer rate in a single formula in-terms of hardware and software level
> ?*
>
> Anyone, Please help me in identifying the variables and also in-corporate
> those variables in a single formula to frame the producer-rate for a
> machine in-terms of producer instances.
>
> Thanks in advance.
>
> PS : I have already came across the following documents
>
> - https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster/
> - https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing
> - https://www.slideshare.net/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600
>
> Regards,
> Girija A.
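
For reference, the configurations listed at the top of this reply could be grouped roughly as in the sketch below. The key names are the standard Kafka ones (compression is set through compression.type), and num.io.threads and num.network.threads are broker settings rather than producer settings. The values and the to_properties helper are assumptions made purely for illustration, not tuned recommendations.

```python
# Producer-side settings, keyed by the standard Kafka producer config names.
# The values are arbitrary starting points for experiments, not recommendations.
producer_config = {
    "linger.ms": 10,                    # wait up to 10 ms to fill a batch
    "batch.size": 64 * 1024,            # per-partition batch size in bytes
    "buffer.memory": 64 * 1024 * 1024,  # total memory for unsent records
    "acks": "all",                      # wait for all in-sync replicas
    "compression.type": "lz4",          # compress whole batches
}

# Broker-side settings; these belong in the broker's server.properties.
broker_config = {
    "num.io.threads": 8,
    "num.network.threads": 3,
}


def to_properties(config):
    """Hypothetical helper: render a dict as Java-style .properties lines."""
    return "\n".join(f"{key}={value}" for key, value in config.items())


if __name__ == "__main__":
    print("# producer")
    print(to_properties(producer_config))
    print("# broker (server.properties)")
    print(to_properties(broker_config))
```

Changing one value at a time and measuring with the producer performance tool makes it easier to see which setting actually moves the producer rate.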