Re: [PR] [WIP] KAFKA-19589: Reduce number of events generated in AsyncKafkaConsumer.updateFetchPositions() [kafka]

via GitHub Sat, 09 Aug 2025 20:38:30 -0700


kirktrue commented on PR #20324:
URL: https://github.com/apache/kafka/pull/20324#issuecomment-3172338727


   # Results for August 9, 2025 testing
   
   ## Metrics
   
   The metrics shown below come from three separate test runs:
   
   1. Test run `CLASSIC` uses the `CLASSIC` group protocol from `trunk`
   2. Test run `CONSUMER (trunk)` uses the `CONSUMER` group protocol, also from `trunk`
   3. Test run `CONSUMER (branch)` uses the `CONSUMER` group protocol from my current development branch (`KAFKA-19589-reduce-events-in-update-fetch-positions`)
   
   Results:
   
   | Metric                    |           CLASSIC |  CONSUMER (trunk) | CONSUMER (branch) |
   |:--------------------------|------------------:|------------------:|------------------:|
   | bytes-consumed-rate       |     162857862.759 |      48074181.128 |      77237286.006 |
   | bytes-consumed-total      |  104199838490.000 |  104199838490.000 |  104199838490.000 |
   | fetch-latency-avg         |            27.563 |            47.571 |            25.268 |
   | fetch-latency-max         |           502.000 |           507.000 |           506.000 |
   | fetch-rate                |           161.195 |            51.055 |            78.003 |
   | fetch-size-avg            |       1023854.536 |       1023506.132 |       1023205.444 |
   | fetch-size-max            |       1033664.000 |       1033664.000 |       1033664.000 |
   | fetch-throttle-time-avg   |             0.000 |             0.000 |             0.000 |
   | fetch-throttle-time-max   |             0.000 |             0.000 |             0.000 |
   | fetch-total               |        101908.000 |        102036.000 |        101959.000 |
   | records-consumed-rate     |        312587.069 |         92272.900 |        148248.150 |
   | records-consumed-total    |     199999690.000 |     199999690.000 |     199999690.000 |
   | records-lag-max           |       4484207.000 |       1806989.000 |       2578731.000 |
   | records-lead-min          |      29035357.000 |      32266785.000 |      31534419.000 |
   | records-per-request-avg   |          1965.172 |          1964.503 |          1963.926 |
   | commit-sync-time-ns-total |             0.000 |             0.000 |             0.000 |
   | committed-time-ns-total   |             0.000 |             0.000 |             0.000 |
   | incoming-byte-rate        |     163876206.099 |      48227380.969 |      77529673.969 |
   | incoming-byte-total       |  104604394434.000 |  104604441109.000 |  104622899962.000 |
   | io-ratio                  |             0.244 |             0.092 |             0.146 |
   | io-time-ns-avg            |          3865.850 |          2363.981 |          3871.983 |
   | io-time-ns-total          |  157231567818.000 |  217749720755.000 |  195497217580.000 |
   | io-wait-ratio             |             0.303 |             0.523 |             0.503 |
   | io-wait-time-ns-avg       |          4784.395 |         13368.476 |         13311.075 |
   | io-wait-time-ns-total     |  111080698186.000 | 1148273922014.000 |  655479798988.000 |
   | last-poll-seconds-ago     |             0.000 |             0.000 |             0.000 |
   | network-io-rate           |          2213.585 |           689.148 |          1100.397 |
   | network-io-total          |       1385915.000 |       1426966.000 |       1522117.000 |
   | outgoing-byte-rate        |         17912.864 |          5518.740 |          8596.430 |
   | outgoing-byte-total       |      11354966.000 |      11442918.000 |      11387680.000 |
   | poll-idle-ratio-avg       |               NaN |               NaN |               NaN |
   | request-rate              |           162.060 |            51.471 |            78.488 |
   | request-size-avg          |           110.532 |           107.220 |           109.526 |
   | request-size-max          |           238.000 |           203.000 |           203.000 |
   | request-total             |        102231.000 |        102940.000 |        102512.000 |
   | response-rate             |           162.013 |            51.441 |            78.424 |
   | response-total            |        102226.000 |        102935.000 |        102507.000 |
   | select-rate               |         63238.109 |         39094.742 |         37796.679 |
   | select-total              |      40007185.000 |      87649140.000 |      50222746.000 |
   | time-between-poll-avg     |             0.016 |             0.054 |             0.034 |
   | time-between-poll-max     |           100.000 |           100.000 |            84.000 |
   | Average CPU load          |            92.23% |           146.39% |           149.89% |
   | P99 CPU load              |            93.50% |           148.00% |           151.00% |
   | Max CPU load              |            93.50% |           148.00% |           151.00% |
   
   ## Execution
   
   The test runs on a single AWS instance against a six-node cluster. Before any tests are run, the topic is pre-populated with 200,000,000 messages of 512 bytes each. A single “warm up” run of the performance test is executed before the three test runs described above.
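For scale, counting only the stated 512 bytes per message, the pre-populated data amounts to:

```math
2 \times 10^{8}\ \text{messages} \times 512\ \text{B} = 1.024 \times 10^{11}\ \text{B} \approx 95.4\ \text{GiB}
```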
   
   The command to execute the test is:
   
   ```bash
   kafka-consumer-perf-test.sh \
     --bootstrap-server $BOOTSTRAP_SERVER \
     --topic $TOPIC_NAME \
     --messages $NUM_MESSAGES \
     --consumer.config conf/consumer-$GROUP_PROTOCOL.properties \
     --show-detailed-stats \
     --print-metrics
   ```
   
   `NUM_MESSAGES` is set to `200000000` (200 million).
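The run order described above (one warm-up, then the three measured runs) could be scripted roughly as follows. This is a hypothetical sketch: `KAFKA_TRUNK_HOME`, `KAFKA_BRANCH_HOME`, the `DRY_RUN` switch, the `run_test` helper, the default bootstrap server and topic, the lowercase `classic`/`consumer` values for `GROUP_PROTOCOL`, and the use of `classic` for the warm-up are all assumptions, not details of the original setup.

```bash
#!/usr/bin/env bash
# Hypothetical driver: one warm-up run, then the three measured runs.
# Install paths, defaults, DRY_RUN, and the warm-up protocol are assumptions.
set -euo pipefail

: "${KAFKA_TRUNK_HOME:=/opt/kafka-trunk}"    # hypothetical trunk install
: "${KAFKA_BRANCH_HOME:=/opt/kafka-branch}"  # hypothetical build of the PR branch
: "${BOOTSTRAP_SERVER:=localhost:9092}"      # placeholder
: "${TOPIC_NAME:=perf-test}"                 # placeholder
NUM_MESSAGES=200000000
: "${DRY_RUN:=1}"                            # 1 = print commands only

# run_test <kafka_home> <group_protocol> <label>
run_test() {
  local home="$1" protocol="$2" label="$3"
  local cmd=(
    "$home/bin/kafka-consumer-perf-test.sh"
    --bootstrap-server "$BOOTSTRAP_SERVER"
    --topic "$TOPIC_NAME"
    --messages "$NUM_MESSAGES"
    --consumer.config "conf/consumer-$protocol.properties"
    --show-detailed-stats
    --print-metrics
  )
  echo "== $label"
  if [[ "$DRY_RUN" == 1 ]]; then
    echo "${cmd[*]}"   # dry run: show the command line only
  else
    "${cmd[@]}"        # actually run the perf test
  fi
}

run_test "$KAFKA_TRUNK_HOME"  classic  "warm-up (not measured)"
run_test "$KAFKA_TRUNK_HOME"  classic  "CLASSIC"
run_test "$KAFKA_TRUNK_HOME"  consumer "CONSUMER (trunk)"
run_test "$KAFKA_BRANCH_HOME" consumer "CONSUMER (branch)"
```

By default the script only prints the four command lines; setting `DRY_RUN=0` executes them. Keeping `trunk` and the branch in separate installations avoids rebuilding between runs.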
   
   ## Client configuration
   
   The consumer configuration sets `max.poll.records=5` but otherwise mostly uses defaults, apart from connectivity and authentication settings:
   
   ```properties
   bootstrap.servers=$BOOTSTRAP_SERVER
   sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
     username="$USER_ID" \
     password='$USER_PASSWORD';
   security.protocol=SASL_SSL
   sasl.mechanism=PLAIN
   ssl.endpoint.identification.algorithm=
   client.dns.lookup=use_all_dns_ips
   
   # Key setting for this test
   max.poll.records=5
   ```
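The command reads `conf/consumer-$GROUP_PROTOCOL.properties`, but the configuration above does not show the group protocol itself. A hedged sketch of how the two files might be produced, assuming they differ only in `group.protocol` and that `GROUP_PROTOCOL` takes the lowercase values `classic` and `consumer` (both assumptions):

```bash
#!/usr/bin/env bash
# Hypothetical helper: write one consumer config per group protocol so that
# conf/consumer-$GROUP_PROTOCOL.properties resolves for each test run.
# Assumes the files differ only in group.protocol.
set -euo pipefail

mkdir -p conf

for protocol in classic consumer; do
  {
    # Same settings as the configuration above. The quoted heredoc keeps
    # $BOOTSTRAP_SERVER etc. as literal placeholders.
    cat <<'EOF'
bootstrap.servers=$BOOTSTRAP_SERVER
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="$USER_ID" \
  password='$USER_PASSWORD';
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
ssl.endpoint.identification.algorithm=
client.dns.lookup=use_all_dns_ips
max.poll.records=5
EOF
    # The only per-run difference in this sketch.
    echo "group.protocol=${protocol}"
  } > "conf/consumer-${protocol}.properties"
done
```

`group.protocol` accepts `classic` or `consumer`; with `consumer`, `KafkaConsumer` delegates to `AsyncKafkaConsumer`, the class this PR changes.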


-- 
This is an automated message from the Apache Git Service.