[ https://issues.apache.org/jira/browse/KAFKA-15264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748938#comment-17748938 ]
jianbin.chen edited comment on KAFKA-15264 at 7/31/23 2:58 AM:
---------------------------------------------------------------

The 3.5.1 perf tool against a 1.1.0 broker shows no problem (acks is set to 1 in all of these tests). Both the 1.1.0 and the 3.5.1 perf tools against the 3.5.1 brokers (4c8g * 3) give the same result: the jitter is very obvious. However, if I reduce the load, for example by not saturating the load generators' bandwidth and using only 80% of it (80 MB * 2, with two load-generating machines in total), then the 3.5.1 stress test is very stable. As soon as the load reaches the NIC's bandwidth limit, the jitter is obvious. This problem may be that the broker responds slowly, which blocks subsequent sends; 1.1.0 does not have this problem.

You can see that 'bytein' in the metrics is consistent, while the replication in and out rates are not as good as on 1.1.0. I don't know the reason; the jitter is always obvious. Apart from the 3.5.1 + ZK deployment, which is not tested, I have tested the remaining combinations: 1.1.0 client with 3.5.1 broker, 3.5.1 client with 1.1.0 broker, 3.5.1 client with 3.5.1 broker, and 1.1.0 client with 1.1.0 broker. As long as the broker version is 3.5.1, the jitter under the limit stress test is greater than on 1.1.0. [~ijuma]

> Compared with 1.1.0 (ZK), the peak throughput of 3.5.1 (KRaft) is very jittery
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-15264
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15264
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: jianbin.chen
>            Priority: Major
>         Attachments: image-2023-07-28-09-51-01-662.png, image-2023-07-28-09-52-38-941.png
>
>
> I was preparing to upgrade from 1.1.0 to 3.5.1 in KRaft mode (a new cluster deployment). When I recently compared and tested the two, I found an obvious throughput gap when using the following stress test command:
>
> {code:java}
> ./kafka-producer-perf-test.sh --topic test321 --num-records 30000000 --record-size 1024 --throughput -1 --producer-props bootstrap.servers=xxx:xxxx acks=1
> 419813 records sent, 83962.6 records/sec (81.99 MB/sec), 241.1 ms avg latency, 588.0 ms max latency.
> 555300 records sent, 111015.6 records/sec (108.41 MB/sec), 275.1 ms avg latency, 460.0 ms max latency.
> 552795 records sent, 110536.9 records/sec (107.95 MB/sec), 265.9 ms avg latency, 1120.0 ms max latency.
> 552600 records sent, 110520.0 records/sec (107.93 MB/sec), 284.5 ms avg latency, 1097.0 ms max latency.
> 538500 records sent, 107656.9 records/sec (105.13 MB/sec), 277.5 ms avg latency, 610.0 ms max latency.
> 511545 records sent, 102309.0 records/sec (99.91 MB/sec), 304.1 ms avg latency, 1892.0 ms max latency.
> 511890 records sent, 102337.1 records/sec (99.94 MB/sec), 288.4 ms avg latency, 3000.0 ms max latency.
> 519165 records sent, 103812.2 records/sec (101.38 MB/sec), 262.1 ms avg latency, 1781.0 ms max latency.
> 513555 records sent, 102669.9 records/sec (100.26 MB/sec), 338.2 ms avg latency, 2590.0 ms max latency.
> 463329 records sent, 92665.8 records/sec (90.49 MB/sec), 276.8 ms avg latency, 1463.0 ms max latency.
> 494248 records sent, 98849.6 records/sec (96.53 MB/sec), 327.2 ms avg latency, 2362.0 ms max latency.
> 506272 records sent, 101254.4 records/sec (98.88 MB/sec), 322.1 ms avg latency, 2986.0 ms max latency.
> 393758 records sent, 78735.9 records/sec (76.89 MB/sec), 387.0 ms avg latency, 2958.0 ms max latency.
> 426435 records sent, 85252.9 records/sec (83.25 MB/sec), 363.3 ms avg latency, 1959.0 ms max latency.
> 412560 records sent, 82298.0 records/sec (80.37 MB/sec), 374.1 ms avg latency, 1995.0 ms max latency.
> 370137 records sent, 73997.8 records/sec (72.26 MB/sec), 396.8 ms avg latency, 1496.0 ms max latency.
> 391781 records sent, 78340.5 records/sec (76.50 MB/sec), 410.7 ms avg latency, 2446.0 ms max latency.
> 355901 records sent, 71166.0 records/sec (69.50 MB/sec), 397.5 ms avg latency, 2715.0 ms max latency.
> 385410 records sent, 77082.0 records/sec (75.28 MB/sec), 417.5 ms avg latency, 2702.0 ms max latency.
> 381160 records sent, 76232.0 records/sec (74.45 MB/sec), 407.7 ms avg latency, 1846.0 ms max latency.
> 333367 records sent, 66660.1 records/sec (65.10 MB/sec), 456.2 ms avg latency, 1414.0 ms max latency.
> 376251 records sent, 75175.0 records/sec (73.41 MB/sec), 401.9 ms avg latency, 1897.0 ms max latency.
> 354434 records sent, 70886.8 records/sec (69.23 MB/sec), 425.8 ms avg latency, 1601.0 ms max latency.
> 353795 records sent, 70744.9 records/sec (69.09 MB/sec), 411.7 ms avg latency, 1563.0 ms max latency.
> 321993 records sent, 64360.0 records/sec (62.85 MB/sec), 447.3 ms avg latency, 1975.0 ms max latency.
> 404075 records sent, 80750.4 records/sec (78.86 MB/sec), 408.4 ms avg latency, 1753.0 ms max latency.
> 384526 records sent, 76905.2 records/sec (75.10 MB/sec), 406.0 ms avg latency, 1833.0 ms max latency.
> 387652 records sent, 77483.9 records/sec (75.67 MB/sec), 397.3 ms avg latency, 1927.0 ms max latency.
> 343286 records sent, 68629.7 records/sec (67.02 MB/sec), 455.6 ms avg latency, 1685.0 ms max latency.
> 333300 records sent, 66646.7 records/sec (65.08 MB/sec), 456.6 ms avg latency, 2146.0 ms max latency.
> 361191 records sent, 72238.2 records/sec (70.55 MB/sec), 409.4 ms avg latency, 2125.0 ms max latency.
> 357525 records sent, 71490.7 records/sec (69.82 MB/sec), 436.0 ms avg latency, 1502.0 ms max latency.
> 340238 records sent, 68047.6 records/sec (66.45 MB/sec), 427.9 ms avg latency, 1932.0 ms max latency.
> 390016 records sent, 77956.4 records/sec (76.13 MB/sec), 418.5 ms avg latency, 1807.0 ms max latency.
> 352830 records sent, 70523.7 records/sec (68.87 MB/sec), 439.4 ms avg latency, 1892.0 ms max latency.
> 354526 records sent, 70905.2 records/sec (69.24 MB/sec), 429.6 ms avg latency, 2128.0 ms max latency.
> 356670 records sent, 71305.5 records/sec (69.63 MB/sec), 408.9 ms avg latency, 1329.0 ms max latency.
> 309204 records sent, 60687.7 records/sec (59.27 MB/sec), 438.6 ms avg latency, 2566.0 ms max latency.
> 366715 records sent, 72316.1 records/sec (70.62 MB/sec), 474.5 ms avg latency, 2169.0 ms max latency.
> 375174 records sent, 75034.8 records/sec (73.28 MB/sec), 429.9 ms avg latency, 1722.0 ms max latency.
> 359400 records sent, 70346.4 records/sec (68.70 MB/sec), 432.1 ms avg latency, 1961.0 ms max latency.
> 312276 records sent, 62430.2 records/sec (60.97 MB/sec), 477.4 ms avg latency, 2006.0 ms max latency.
> 361875 records sent, 72360.5 records/sec (70.66 MB/sec), 441.2 ms avg latency, 1618.0 ms max latency.
> 342449 records sent, 68462.4 records/sec (66.86 MB/sec), 446.7 ms avg latency, 2233.0 ms max latency.
> 338163 records sent, 67619.1 records/sec (66.03 MB/sec), 454.4 ms avg latency, 1839.0 ms max latency.
> 369139 records sent, 73798.3 records/sec (72.07 MB/sec), 388.3 ms avg latency, 1753.0 ms max latency.
> 362476 records sent, 72495.2 records/sec (70.80 MB/sec), 438.4 ms avg latency, 2037.0 ms max latency.
> 321426 records sent, 62267.7 records/sec (60.81 MB/sec), 475.5 ms avg latency, 2059.0 ms max latency.
> 389137 records sent, 77286.4 records/sec (75.47 MB/sec), 359.7 ms avg latency, 1547.0 ms max latency.
> 298050 records sent, 59586.2 records/sec (58.19 MB/sec), 563.9 ms avg latency, 2761.0 ms max latency.
> 325530 records sent, 65028.0 records/sec (63.50 MB/sec), 503.3 ms avg latency, 2950.0 ms max latency.
> 347306 records sent, 69419.5 records/sec (67.79 MB/sec), 404.0 ms avg latency, 2095.0 ms max latency.
> 361035 records sent, 72192.6 records/sec (70.50 MB/sec), 429.5 ms avg latency, 1698.0 ms max latency.
> 334539 records sent, 66907.8 records/sec (65.34 MB/sec), 461.1 ms avg latency, 1731.0 ms max latency.
> 367423 records sent, 73455.2 records/sec (71.73 MB/sec), 433.1 ms avg latency, 2089.0 ms max latency.
> 350940 records sent, 68947.0 records/sec (67.33 MB/sec), 434.8 ms avg latency, 1317.0 ms max latency.
> 351653 records sent, 70316.5 records/sec (68.67 MB/sec), 452.0 ms avg latency, 2948.0 ms max latency.
> 298410 records sent, 58834.8 records/sec (57.46 MB/sec), 479.2 ms avg latency, 2279.0 ms max latency.
> 351750 records sent, 70350.0 records/sec (68.70 MB/sec), 460.2 ms avg latency, 2496.0 ms max latency.
> 355367 records sent, 71073.4 records/sec (69.41 MB/sec), 416.3 ms avg latency, 2120.0 ms max latency.
> 238517 records sent, 47693.9 records/sec (46.58 MB/sec), 678.9 ms avg latency, 3072.0 ms max latency.
> 362347 records sent, 72469.4 records/sec (70.77 MB/sec), 423.8 ms avg latency, 1714.0 ms max latency.
> 308901 records sent, 61767.8 records/sec (60.32 MB/sec), 490.7 ms avg latency, 2339.0 ms max latency.
> 338280 records sent, 66919.9 records/sec (65.35 MB/sec), 422.8 ms avg latency, 1882.0 ms max latency.
> 311888 records sent, 61894.8 records/sec (60.44 MB/sec), 516.1 ms avg latency, 3857.0 ms max latency.
> 319164 records sent, 63832.8 records/sec (62.34 MB/sec), 494.3 ms avg latency, 2250.0 ms max latency.
> 291160 records sent, 58197.1 records/sec (56.83 MB/sec), 468.7 ms avg latency, 2250.0 ms max latency.
> 297599 records sent, 55834.7 records/sec (54.53 MB/sec), 472.1 ms avg latency, 3019.0 ms max latency.
> 314198 records sent, 62814.5 records/sec (61.34 MB/sec), 600.0 ms avg latency, 2863.0 ms max latency.
> 332534 records sent, 66440.4 records/sec (64.88 MB/sec), 479.2 ms avg latency, 3337.0 ms max latency.
> 320974 records sent, 64194.8 records/sec (62.69 MB/sec), 470.8 ms avg latency, 2644.0 ms max latency.
> 364638 records sent, 72825.6 records/sec (71.12 MB/sec), 408.4 ms avg latency, 2095.0 ms max latency.
> 350255 records sent, 70037.0 records/sec (68.40 MB/sec), 422.9 ms avg latency, 3059.0 ms max latency.
> 342961 records sent, 68592.2 records/sec (66.98 MB/sec), 461.5 ms avg latency, 1779.0 ms max latency.
> 348809 records sent, 69733.9 records/sec (68.10 MB/sec), 454.7 ms avg latency, 2621.0 ms max latency.
> 345438 records sent, 69032.4 records/sec (67.41 MB/sec), 439.0 ms avg latency, 2662.0 ms max latency.
> 306454 records sent, 61192.9 records/sec (59.76 MB/sec), 504.6 ms avg latency, 2513.0 ms max latency.
> 300053 records sent, 59843.0 records/sec (58.44 MB/sec), 415.6 ms avg latency, 1655.0 ms max latency.
> 332067 records sent, 66413.4 records/sec (64.86 MB/sec), 527.9 ms avg latency, 2409.0 ms max latency.
> 312132 records sent, 62426.4 records/sec (60.96 MB/sec), 463.3 ms avg latency, 2042.0 ms max latency.
> 30000000 records sent, 73963.402908 records/sec (72.23 MB/sec), 410.86 ms avg latency, 3857.00 ms max latency, 264 ms 50th, 1259 ms 95th, 2102 ms 99th, 2955 ms 99.9th.
> {code}
> !image-2023-07-28-09-51-01-662.png|width=596,height=205!
> For the 1.1.0 test, I guarantee that the command is the same. The stress test on 1.1.0 is basically jitter-free; I have tested it many times and the result is always the same:
> {code:java}
> 30000000 records sent, 108280.576630 records/sec (105.74 MB/sec), 279.05 ms avg latency, 1426.00 ms max latency, 185 ms 50th, 646 ms 95th, 758 ms 99th, 865 ms 99.9th.{code}
> !image-2023-07-28-09-52-38-941.png|width=596,height=204!
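> For reference, the same perf command can be rate-limited by replacing --throughput -1 with a positive cap; the comment above reports that keeping the load around 80% of the NIC bandwidth (80 MB per load generator) makes 3.5.1 stable. A minimal sketch, assuming 1 KiB records so that 81920 records/sec is roughly 80 MB/sec; the bootstrap servers are placeholders:
> {code:java}
> # Cap the producer perf test at ~80 MB/sec (81920 records/sec * 1024-byte records)
> # instead of saturating the NIC with --throughput -1.
> ./kafka-producer-perf-test.sh --topic test321 --num-records 30000000 --record-size 1024 --throughput 81920 --producer-props bootstrap.servers=xxx:xxxx acks=1
> {code}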
> I haven't tested the 3.5.1 + ZK deployment method yet; I will complete that piece of testing as soon as possible. Surprisingly, though, the throughput jitter under KRaft during extreme stress testing is obvious. The topic has 30 partitions, and no obvious jitter traces are found in CPU or GC. The comparison is 3.5.1 client to 3.5.1 broker versus 1.1.0 client to 1.1.0 broker, on 4c8g * 3 brokers.
>
> 1.1.0 config
> {code:java}
> ####
> log.cleanup.policy=delete
> log.cleaner.enable=true
> log.cleaner.delete.retention.ms=300000
> listeners=PLAINTEXT://:9092
> broker.id=1
> num.network.threads=5
> num.io.threads=8
> socket.send.buffer.bytes=102400
> socket.receive.buffer.bytes=102400
> socket.request.max.bytes=104857600
> message.max.bytes=5242880
> replica.fetch.max.bytes=5242880
> log.dirs=/data01/kafka110-logs
> num.partitions=3
> default.replication.factor=2
> delete.topic.enable=true
> auto.create.topics.enable=true
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=2
> transaction.state.log.replication.factor=2
> transaction.state.log.min.isr=1
> offsets.retention.minutes=1440
> log.retention.minutes=30
> log.segment.bytes=104857600
> log.retention.check.interval.ms=300000
> zookeeper.connect=/kafka110-test2
> zookeeper.connection.timeout.ms=6000
> group.initial.rebalance.delay.ms=2000
> num.replica.fetchers=1{code}
> 3.5.1 conf
>
> {code:java}
> ####
> listeners=PLAINTEXT://:9092,CONTROLLER://:9093
> # Name of listener used for communication between brokers.
> inter.broker.listener.name=PLAINTEXT
> # Listener name, hostname and port the broker will advertise to clients.
> # If not set, it uses the value for "listeners".
> advertised.listeners=PLAINTEXT://10.58.16.231:9092
> # A comma-separated list of the names of the listeners used by the controller.
> # If no explicit mapping set in `listener.security.protocol.map`, default will be using PLAINTEXT protocol
> # This is required if running in KRaft mode.
> controller.listener.names=CONTROLLER
> process.roles=broker,controller
> broker.id=1
> num.network.threads=5
> num.io.threads=8
> socket.send.buffer.bytes=102400
> socket.receive.buffer.bytes=102400
> socket.request.max.bytes=104857600
> message.max.bytes=52428800
> replica.fetch.max.bytes=52428800
> log.dirs=/data01/kafka-logs-351
> node.id=1
> controller.quorum.voters=1@:9093,2@:9093,3@:9093
> num.partitions=3
> default.replication.factor=2
> delete.topic.enable=true
> auto.create.topics.enable=false
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=3
> transaction.state.log.min.isr=1
> offsets.retention.minutes=4320
> log.retention.hours=72
> log.segment.bytes=1073741824
> log.retention.check.interval.ms=300000
> num.replica.fetchers=1{code}
> One thing to note is that the bandwidth of all 3 brokers is basically maxed out. The NIC I'm using is a gigabit NIC, and when I use a fixed send rate of 20,960 records per second (20 MB of traffic per second), there is no jitter!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
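As a way to observe the metrics mentioned in the comment above ('bytein' and the replication in/out rates), the brokers' BrokerTopicMetrics meters can be polled over JMX. A minimal sketch using the bundled JmxTool, assuming JMX is enabled on the broker (e.g. JMX_PORT=9999) and using localhost as a placeholder host; the kafka.tools.JmxTool class name applies to these versions but may differ in newer releases:
{code:java}
# Poll the broker's byte-in and replication throughput meters every 5 seconds.
./kafka-run-class.sh kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
  --object-name 'kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec' \
  --object-name 'kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesInPerSec' \
  --object-name 'kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesOutPerSec' \
  --attributes OneMinuteRate \
  --reporting-interval 5000
{code}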