[ https://issues.apache.org/jira/browse/KAFKA-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249601#comment-15249601 ]
Ismael Juma edited comment on KAFKA-3565 at 4/20/16 10:19 AM:
--------------------------------------------------------------

I ran the tests a couple of times with various settings to check whether my previous results are reproducible, and included three linger_ms values (0ms, 10ms, 100ms). I paste the results for one configuration below and will follow up with the full results.

Test name and base parameters: test_producer_throughput with replication_factor=3, message_size=100, num_producers=1, acks=1

Additional parameters: linger_ms=0

Run 1
{code}
no compression 0.9.0.1: {"records_per_sec": 315361.137218, "mb_per_sec": 30.08}
no compression trunk: {"records_per_sec": 297798.313734, "mb_per_sec": 28.4}
snappy 0.9.0.1: {"records_per_sec": 553246.908491, "mb_per_sec": 52.76}
snappy trunk: {"records_per_sec": 577280.430108, "mb_per_sec": 55.05}
gzip 0.9.0.1: {"records_per_sec": 77354.44643, "mb_per_sec": 7.38}
gzip trunk: {"records_per_sec": 62830.118903, "mb_per_sec": 5.99}
{code}

Run 2
{code}
no compression 0.9.0.1: {"records_per_sec": 315955.037665, "mb_per_sec": 30.13}
no compression trunk: {"records_per_sec": 300464.965301, "mb_per_sec": 28.65}
snappy 0.9.0.1: {"records_per_sec": 613146.185473, "mb_per_sec": 58.47}
snappy trunk: {"records_per_sec": 566080.556727, "mb_per_sec": 53.99}
gzip 0.9.0.1: {"records_per_sec": 79531.701825, "mb_per_sec": 7.58}
gzip trunk: {"records_per_sec": 64608.501011, "mb_per_sec": 6.16}
{code}

Additional parameters: linger_ms=10

Run 1
{code}
no compression 0.9.0.1: {"records_per_sec": 321710.690316, "mb_per_sec": 30.68}
no compression trunk: {"records_per_sec": 295894.400353, "mb_per_sec": 28.22}
snappy 0.9.0.1: {"records_per_sec": 626892.573564, "mb_per_sec": 59.79}
snappy trunk: {"records_per_sec": 583555.217391, "mb_per_sec": 55.65}
gzip 0.9.0.1: {"records_per_sec": 101564.66137, "mb_per_sec": 9.69}
gzip trunk: {"records_per_sec": 93290.957114, "mb_per_sec": 8.9}
{code}

Run 2
{code}
no compression 0.9.0.1: {"records_per_sec": 322871.541977, "mb_per_sec": 30.79}
no compression trunk: {"records_per_sec": 297139.03033, "mb_per_sec": 28.34}
snappy 0.9.0.1: {"records_per_sec": 655040.019522, "mb_per_sec": 62.47}
snappy trunk: {"records_per_sec": 584571.864111, "mb_per_sec": 55.75}
gzip 0.9.0.1: {"records_per_sec": 106699.817156, "mb_per_sec": 10.18}
gzip trunk: {"records_per_sec": 93577.145646, "mb_per_sec": 8.92}
{code}

Additional parameters: linger_ms=100

Run 1
{code}
no compression 0.9.0.1: {"records_per_sec": 318958.412548, "mb_per_sec": 30.42}
no compression trunk: {"records_per_sec": 289574.325782, "mb_per_sec": 27.62}
snappy 0.9.0.1: {"records_per_sec": 654401.267674, "mb_per_sec": 62.41}
snappy trunk: {"records_per_sec": 533244.735797, "mb_per_sec": 50.85}
gzip 0.9.0.1: {"records_per_sec": 108845.754602, "mb_per_sec": 10.38}
gzip trunk: {"records_per_sec": 95630.708942, "mb_per_sec": 9.12}
{code}

Run 2
{code}
no compression 0.9.0.1: {"records_per_sec": 322561.163182, "mb_per_sec": 30.76}
no compression trunk: {"records_per_sec": 291524.10947, "mb_per_sec": 27.8}
snappy 0.9.0.1: {"records_per_sec": 626599.906629, "mb_per_sec": 59.76}
snappy trunk: {"records_per_sec": 568719.067797, "mb_per_sec": 54.24}
gzip 0.9.0.1: {"records_per_sec": 108660.70272, "mb_per_sec": 10.36}
gzip trunk: {"records_per_sec": 94786.511299, "mb_per_sec": 9.04}
{code}
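For reference, a producer configured the way these runs are parameterized would look roughly as follows. This is a minimal sketch, not the benchmark itself (the runs above are driven by the kafkatest test_producer_throughput harness); the bootstrap address, topic name, and record count are placeholders:

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class ThroughputProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.ACKS_CONFIG, "1");                  // acks=1, as in the runs above
        props.put(ProducerConfig.LINGER_MS_CONFIG, "0");             // 0, 10 or 100 in the runs above
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy"); // none, snappy or gzip
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

        byte[] payload = new byte[100]; // message_size=100
        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1_000_000; i++) { // placeholder record count
                producer.send(new ProducerRecord<>("test-topic", payload));
            }
            producer.flush();
        }
    }
}
{code}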
> Producer's throughput lower with compressed data after KIP-31/32
> ----------------------------------------------------------------
>
>                 Key: KAFKA-3565
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3565
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Ismael Juma
>            Priority: Critical
>             Fix For: 0.10.0.0
>
>
> Relative offsets were introduced by KIP-31 so that the broker does not have to recompress data (this was previously required after offsets were assigned).
> The implicit assumption is that reducing the CPU usage required by recompression would mean that producer throughput for compressed data would increase.
> However, this doesn't seem to be the case:
> {code}
> Commit: eee95228fabe1643baa016a2d49fb0a9fe2c66bd (one before KIP-31/32)
> test_id: 2016-04-15--012.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100.compression_type=snappy
> status: PASS
> run time: 59.030 seconds
> {"records_per_sec": 519418.343653, "mb_per_sec": 49.54}
> {code}
> Full results: https://gist.github.com/ijuma/0afada4ff51ad6a5ac2125714d748292
> {code}
> Commit: fa594c811e4e329b6e7b897bce910c6772c46c0f (KIP-31/32)
> test_id: 2016-04-15--013.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100.compression_type=snappy
> status: PASS
> run time: 1 minute 0.243 seconds
> {"records_per_sec": 427308.818848, "mb_per_sec": 40.75}
> {code}
> Full results: https://gist.github.com/ijuma/e49430f0548c4de5691ad47696f5c87d
> The difference for the uncompressed case is smaller (and within what one would expect given the additional size overhead caused by the timestamp field):
> {code}
> Commit: eee95228fabe1643baa016a2d49fb0a9fe2c66bd (one before KIP-31/32)
> test_id: 2016-04-15--010.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100
> status: PASS
> run time: 1 minute 4.176 seconds
> {"records_per_sec": 321018.17747, "mb_per_sec": 30.61}
> {code}
> Full results: https://gist.github.com/ijuma/5fec369d686751a2d84debae8f324d4f
> {code}
> Commit: fa594c811e4e329b6e7b897bce910c6772c46c0f (KIP-31/32)
> test_id: 2016-04-15--014.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100
> status: PASS
> run time: 1 minute 5.079 seconds
> {"records_per_sec": 291777.608696, "mb_per_sec": 27.83}
> {code}
> Full results: https://gist.github.com/ijuma/1d35bd831ff9931448b0294bd9b787ed
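As a quick check on the size of the gap in the quoted runs, the percentage change from the pre-KIP-31/32 commit to the KIP-31/32 commit can be computed directly from the records_per_sec figures above (class and method names here are just for illustration):

{code}
public class RegressionDelta {
    static double pctChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // snappy: 519418.343653 -> 427308.818848 records/sec
        System.out.printf("snappy:       %+.1f%%%n", pctChange(519418.343653, 427308.818848)); // ~ -17.7%
        // no compression: 321018.17747 -> 291777.608696 records/sec
        System.out.printf("uncompressed: %+.1f%%%n", pctChange(321018.17747, 291777.608696));  // ~ -9.1%
    }
}
{code}

That is roughly a 17.7% drop for snappy versus a 9.1% drop for the uncompressed case, consistent with the description's point that the compressed regression is the anomaly.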
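For background on why no regression was expected, here is a conceptual sketch of the relative-offset scheme the quoted description refers to. This is illustrative only, not Kafka's actual code; it assumes the KIP-31 convention that inner messages of a compressed set carry relative offsets 0..n-1 and the wrapper message carries the absolute offset of the last inner message:

{code}
public class RelativeOffsets {
    // Pre-KIP-31: inner offsets were absolute, so after assigning offsets the
    // broker had to decompress the set, rewrite every inner message and
    // recompress. Post-KIP-31: the relative inner offsets are fixed at produce
    // time, so the broker only stamps the uncompressed wrapper header and the
    // compressed payload is left untouched.
    static long absoluteOffset(long wrapperOffset, int innerCount, int relative) {
        long baseOffset = wrapperOffset - (innerCount - 1); // offset of the first inner message
        return baseOffset + relative;                       // resolved on the consumer side
    }

    public static void main(String[] args) {
        // A compressed set of 3 messages appended at offsets 100..102:
        for (int rel = 0; rel < 3; rel++) {
            System.out.println(absoluteOffset(102L, 3, rel)); // prints 100, 101, 102
        }
    }
}
{code}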