[ https://issues.apache.org/jira/browse/KAFKA-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251422#comment-15251422 ]
Jiangjie Qin commented on KAFKA-3565:
-------------------------------------

[~ijuma] [~junrao] [~jkreps] I just finished the parameterized test on the code with and without KIP-31/32. Here is the result: https://docs.google.com/spreadsheets/d/1DR13ng6ZMRFki9QCepvsDcT2hfE1sjodql9zR_y2XKs/edit?usp=sharing

A few explanations of the test. The parameters I used are the following:
max.in.flight.requests.per.connection=1, 5
valueBound=500, 5000, 50000
linger.ms=0, 10, 100
recordSize=100, 1000
compression.type=gzip, snappy

* The tests ran with 1 partition, 1 replica, batch.size=500000, acks=1.
* The 0.9 test broker and the trunk test broker were running on the same standalone machine. The machine was almost idle except for running the two brokers.
* The producers were running on another test box. The RTT between producer and broker is negligible (the two machines are only 3 feet apart).
* The tests ran sequentially, i.e. only one producer was running during each test.

In the test result table, the result of the trunk comes first. Out of all 72 runs, 0.9 wins in the following configuration combinations:
1. max.in.flight.requests.per.connection=5, valueBound=500, linger.ms=0, messageSize=100, compression.type=gzip
2. max.in.flight.requests.per.connection=5, valueBound=500, linger.ms=0, messageSize=1000, compression.type=gzip
3. max.in.flight.requests.per.connection=5, valueBound=500, linger.ms=10, messageSize=100, compression.type=gzip
4. max.in.flight.requests.per.connection=5, valueBound=500, linger.ms=10, messageSize=1000, compression.type=gzip
5. max.in.flight.requests.per.connection=5, valueBound=500, linger.ms=100, messageSize=1000, compression.type=gzip
6. max.in.flight.requests.per.connection=5, valueBound=5000, linger.ms=100, messageSize=100, compression.type=gzip

The common thing about these combinations is that the valueBound is small.
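The valueBound effect can be sketched with a small standalone experiment (a rough analogue of the perf test's payload generation, not the actual ProducerPerformance code; `make_payload` and the batch layout below are hypothetical): the smaller the valueBound, the more repetitive and compressible the payload, so the fixed 8-byte timestamp added per message by KIP-32 becomes a larger fraction of the compressed batch.

```python
import gzip
import random
import struct

def make_payload(value_bound, msg_len, rnd):
    # Hypothetical analogue of the test's valueBound parameter: build the
    # value by concatenating random integers drawn from [0, value_bound).
    # A smaller bound means more repeated substrings across messages,
    # hence a more compressible batch.
    s = b""
    while len(s) < msg_len:
        s += str(rnd.randrange(value_bound)).encode()
    return s[:msg_len]

def timestamp_overhead(value_bound, n_msgs=2000, msg_len=100):
    rnd = random.Random(1)  # fixed seed for reproducibility
    msgs = [make_payload(value_bound, msg_len, rnd) for _ in range(n_msgs)]
    base = len(gzip.compress(b"".join(msgs)))
    # KIP-32 adds an 8-byte timestamp per message; model it here as a
    # big-endian long prepended to each value before gzip-compressing.
    with_ts = b"".join(
        struct.pack(">q", 1461000000000 + 7 * i) + m
        for i, m in enumerate(msgs)
    )
    ts = len(gzip.compress(with_ts))
    return (ts - base) / base  # relative growth of the compressed batch

for vb in (500, 5000, 50000):
    print(vb, round(timestamp_overhead(vb), 3))
```

With this toy model the relative cost of the 8 extra bytes is largest for valueBound=500 and shrinks as valueBound grows, consistent with 0.9 winning only in the small-valueBound combinations.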
This seems to indicate that the more compressible the messages are, the bigger the negative impact of the 8-byte overhead. There are many other interesting things to be seen in the detailed metrics in the table. But for now, based on these tests, there seems to be no obvious unexpected performance issue with KIP-31/32. Please let me know if you think there is something I missed. Thanks.

> Producer's throughput lower with compressed data after KIP-31/32
> ----------------------------------------------------------------
>
>                 Key: KAFKA-3565
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3565
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Ismael Juma
>            Priority: Critical
>             Fix For: 0.10.0.0
>
>
> Relative offsets were introduced by KIP-31 so that the broker does not have
> to recompress data (this was previously required after offsets were
> assigned). The implicit assumption is that reducing CPU usage required by
> recompression would mean that producer throughput for compressed data would
> increase.
> However, this doesn't seem to be the case:
> {code}
> Commit: eee95228fabe1643baa016a2d49fb0a9fe2c66bd (one before KIP-31/32)
> test_id: 2016-04-15--012.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100.compression_type=snappy
> status: PASS
> run time: 59.030 seconds
> {"records_per_sec": 519418.343653, "mb_per_sec": 49.54}
> {code}
> Full results: https://gist.github.com/ijuma/0afada4ff51ad6a5ac2125714d748292
> {code}
> Commit: fa594c811e4e329b6e7b897bce910c6772c46c0f (KIP-31/32)
> test_id: 2016-04-15--013.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100.compression_type=snappy
> status: PASS
> run time: 1 minute 0.243 seconds
> {"records_per_sec": 427308.818848, "mb_per_sec": 40.75}
> {code}
> Full results: https://gist.github.com/ijuma/e49430f0548c4de5691ad47696f5c87d
> The difference for the uncompressed case is smaller (and within what one
> would expect given the additional size overhead caused by the timestamp
> field):
> {code}
> Commit: eee95228fabe1643baa016a2d49fb0a9fe2c66bd (one before KIP-31/32)
> test_id: 2016-04-15--010.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100
> status: PASS
> run time: 1 minute 4.176 seconds
> {"records_per_sec": 321018.17747, "mb_per_sec": 30.61}
> {code}
> Full results: https://gist.github.com/ijuma/5fec369d686751a2d84debae8f324d4f
> {code}
> Commit: fa594c811e4e329b6e7b897bce910c6772c46c0f (KIP-31/32)
> test_id: 2016-04-15--014.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100
> status: PASS
> run time: 1 minute 5.079 seconds
> {"records_per_sec": 291777.608696, "mb_per_sec": 27.83}
> {code}
>
> Full results: https://gist.github.com/ijuma/1d35bd831ff9931448b0294bd9b787ed

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
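As a back-of-envelope check on the benchmark figures quoted in the issue description (a rough sketch that considers only the new 8-byte timestamp field and ignores other per-record framing): for 100-byte records, size alone predicts a throughput drop of about 8/108 ≈ 7.4%. The measured uncompressed drop is about 9.1%, which is in that ballpark, while the snappy-compressed drop of about 17.7% is clearly larger than size alone explains, which is what the regression is about.

```python
# Throughput figures quoted from the benchmark results above (records/sec).
uncompressed_before, uncompressed_after = 321018.17747, 291777.608696
snappy_before, snappy_after = 519418.343653, 427308.818848

def drop(before, after):
    """Fractional throughput drop between two runs."""
    return (before - after) / before

# Size-only expectation: 8 timestamp bytes on top of a 100-byte record
# (other per-record framing ignored -- a simplifying assumption).
expected = 8 / (100 + 8)

print("expected from size alone: %.1f%%" % (100 * expected))
print("uncompressed drop:        %.1f%%" % (100 * drop(uncompressed_before, uncompressed_after)))
print("snappy drop:              %.1f%%" % (100 * drop(snappy_before, snappy_after)))
```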