[ https://issues.apache.org/jira/browse/KAFKA-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272619#comment-15272619 ]
Jiangjie Qin commented on KAFKA-3565:
-------------------------------------

[~junrao] I ran the tests again with more data and it looks like the results are stable now. The results are updated in runs 13-15. Most of the results are similar or reasonable between trunk and 0.9, but the difference is still bigger than expected in the following two cases, especially the first one where the message size is 1000. I am not sure if this is related, but the discrepancy only shows up when the compression codec is snappy and the value bound is 500. I can run a snappy decompression test to see if that is the issue.

{noformat}
max.in.flight.requests.per.connection=1, valueBound=500, linger.ms=100000, messageSize=1000, compression.type=snappy

start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
22:54:28:955, 22:54:41:990, 953.6743, 73.1626, 1000000, 76716.5324
23:19:08:786, 23:19:19:701, 953.6743, 87.3728, 1000000, 91617.0408
----------------------
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
00:35:27:626, 00:35:40:751, 953.6743, 72.6609, 1000000, 76190.4762
00:59:55:306, 01:00:06:217, 953.6743, 87.4048, 1000000, 91650.6278
----------------------
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
23:45:07:404, 23:45:20:463, 953.6743, 73.0281, 1000000, 76575.5418
00:09:32:282, 00:09:43:315, 953.6743, 86.4384, 1000000, 90637.1794
{noformat}

and

{noformat}
max.in.flight.requests.per.connection=1, valueBound=500, linger.ms=100000, messageSize=100, compression.type=snappy

start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
22:51:24:002, 22:51:43:158, 953.6743, 49.7846, 10000000, 522029.6513
23:14:43:458, 23:14:59:696, 953.6743, 58.7310, 10000000, 615839.3891
----------------------
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
00:32:23:976, 00:32:43:008, 953.6743, 50.1090, 10000000, 525430.8533
00:55:30:602, 00:55:46:507, 953.6743, 59.9607, 10000000, 628733.1028
----------------------
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
23:42:03:559, 23:42:22:788, 953.6743, 49.5956, 10000000, 520047.8444
00:05:09:039, 00:05:25:073, 953.6743, 59.4783, 10000000, 623674.6913
{noformat}
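A quick way to run that standalone snappy check would be something like the sketch below. It simply round-trips a single snappy block in a loop; the snappy-java calls are real, but the payload generation only approximates the perf tool's valueBound behavior and the sizes are taken from the first case above, so treat the numbers as a rough comparison rather than a reproduction of the producer/consumer path.

{code}
import java.util.Random;
import org.xerial.snappy.Snappy;

public class SnappyRoundTripCheck {
    public static void main(String[] args) throws Exception {
        int messageSize = 1000;     // messageSize=1000, as in the first case above
        int valueBound = 500;       // approximates valueBound=500: values drawn from [0, 500) and narrowed to a byte
        int iterations = 1_000_000;

        Random random = new Random(42);
        byte[] payload = new byte[messageSize];
        for (int i = 0; i < messageSize; i++) {
            payload[i] = (byte) random.nextInt(valueBound);
        }

        byte[] compressed = Snappy.compress(payload);

        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            byte[] out = Snappy.uncompress(compressed);
            sink += out[0];          // keep the result live so the loop is not optimized away
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.printf("decompressed %d blocks of %d bytes in %d ms (compressed size %d, sink %d)%n",
                iterations, messageSize, elapsedMs, compressed.length, sink);
    }
}
{code}

If the raw decompression numbers match across the two versions, that would suggest the gap comes from how the message format and batching handle the compressed data rather than from snappy itself.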
> Producer's throughput lower with compressed data after KIP-31/32
> ----------------------------------------------------------------
>
>                 Key: KAFKA-3565
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3565
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Ismael Juma
>            Priority: Critical
>             Fix For: 0.10.0.0
>
>
> Relative offsets were introduced by KIP-31 so that the broker does not have
> to recompress data (this was previously required after offsets were
> assigned). The implicit assumption is that reducing CPU usage required by
> recompression would mean that producer throughput for compressed data would
> increase.
> However, this doesn't seem to be the case:
> {code}
> Commit: eee95228fabe1643baa016a2d49fb0a9fe2c66bd (one before KIP-31/32)
> test_id: 2016-04-15--012.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100.compression_type=snappy
> status: PASS
> run time: 59.030 seconds
> {"records_per_sec": 519418.343653, "mb_per_sec": 49.54}
> {code}
> Full results: https://gist.github.com/ijuma/0afada4ff51ad6a5ac2125714d748292
> {code}
> Commit: fa594c811e4e329b6e7b897bce910c6772c46c0f (KIP-31/32)
> test_id: 2016-04-15--013.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100.compression_type=snappy
> status: PASS
> run time: 1 minute 0.243 seconds
> {"records_per_sec": 427308.818848, "mb_per_sec": 40.75}
> {code}
> Full results: https://gist.github.com/ijuma/e49430f0548c4de5691ad47696f5c87d
> The difference for the uncompressed case is smaller (and within what one
> would expect given the additional size overhead caused by the timestamp
> field):
> {code}
> Commit: eee95228fabe1643baa016a2d49fb0a9fe2c66bd (one before KIP-31/32)
> test_id: 2016-04-15--010.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100
> status: PASS
> run time: 1 minute 4.176 seconds
> {"records_per_sec": 321018.17747, "mb_per_sec": 30.61}
> {code}
> Full results: https://gist.github.com/ijuma/5fec369d686751a2d84debae8f324d4f
> {code}
> Commit: fa594c811e4e329b6e7b897bce910c6772c46c0f (KIP-31/32)
> test_id: 2016-04-15--014.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100
> status: PASS
> run time: 1 minute 5.079 seconds
> {"records_per_sec": 291777.608696, "mb_per_sec": 27.83}
> {code}
> Full results: https://gist.github.com/ijuma/1d35bd831ff9931448b0294bd9b787ed
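For reference, the producer-side settings exercised by the benchmark above map roughly onto the following client configuration. This is only a sketch: the bootstrap address, topic name, record count, and payload generation are illustrative and are not taken from the ducktape test.

{code}
import java.util.Properties;
import java.util.Random;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class CompressedThroughputProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // illustrative address
        props.put(ProducerConfig.ACKS_CONFIG, "1");                             // acks=1, as in the test_id
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");            // compression_type=snappy
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

        int messageSize = 100;        // message_size=100, as in the test_id
        int numRecords = 10_000_000;  // illustrative count
        byte[] payload = new byte[messageSize];
        new Random(42).nextBytes(payload);

        long start = System.currentTimeMillis();
        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < numRecords; i++) {
                producer.send(new ProducerRecord<>("throughput-test", payload)); // topic name is illustrative
            }
            producer.flush();
        }
        double elapsedSec = (System.currentTimeMillis() - start) / 1000.0;
        System.out.printf("sent %d records of %d bytes in %.1f s (%.0f records/sec)%n",
                numRecords, messageSize, elapsedSec, numRecords / elapsedSec);
    }
}
{code}

Fully random bytes compress worse than the payloads the benchmark generates, so absolute numbers from this sketch will not line up with the figures quoted above; it is only meant to show which knobs the test is turning.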