[ 
https://issues.apache.org/jira/browse/KAFKA-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246565#comment-15246565
 ] 

Jiangjie Qin edited comment on KAFKA-3565 at 4/18/16 9:21 PM:
--------------------------------------------------------------

[~ijuma] I see. So we do know that the throughput of a single-user-thread 
producer will be lower compared with 0.9, but we are trying to understand why 
the throughput seems even lower than expected given the amount of overhead 
introduced by KIP-32.
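To put a rough number on that expectation (my own back-of-the-envelope 
arithmetic, not from the test output, and it ignores compression and batching 
effects): KIP-32 adds an 8-byte timestamp per record, which on 100-byte values 
only predicts a single-digit-percent drop, while the measured snappy drop below 
is much larger.

```python
# Back-of-the-envelope check (assumed message layout; the actual wire format
# also carries other per-record fields, so this is only a lower bound).
record_value = 100     # bytes per record value in the snappy test below
kip32_overhead = 8     # extra bytes per record (timestamp) added by KIP-32

# Expected throughput drop if the size overhead were the only cost:
expected_drop = kip32_overhead / (record_value + kip32_overhead)
print(f"expected drop from size overhead alone: {expected_drop:.1%}")

# Measured drop in the snappy test below (0.9 vs trunk, MB/sec):
measured_drop = 1 - 25.95 / 34.21
print(f"measured drop: {measured_drop:.1%}")
```

So the size overhead alone accounts for roughly a third of the observed gap, 
which is why we are looking for another explanation.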

I did the following test:
1. Launch a one-broker cluster running 0.9.
2. Launch another one-broker cluster running trunk.
3. Use the tweaked 0.9 ProducerPerformance and the trunk ProducerPerformance 
to produce to an 8-partition topic.

I am not able to reproduce the result you had for gzip.
{noformat}
./kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic 
becket_test_1_replica_8_partition --num-records 500000 --record-size 1000 
--throughput 100000 --valueBound 50000 --producer-props 
bootstrap.servers=localhost:9092 acks=1 max.in.flight.requests.per.connection=1 
compression.type=gzip batch.size=500000 client.id=becket

The result from 0.9:
500000 records sent, 3734.548306 records/sec (3.56 MB/sec), 368.73 ms avg 
latency, 790.00 ms max latency, 368 ms 50th, 535 ms 95th, 597 ms 99th, 723 ms 
99.9th.

The result from trunk:
500000 records sent, 11028.276501 records/sec (10.52 MB/sec), 4.08 ms avg 
latency, 148.00 ms max latency, 4 ms 50th, 6 ms 95th, 9 ms 99th, 57 ms 99.9th.
{noformat}

The results for snappy with 100-byte messages are the following:
{noformat}
./kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic 
becket_test_1_replica_8_partition --num-records 100000000 --record-size 100 
--throughput 10000 --valueBound 50000 --producer-props 
bootstrap.servers=localhost:9092 acks=1 max.in.flight.requests.per.connection=1 
compression.type=snappy batch.size=500000 client.id=becket

The result from 0.9:
100000000 records sent, 358709.649648 records/sec (34.21 MB/sec), 22.84 ms avg 
latency, 388.00 ms max latency, 21 ms 50th, 30 ms 95th, 44 ms 99th, 237 ms 
99.9th.

The result from trunk:
100000000 records sent, 272133.279995 records/sec (25.95 MB/sec), 13.96 ms avg 
latency, 1057.00 ms max latency, 9 ms 50th, 26 ms 95th, 145 ms 99th, 915 ms 
99.9th.
{noformat}

I took a closer look at the ProducerPerformance metrics; there are a few 
differences with and without KIP-31/32:
1. Average batch size: 212721 (with) vs. 475194 (without)
2. Request rate: 134 (with) vs. 81 (without)
3. Record queue time: 4.9 ms (with) vs. 18 ms (without)
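These three metrics can be cross-checked against each other (a rough sketch of 
my own; note batch-size-avg is measured in batch bytes per request, so it is 
not directly comparable to the record-level MB/sec figures above):

```python
# Rough cross-check: request rate x average batch size approximates the
# bytes/sec leaving the producer. Numbers are the metrics quoted above.
with_kip = {"request_rate": 134, "batch_size": 212721, "queue_time_ms": 4.9}
without_kip = {"request_rate": 81, "batch_size": 475194, "queue_time_ms": 18}

for name, m in [("with KIP-31/32", with_kip), ("without", without_kip)]:
    wire_bytes_per_sec = m["request_rate"] * m["batch_size"]
    print(f"{name}: ~{wire_bytes_per_sec / 1e6:.1f} MB/sec on the wire, "
          f"{m['queue_time_ms']} ms record queue time")
```

The pattern is consistent: faster broker responses mean shorter queue times, 
so batches are drained before they fill, giving more requests with smaller 
batches.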

This indicates that in general the sender thread runs more iterations after 
KIP-31/32 due to the smaller latency from the broker on trunk (in fact I 
think this is the metric we should care about most). That also means more 
batches are rolled out and there is more lock contention. Those things can 
impact the throughput of a single user thread. While the throughput of a 
single user thread is important, if we look at the producer as a system, 
there are too many factors that can affect it. One thing I noticed about 
producer performance is that you have to tune it. For example, if I change 
the configuration of the trunk ProducerPerformance to batch.size=100000 and 
linger.ms=100, the result I get is similar to the 0.9 result:
{{100000000 records sent, 349094.971287 records/sec (33.29 MB/sec), 25.34 ms 
avg latency, 540.00 ms max latency, 23 ms 50th, 31 ms 95th, 107 ms 99th, 388 ms 
99.9th.}}
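A rough model (assumed numbers, my own sketch) of why linger.ms=100 helps 
here: at roughly 35 MB/sec of input spread over 8 partitions, it takes a few 
tens of milliseconds to fill a 100 KB batch, so without a linger the sender 
ships partially filled batches as soon as the broker responds.

```python
# Hypothetical batch-fill model; real accumulator behavior also depends on
# the compression ratio and how evenly records land across partitions.
input_mb_per_sec = 35.0   # approximate aggregate produce rate in this test
partitions = 8
batch_size = 100_000      # bytes, matching batch.size=100000

per_partition_bytes_per_ms = input_mb_per_sec * 1e6 / partitions / 1000
fill_time_ms = batch_size / per_partition_bytes_per_ms
print(f"time to fill one batch: ~{fill_time_ms:.0f} ms")
# linger.ms=100 exceeds the fill time, so batches go out full, not partial.
```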

I think we can say that with KIP-31/32:
1. The brokers are able to handle requests much faster, so the throughput of 
the broker increased.
2. Each user thread of the producer might be slower because of the 8-byte 
per-record overhead, but users can add user threads or tune the producer to 
get better throughput.



> Producer's throughput lower with compressed data after KIP-31/32
> ----------------------------------------------------------------
>
>                 Key: KAFKA-3565
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3565
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Ismael Juma
>            Priority: Critical
>             Fix For: 0.10.0.0
>
>
> Relative offsets were introduced by KIP-31 so that the broker does not have 
> to recompress data (this was previously required after offsets were 
> assigned). The implicit assumption is that reducing CPU usage required by 
> recompression would mean that producer throughput for compressed data would 
> increase.
> However, this doesn't seem to be the case:
> {code}
> Commit: eee95228fabe1643baa016a2d49fb0a9fe2c66bd (one before KIP-31/32)
> test_id:    
> 2016-04-15--012.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100.compression_type=snappy
> status:     PASS
> run time:   59.030 seconds
> {"records_per_sec": 519418.343653, "mb_per_sec": 49.54}
> {code}
> Full results: https://gist.github.com/ijuma/0afada4ff51ad6a5ac2125714d748292
> {code}
> Commit: fa594c811e4e329b6e7b897bce910c6772c46c0f (KIP-31/32)
> test_id:    
> 2016-04-15--013.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100.compression_type=snappy
> status:     PASS
> run time:   1 minute 0.243 seconds
> {"records_per_sec": 427308.818848, "mb_per_sec": 40.75}
> {code}
> Full results: https://gist.github.com/ijuma/e49430f0548c4de5691ad47696f5c87d
> The difference for the uncompressed case is smaller (and within what one 
> would expect given the additional size overhead caused by the timestamp 
> field):
> {code}
> Commit: eee95228fabe1643baa016a2d49fb0a9fe2c66bd (one before KIP-31/32)
> test_id:    
> 2016-04-15--010.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100
> status:     PASS
> run time:   1 minute 4.176 seconds
> {"records_per_sec": 321018.17747, "mb_per_sec": 30.61}
> {code}
> Full results: https://gist.github.com/ijuma/5fec369d686751a2d84debae8f324d4f
> {code}
> Commit: fa594c811e4e329b6e7b897bce910c6772c46c0f (KIP-31/32)
> test_id:    
> 2016-04-15--014.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100
> status:     PASS
> run time:   1 minute 5.079 seconds
> {"records_per_sec": 291777.608696, "mb_per_sec": 27.83}
> {code}
> Full results: https://gist.github.com/ijuma/1d35bd831ff9931448b0294bd9b787ed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
