[
https://issues.apache.org/jira/browse/KAFKA-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246565#comment-15246565
]
Jiangjie Qin edited comment on KAFKA-3565 at 4/18/16 9:21 PM:
--------------------------------------------------------------
[~ijuma] I see. So we do know that the throughput of a single-user-thread
producer will be lower compared with 0.9, but we are trying to understand why
the throughput seems even lower than we would expect given the amount of
overhead introduced in KIP-32.
I did the following test:
1. Launch a one-broker cluster running 0.9
2. Launch another one-broker cluster running trunk
3. Use the tweaked 0.9 ProducerPerformance and the trunk ProducerPerformance to
produce to an 8-partition topic.
I am not able to reproduce the result you had for gzip.
{noformat}
./kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic
becket_test_1_replica_8_partition --num-records 500000 --record-size 1000
--throughput 100000 --valueBound 50000 --producer-props
bootstrap.servers=localhost:9092 acks=1 max.in.flight.requests.per.connection=1
compression.type=gzip batch.size=500000 client.id=becket
The result from 0.9:
500000 records sent, 3734.548306 records/sec (3.56 MB/sec), 368.73 ms avg
latency, 790.00 ms max latency, 368 ms 50th, 535 ms 95th, 597 ms 99th, 723 ms
99.9th.
The result from trunk:
500000 records sent, 11028.276501 records/sec (10.52 MB/sec), 4.08 ms avg
latency, 148.00 ms max latency, 4 ms 50th, 6 ms 95th, 9 ms 99th, 57 ms 99.9th.
{noformat}
The results for snappy with 100-byte messages are as follows:
{noformat}
./kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic
becket_test_1_replica_8_partition --num-records 100000000 --record-size 100
--throughput 10000 --valueBound 50000 --producer-props
bootstrap.servers=localhost:9092 acks=1 max.in.flight.requests.per.connection=1
compression.type=snappy batch.size=500000 client.id=becket
The result from 0.9:
100000000 records sent, 358709.649648 records/sec (34.21 MB/sec), 22.84 ms avg
latency, 388.00 ms max latency, 21 ms 50th, 30 ms 95th, 44 ms 99th, 237 ms
99.9th.
The result from trunk:
100000000 records sent, 272133.279995 records/sec (25.95 MB/sec), 13.96 ms avg
latency, 1057.00 ms max latency, 9 ms 50th, 26 ms 95th, 145 ms 99th, 915 ms
99.9th.
{noformat}
I took a closer look at the ProducerPerformance metrics; there are a few
differences with (w) and without (w/o) KIP-31/32:
1. Batch size: 212721 (w) vs. 475194 (w/o)
2. Request rate: 134 (w) vs. 81 (w/o)
3. Record queue time: 4.9 ms (w) vs. 18 ms (w/o)
This indicates that in general the sender thread runs more iterations after
KIP-31/32 because of the lower latency from the broker on trunk (in fact, I
think broker latency is the metric we should care about most). More iterations
also mean more batches are rolled out and more lock contention, both of which
can hurt the throughput of a single user thread. While the throughput of a
single user thread is important, if we view the producer as a system, there
are too many factors that can affect it. One thing I notice about producer
performance is that you have to tune it: e.g., if I change the configuration
of the trunk ProducerPerformance to batch.size=100000 and linger.ms=100, the
result I get is similar to the 0.9 result.
{{100000000 records sent, 349094.971287 records/sec (33.29 MB/sec), 25.34 ms
avg latency, 540.00 ms max latency, 23 ms 50th, 31 ms 95th, 107 ms 99th, 388 ms
99.9th.}}
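For reference, the tuned run above amounts to the following producer settings. This is only a sketch using plain {{java.util.Properties}} (the construction of the actual {{KafkaProducer}} is omitted); the keys are standard producer configs, and the values mirror the command line and the two tuned parameters above.

```java
import java.util.Properties;

public class TunedProducerConfig {
    // Builds the producer configuration corresponding to the tuned
    // ProducerPerformance run above (trunk, snappy, 100-byte records).
    public static Properties tunedProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "1");
        props.put("max.in.flight.requests.per.connection", "1");
        props.put("compression.type", "snappy");
        props.put("client.id", "becket");
        // The two tuned values: a smaller batch.size than the original
        // 500000, plus a linger.ms long enough to fill batches before send.
        props.put("batch.size", "100000");
        props.put("linger.ms", "100");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(tunedProps());
    }
}
```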
I think we can say that with KIP-31/32:
1. The brokers are able to handle requests much faster, so the throughput of
the broker increased.
2. Each producer user thread might be slower because of the 8-byte overhead,
but users can add more user threads or tune the producer to get better
throughput.
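The "more user threads" point in (2) can be sketched as below. This is a hypothetical skeleton, not the actual ProducerPerformance code: in a real application each user thread would call {{producer.send()}} against a single shared {{KafkaProducer}} (which is thread-safe), so aggregate throughput scales with user threads even if each thread is individually slower. The send call is stubbed out with a counter so the sketch is self-contained.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class MultiThreadedProduceSketch {
    // Runs `userThreads` concurrent senders, each "sending" recordsPerThread
    // records, and returns the total number of records sent.
    public static long run(int userThreads, int recordsPerThread) {
        LongAdder sent = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(userThreads);
        for (int t = 0; t < userThreads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    // In a real producer this line would be
                    // producer.send(record) on a shared KafkaProducer.
                    sent.increment();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sent.sum();
    }

    public static void main(String[] args) {
        // 4 user threads sharing one (stubbed) producer
        System.out.println(run(4, 1000));
    }
}
```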
> Producer's throughput lower with compressed data after KIP-31/32
> ----------------------------------------------------------------
>
> Key: KAFKA-3565
> URL: https://issues.apache.org/jira/browse/KAFKA-3565
> Project: Kafka
> Issue Type: Bug
> Reporter: Ismael Juma
> Priority: Critical
> Fix For: 0.10.0.0
>
>
> Relative offsets were introduced by KIP-31 so that the broker does not have
> to recompress data (this was previously required after offsets were
> assigned). The implicit assumption is that reducing CPU usage required by
> recompression would mean that producer throughput for compressed data would
> increase.
> However, this doesn't seem to be the case:
> {code}
> Commit: eee95228fabe1643baa016a2d49fb0a9fe2c66bd (one before KIP-31/32)
> test_id:
> 2016-04-15--012.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100.compression_type=snappy
> status: PASS
> run time: 59.030 seconds
> {"records_per_sec": 519418.343653, "mb_per_sec": 49.54}
> {code}
> Full results: https://gist.github.com/ijuma/0afada4ff51ad6a5ac2125714d748292
> {code}
> Commit: fa594c811e4e329b6e7b897bce910c6772c46c0f (KIP-31/32)
> test_id:
> 2016-04-15--013.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100.compression_type=snappy
> status: PASS
> run time: 1 minute 0.243 seconds
> {"records_per_sec": 427308.818848, "mb_per_sec": 40.75}
> {code}
> Full results: https://gist.github.com/ijuma/e49430f0548c4de5691ad47696f5c87d
> The difference for the uncompressed case is smaller (and within what one
> would expect given the additional size overhead caused by the timestamp
> field):
> {code}
> Commit: eee95228fabe1643baa016a2d49fb0a9fe2c66bd (one before KIP-31/32)
> test_id:
> 2016-04-15--010.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100
> status: PASS
> run time: 1 minute 4.176 seconds
> {"records_per_sec": 321018.17747, "mb_per_sec": 30.61}
> {code}
> Full results: https://gist.github.com/ijuma/5fec369d686751a2d84debae8f324d4f
> {code}
> Commit: fa594c811e4e329b6e7b897bce910c6772c46c0f (KIP-31/32)
> test_id:
> 2016-04-15--014.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100
> status: PASS
> run time: 1 minute 5.079 seconds
> {"records_per_sec": 291777.608696, "mb_per_sec": 27.83}
> {code}
> Full results: https://gist.github.com/ijuma/1d35bd831ff9931448b0294bd9b787ed
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)