[ https://issues.apache.org/jira/browse/KAFKA-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513037#comment-16513037 ]
radai rosenblatt edited comment on KAFKA-7012 at 6/15/18 8:48 PM:
------------------------------------------------------------------

i don't have the time to pick this up right now. IIRC the original PR (https://github.com/apache/kafka/pull/2330) had a more complicated condition for when a key (channel) gets picked into `keysWithBufferedRead`. the condition is currently

{code}
if (channel.hasBytesBuffered()) {
    //this channel has bytes enqueued in intermediary buffers that we could not read
    //(possibly because no memory). it may be the case that the underlying socket will
    //not come up in the next poll() and so we need to remember this channel for the
    //next poll call otherwise data may be stuck in said buffers forever.
    keysWithBufferedRead.add(key);
}
{code}

which results in lots of "false positives" - keys that have something left over in ssl buffers (likely, since request sizes are rarely a multiple of ssl cipher block sizes) and so make the next poll() cycle inefficient. the condition needs to check whether a channel has something left *that could not be read out due to memory pressure*.

alternatively - the doomsday scenario this is meant to handle is pretty rare: if a channel has a request fully inside the ssl buffers that cannot be read due to memory pressure, *and* the underlying channel will never have any more incoming bytes (so will never come back from select), the request will sit there and rot, resulting in a client timeout. the alternative to making the condition more complicated is not treating this case at all and suffering the (rare?) timeout?
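for illustration, here is a minimal standalone sketch of what the tighter condition could look like. this is *not* Kafka's actual Selector/KafkaChannel code - the Channel class and its afterRead()/hasBytesBufferedDueToMemoryPressure() methods are hypothetical stand-ins for whatever state the real channel would need to track about why bytes are still buffered:

{code}
// Hypothetical sketch only: remember *why* bytes are still buffered, and only
// re-poll channels whose buffered bytes are stuck because the memory pool was
// exhausted, not channels that merely have a partial ssl record left over.
import java.util.HashSet;
import java.util.Set;

public class BufferedReadTracking {

    /** Minimal stand-in for a KafkaChannel-like object (not the real API). */
    static class Channel {
        private final String id;
        private boolean bytesBuffered;           // leftover bytes in intermediary (e.g. ssl) buffers
        private boolean starvedByMemoryPressure; // set when a read stopped because no pool memory was available

        Channel(String id) { this.id = id; }

        /** Record the outcome of the last read attempt on this channel. */
        void afterRead(boolean bytesBuffered, boolean outOfMemoryDuringRead) {
            this.bytesBuffered = bytesBuffered;
            this.starvedByMemoryPressure = bytesBuffered && outOfMemoryDuringRead;
        }

        boolean hasBytesBuffered() { return bytesBuffered; }
        boolean hasBytesBufferedDueToMemoryPressure() { return starvedByMemoryPressure; }
        String id() { return id; }
    }

    private final Set<Channel> keysWithBufferedRead = new HashSet<>();

    /** Called once per channel per poll, after attempting to read from it. */
    void maybeRemember(Channel channel) {
        // Tighter condition: only keep channels whose buffered data could not be
        // drained because of memory pressure. Channels with just a partial ssl
        // record left over will be woken up by select() when more bytes arrive.
        if (channel.hasBytesBufferedDueToMemoryPressure()) {
            keysWithBufferedRead.add(channel);
        } else {
            keysWithBufferedRead.remove(channel);
        }
    }

    Set<Channel> channelsToRevisit() { return keysWithBufferedRead; }

    public static void main(String[] args) {
        BufferedReadTracking tracker = new BufferedReadTracking();

        // Leftover bytes only because the request wasn't a multiple of the cipher block size.
        Channel partialSslRecord = new Channel("partial-record");
        partialSslRecord.afterRead(true, false);

        // Leftover bytes because the memory pool had no room for the decoded request.
        Channel memoryStarved = new Channel("memory-starved");
        memoryStarved.afterRead(true, true);

        tracker.maybeRemember(partialSslRecord);
        tracker.maybeRemember(memoryStarved);

        // Only the memory-starved channel needs an explicit revisit on the next poll.
        System.out.println(tracker.channelsToRevisit().size());                  // prints 1
        System.out.println(tracker.channelsToRevisit().iterator().next().id());  // prints "memory-starved"
    }
}
{code}

the point being that only channels whose buffered bytes are stuck because of memory pressure get revisited explicitly, which avoids the false positives from partial ssl records.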
> Performance issue upgrading to kafka 1.0.1 or 1.1
> -------------------------------------------------
>
>                 Key: KAFKA-7012
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7012
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 1.1.0, 1.0.1
>            Reporter: rajadayalan perumalsamy
>            Assignee: praveen
>            Priority: Major
>              Labels: regression
>         Attachments: Commit-47ee8e954-0607-bufferkeys-nopoll-profile.png, Commit-47ee8e954-0607-memory.png, Commit-47ee8e954-0607-profile.png, Commit-47ee8e954-profile.png, Commit-47ee8e954-profile2.png, Commit-f15cdbc91b-profile.png, Commit-f15cdbc91b-profile2.png
>
> We are trying to upgrade the kafka cluster from Kafka 0.11.0.1 to Kafka 1.0.1. After upgrading 1 node of the cluster, we noticed that network threads use most of the cpu. It is a 3 node cluster with 15k messages/sec on each node. With Kafka 0.11.0.1 typical usage of the servers is around 50 to 60% vcpu (using less than 1 vcpu). After the upgrade we noticed that cpu usage is high and depends on the number of network threads used. If network threads is set to 8, the cpu usage is around 850% (9 vcpus), and if it is set to 4, the cpu usage is around 450% (5 vcpus), using the same kafka server.properties for both.
> Did further analysis with git bisect and a couple of builds and deploys, and traced the issue to commit 47ee8e954df62b9a79099e944ec4be29afe046f6. CPU usage is fine for commit f15cdbc91b240e656d9a2aeb6877e94624b21f8d, but with commit 47ee8e954df62b9a79099e944ec4be29afe046f6 cpu usage has increased. Have attached screenshots of profiling done with both commits. Screenshot Commit-f15cdbc91b-profile shows less cpu usage by network threads, while screenshots Commit-47ee8e954-profile and Commit-47ee8e954-profile2 show higher cpu usage (almost the entire cpu usage) by network threads. Also noticed that kafka.network.Processor.poll() is invoked 10 times more often with commit 47ee8e954df62b9a79099e944ec4be29afe046f6.
> We need this issue to be resolved to upgrade the cluster. Please let me know if you need any additional information.


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)