[ https://issues.apache.org/jira/browse/KAFKA-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059891#comment-15059891 ]

Ismael Juma commented on KAFKA-2992:
------------------------------------

I had a look at the code in question and I'm surprised by the findings. The 
code for `processPartitionData` follows:

{code}
      val TopicAndPartition(topic, partitionId) = topicAndPartition
      val replica = replicaMgr.getReplica(topic, partitionId).get
      val messageSet = partitionData.toByteBufferMessageSet
      warnIfMessageOversized(messageSet)

      if (fetchOffset != replica.logEndOffset.messageOffset)
        throw new RuntimeException("Offset mismatch: fetched offset = %d, log end offset = %d.".format(fetchOffset, replica.logEndOffset.messageOffset))
      trace("Follower %d has replica log end offset %d for partition %s. Received %d messages and leader hw %d"
            .format(replica.brokerId, replica.logEndOffset.messageOffset, topicAndPartition, messageSet.sizeInBytes, partitionData.highWatermark))
      replica.log.get.append(messageSet, assignOffsets = false)
      trace("Follower %d has replica log end offset %d after appending %d bytes of messages for partition %s"
            .format(replica.brokerId, replica.logEndOffset.messageOffset, messageSet.sizeInBytes, topicAndPartition))
      val followerHighWatermark = replica.logEndOffset.messageOffset.min(partitionData.highWatermark)
      // for the follower replica, we do not need to keep
      // its segment base offset the physical position,
      // these values will be computed upon making the leader
      replica.highWatermark = new LogOffsetMetadata(followerHighWatermark)
      trace("Follower %d set replica high watermark for partition [%s,%d] to %s"
            .format(replica.brokerId, topic, partitionId, followerHighWatermark))
{code}

There are a number of other allocations there (the `Option` returned by 
`getReplica`, the `ByteBufferMessageSet`, the new `LogOffsetMetadata`, and so 
on), so I don't see why the thunk allocations would account for ~98% of 
allocations by object count (as per the original description). If we actually 
want to solve the issue at hand, more investigation is needed, and there is 
more we could do to reduce allocations. As it stands, though, it is unclear 
whether there is a real issue or just a profiler artifact (personally, I never 
trust profiler data in isolation; it needs to be verified by other means too).
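
To make the thunk point concrete, here is a minimal sketch (not Kafka code; 
`trace` below only mimics the by-name signature of Kafka's Logging helpers, 
and the names are illustrative) showing why every call site allocates a 
Function0 even when TRACE is disabled:

{code}
object ThunkAllocationSketch {
  @volatile var traceLevelEnabled: Boolean = false  // stands in for logger.isTraceEnabled

  // By-name parameter, as in a `def trace(msg: => String)` logging helper:
  // the caller wraps the message expression in an anonymous Function0 (a
  // "thunk") at every call, whether or not the level is enabled.
  def trace(msg: => String): Unit =
    if (traceLevelEnabled) println(msg)

  def main(args: Array[String]): Unit = {
    val logEndOffset = 12345L
    // A thunk capturing `logEndOffset` is allocated here even though TRACE
    // is off; the .format call inside is deferred and never evaluated.
    trace("replica log end offset %d".format(logEndOffset))
  }
}
{code}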

> Trace log statements in the replica fetcher inner loop create large amounts 
> of garbage
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2992
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2992
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.8.2.1, 0.9.0.0
>         Environment: Centos 6, Java 1.8.0_20
>            Reporter: Cory Kolbeck
>            Priority: Minor
>              Labels: garbage, logging, trace
>             Fix For: 0.9.1.0
>
>
> We're seeing some GC pause issues in production, and during our investigation 
> found that the thunks created during invocation of three trace statements 
> guarded in the attached PR were responsible for ~98% of all allocations by 
> object count and ~90% by size. While I'm not sure that this was actually the 
> cause of our issue, it seems prudent to avoid useless allocations in a tight 
> loop.
> I realize that the trace() call does its own guarding internally; however, 
> that is insufficient to prevent allocation of the thunk. I can work on 
> getting profiling results to attach here, but I used YourKit and the license 
> has since expired.
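
(For reference, the guard the description alludes to would look something like 
the following sketch; this is an assumption about the shape of the fix, not 
the exact change in the attached PR.)

{code}
object GuardedTraceSketch {
  @volatile var traceLevelEnabled: Boolean = false  // stands in for logger.isTraceEnabled

  // Same by-name signature as before: an unguarded call still allocates a
  // Function0 thunk at the call site even when TRACE is disabled.
  def trace(msg: => String): Unit =
    if (traceLevelEnabled) println(msg)

  def main(args: Array[String]): Unit = {
    val brokerId = 1
    val logEndOffset = 12345L
    // Guarding at the call site skips both the thunk and the formatted string
    // when TRACE is off, at the cost of duplicating the level check.
    if (traceLevelEnabled)
      trace("Follower %d has replica log end offset %d".format(brokerId, logEndOffset))
  }
}
{code}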



