I just want to bring up the idea of avoiding server-side de/recompression again. Features like KAFKA-1499 <https://issues.apache.org/jira/browse/KAFKA-1499> seem to steer the project in a different direction, and the fact that tickets like KAFKA-845 <https://issues.apache.org/jira/browse/KAFKA-845> are not getting much attention gives the same impression. This has been on my mind almost constantly lately.

The problem I see is that CPUs are not the cheapest part of a new server. If you can save a gigahertz or a few cores just by making sure your configs are the same across all producers, I would always accept that operational overhead rather than buy bigger servers. I think this would usually lower the TCO of Kafka installations.

I am currently not familiar enough with the codebase to judge whether server-side decompression happens before the acknowledgement is sent. If it does, skipping de/recompression would also shave a few milliseconds off the response time.

Those are my thoughts about server-side de/recompression. It would be great to get some responses and thoughts back.


On 07.11.2014 00:23, Jay Kreps wrote:
I suspect it is possible to save and reuse the CRCs though it might be a
bit of an invasive change. I suspect the first usage is when we are
checking the validity of the messages and the second is from when we
rebuild the compressed message set (I'm assuming you guys are using
compression because I think we optimize this out otherwise). Technically I
think the CRCs stay the same.
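The CRC-reuse idea can be illustrated with a minimal Scala sketch. It assumes, as in the 0.8 on-disk format, that a message's CRC covers only the message's own bytes and not its log offset, so assigning offsets leaves the CRC unchanged and the value computed during validation can be kept. `StoredMessage`, `CrcReuse`, `validate` and `assignOffsets` are hypothetical names, not Kafka's actual API, and the sketch ignores the compressed wrapper message, whose bytes (and therefore CRC) change when it is rebuilt.

```scala
import java.util.zip.CRC32

// Hypothetical, simplified message: the CRC covers only the payload bytes,
// while the offset lives outside the checksummed region.
final case class StoredMessage(offset: Long, payload: Array[Byte], crc: Long)

object CrcReuse {
  // Counts how many times a checksum is actually computed (for illustration).
  var crcComputations = 0

  def computeCrc(bytes: Array[Byte]): Long = {
    crcComputations += 1
    val crc = new CRC32()
    crc.update(bytes, 0, bytes.length)
    crc.getValue
  }

  // Validation pass: compute each CRC once and keep it alongside the message.
  def validate(payloads: Seq[(Array[Byte], Long)]): Seq[StoredMessage] =
    payloads.map { case (bytes, expectedCrc) =>
      val actual = computeCrc(bytes)
      require(actual == expectedCrc, s"corrupt message: expected $expectedCrc, got $actual")
      StoredMessage(offset = -1L, payload = bytes, crc = actual)
    }

  // Offset assignment: because the offset is not covered by the CRC,
  // the value computed during validation is reused instead of recomputed.
  def assignOffsets(messages: Seq[StoredMessage], baseOffset: Long): Seq[StoredMessage] =
    messages.zipWithIndex.map { case (m, i) => m.copy(offset = baseOffset + i) }
}
```

In this sketch each message is checksummed exactly once, during validation; the offset-assignment step only copies the stored value, which is the kind of saving suggested above.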

An alternative approach, though, would be working to remove the need for
recompression entirely on the broker side by making the offsets in the
compressed message relative to the base offset of the message set. This is
a much more invasive change but potentially better as it would also remove
the recompression done on the broker which is also CPU heavy.
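A minimal Scala sketch of the relative-offset idea, under stated assumptions: the producer numbers the inner messages of a compressed set 0, 1, 2, …, and the broker records only an absolute base offset on the wrapper. `CompressedSet`, `RelativeOffsets`, `appendToLog` and `absoluteOffsets` are hypothetical names, and storing the base offset on the wrapper is an assumption for illustration, not a description of an existing Kafka format.

```scala
// Hypothetical compressed message set: inner messages carry offsets
// relative to the start of the set (0, 1, 2, ...), assigned by the producer.
final case class CompressedSet(baseOffset: Long, relativeOffsets: Vector[Int], compressedBytes: Array[Byte])

object RelativeOffsets {
  // Broker-side append: only the wrapper's base offset is set. The
  // compressed payload is passed through untouched, so no
  // decompression/recompression is needed. Returns the stored set and
  // the next offset to assign in the log.
  def appendToLog(set: CompressedSet, nextLogOffset: Long): (CompressedSet, Long) = {
    val stored = set.copy(baseOffset = nextLogOffset)
    (stored, nextLogOffset + set.relativeOffsets.size)
  }

  // Reader-side view: absolute offset = base offset + relative offset.
  def absoluteOffsets(set: CompressedSet): Vector[Long] =
    set.relativeOffsets.map(rel => set.baseOffset + rel)
}
```

The design point is that the broker's work no longer depends on the contents of the compressed payload: it rewrites one field on the wrapper, and the cost of mapping relative offsets to absolute ones moves to whoever reads the set.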


On Thu, Nov 6, 2014 at 2:36 PM, Allen Wang <aw...@netflix.com.invalid> wrote:

Sure. Here is the link to the screenshot of JMC with the JFR file loaded:


On Thu, Nov 6, 2014 at 2:12 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:


Apache mailing lists don't allow attachments. Could you please link to a
pastebin or something?


On Thu, Nov 6, 2014 at 12:02 PM, Allen Wang <aw...@netflix.com.invalid> wrote:

After digging more into the stack trace we got from Flight Recorder (which
is attached), it seems that Kafka (…) can optimize its usage of Crc32.
The stack trace shows that Crc32 is invoked twice from Log.append(). The
first invocation is from line 231:

val appendInfo = analyzeAndValidateMessageSet(messages)

The second time is from line 252 in the same function:

validMessages = validMessages.assignOffsets(offset, appendInfo.codec)

If one of the Crc32 invocations can be eliminated, we are looking at saving
at least 7% of CPU usage.


On Wed, Nov 5, 2014 at 6:32 PM, Allen Wang <aw...@netflix.com> wrote:


Using Flight Recorder, we have observed high CPU usage of CRC32
(kafka.utils.Crc32.update()) on the Kafka broker. It uses as much as 25%
of CPU on an instance. Tracking down the stack trace, this method is invoked by

Is there any tuning we can do to reduce this?

Also, on the topic of CPU utilization, we observed that overall CPU
utilization is proportional to the AllTopicsBytesInPerSec metric. Does this
metric include incoming replication traffic?

