[ https://issues.apache.org/jira/browse/KAFKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420195#comment-15420195 ]
Ben Manes commented on KAFKA-3973:
----------------------------------

Measuring on both put() and removeEldestEntry() is slow, because a full cache then performs 4 measurements per insertion (key + value for the new entry, and key + value again for the evicted one). It is also error prone if the objects change, e.g. a lazily initialized field, which would cause the byte tracking to skew.

The safest approach is to weigh the entry only on insertion and retain that weight with the value for eviction. If the user modifies the cached objects, the stored weight may no longer reflect their size, but it does not corrupt the eviction policy.

That change should improve your benchmark scores.

> Investigate feasibility of caching bytes vs. records
> ----------------------------------------------------
>
>                 Key: KAFKA-3973
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3973
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>            Reporter: Eno Thereska
>            Assignee: Bill Bejeck
>             Fix For: 0.10.1.0
>
>         Attachments: MemBytesBenchmark.txt
>
>
> Currently the cache stores and accounts for records, not bytes or objects.
> This investigation would be around measuring any performance overheads that
> come from storing bytes or objects. As an outcome we should know whether 1)
> we should store bytes or 2) we should store objects.
> If we store objects, the cache still needs to know their size (so that it can
> know if the object fits in the allocated cache space, e.g., if the cache is
> 100MB and the object is 10MB, we'd have space for 10 such objects). The
> investigation needs to figure out how to find out the size of the object
> efficiently in Java.
> If we store bytes, then we are serialising an object into bytes before
> caching it, i.e., we take a serialisation cost. The investigation needs
> to measure how bad this cost can be, especially for the case when all objects
> fit in cache (and thus any extra serialisation cost would show).
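
For illustration, the weigh-once approach from the comment could look roughly like the Java sketch below. It is not the Kafka Streams cache or the code from the attached benchmark. The names ByteBoundedCache, Weighted and weigher are hypothetical, and the LRU ordering via an access-ordered LinkedHashMap is an assumption made to keep the example small.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongBiFunction;

/** Hypothetical byte-bounded LRU cache that measures each entry once, on insertion. */
class ByteBoundedCache<K, V> {

    /** A cached value together with the weight recorded when it was inserted. */
    private static final class Weighted<T> {
        final T value;
        final long weight;

        Weighted(T value, long weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    // accessOrder = true keeps iteration order least-recently-used first.
    private final LinkedHashMap<K, Weighted<V>> map = new LinkedHashMap<>(16, 0.75f, true);
    private final long maxBytes;
    private final ToLongBiFunction<K, V> weigher; // assumed user-supplied size estimator
    private long currentBytes;

    ByteBoundedCache(long maxBytes, ToLongBiFunction<K, V> weigher) {
        this.maxBytes = maxBytes;
        this.weigher = weigher;
    }

    V get(K key) {
        Weighted<V> w = map.get(key);
        return w == null ? null : w.value;
    }

    void put(K key, V value) {
        // The only measurement for this entry: key and value are weighed once here.
        long weight = weigher.applyAsLong(key, value);
        Weighted<V> old = map.put(key, new Weighted<>(value, weight));
        if (old != null) {
            currentBytes -= old.weight; // use the recorded weight, do not re-measure
        }
        currentBytes += weight;

        // Evict least-recently-used entries until the cache fits again.
        // Note: an entry larger than maxBytes ends up evicting itself.
        Iterator<Map.Entry<K, Weighted<V>>> it = map.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            Map.Entry<K, Weighted<V>> eldest = it.next();
            currentBytes -= eldest.getValue().weight; // recorded weight again
            it.remove();
        }
    }

    long weightedSize() {
        return currentBytes;
    }
}
```

Because eviction subtracts the recorded weight rather than re-measuring, the running byte total stays consistent with what was added even if a cached object later grows or shrinks. The total may then no longer match the true memory footprint, but the accounting cannot drift.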