[ https://issues.apache.org/jira/browse/KAFKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411821#comment-15411821 ]
Bill Bejeck edited comment on KAFKA-3973 at 8/8/16 4:45 PM:
------------------------------------------------------------

I used JMH to benchmark the performance of caching bytes vs. objects (tracking the memory size of objects using jamm). Here are the results:

EDIT: Needed to refactor the tests and use Bytes to wrap the byte arrays used as cache keys.

# Run complete. Total time: 00:02:42

Benchmark                                        Mode  Cnt        Score       Error  Units
MemoryBytesCacheBenchmark.testCacheByMemory     thrpt   40   251002.444 ± 20683.129  ops/s
MemoryBytesCacheBenchmark.testCacheBySizeBytes  thrpt   40  1477170.674 ± 12772.196  ops/s

After refactoring the JMH test, the gap between tracking by memory and serialization has narrowed (from roughly 8.8x to roughly 5.9x in throughput), but serialization still appears to have the advantage. The test used for benchmarking will be included in the PR for KAFKA-3989 (coming soon).

was (Author: bbejeck):
I used JMH to benchmark the performance of caching bytes vs. objects (tracking the memory size of objects using jamm). Here are the results:

EDIT: New results from the updated test

# Run complete. Total time: 00:02:41

Benchmark                                        Mode  Cnt        Score       Error  Units
MemoryBytesCacheBenchmark.testCacheByMemory     thrpt   40   536694.504 ±  4177.019  ops/s
MemoryBytesCacheBenchmark.testCacheBySizeBytes  thrpt   40  4713360.286 ± 60874.723  ops/s

Using JMH, it still appears that serialization has the advantage. The test used for benchmarking will be included in the PR for KAFKA-3989 (coming soon).

> Investigate feasibility of caching bytes vs. records
> ----------------------------------------------------
>
>                 Key: KAFKA-3973
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3973
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>            Reporter: Eno Thereska
>            Assignee: Bill Bejeck
>             Fix For: 0.10.1.0
>
>         Attachments: CachingPerformanceBenchmarks.java, MemoryLRUCache.java
>
>
> Currently the cache stores and accounts for records, not bytes or objects.
> This investigation would measure any performance overhead that comes from
> storing bytes or objects.
> As an outcome, we should know whether 1) we should store bytes or 2) we
> should store objects.
> If we store objects, the cache still needs to know their size (so that it can
> know whether an object fits in the allocated cache space; e.g., if the cache
> is 100MB and each object is 10MB, we'd have space for 10 such objects). The
> investigation needs to figure out how to determine the size of an object
> efficiently in Java.
> If we store bytes, then we serialise each object into bytes before caching
> it, i.e., we pay a serialisation cost. The investigation needs to measure how
> bad this cost can be, especially when all objects fit in the cache (and thus
> any extra serialisation cost would show).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
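To make the byte-based option from the description concrete, here is a minimal sketch of an LRU cache that stores already-serialised keys and values and evicts by total byte size instead of record count. It is not the attached MemoryLRUCache.java or the Kafka Streams cache; `ByteSizeLRUCache` and `BytesKey` are hypothetical names. `BytesKey` plays the role of the `Bytes` wrapper mentioned in the comment: a raw `byte[]` inherits identity-based `equals`/`hashCode`, so two arrays with the same contents would not find the same `HashMap` entry. As a simplifying assumption, the size counts only key and value array lengths and ignores per-entry JVM overhead, and `String.getBytes(UTF_8)` stands in for a real serialiser.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class ByteSizeLRUCache {

    // Hypothetical key wrapper: content-based equality for byte arrays.
    static final class BytesKey {
        private final byte[] bytes;
        private final int hash;

        BytesKey(byte[] bytes) {
            this.bytes = bytes;
            this.hash = Arrays.hashCode(bytes); // cache the hash of the contents
        }

        int length() { return bytes.length; }

        @Override
        public boolean equals(Object o) {
            return o instanceof BytesKey && Arrays.equals(bytes, ((BytesKey) o).bytes);
        }

        @Override
        public int hashCode() { return hash; }
    }

    private final long maxBytes;
    private long currentBytes = 0;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<BytesKey, byte[]> map = new LinkedHashMap<>(16, 0.75f, true);

    public ByteSizeLRUCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    public void put(byte[] key, byte[] value) {
        BytesKey k = new BytesKey(key);
        byte[] old = map.put(k, value);
        if (old != null) {
            currentBytes -= k.length() + old.length; // replacing: drop the old entry's size
        }
        currentBytes += k.length() + value.length;
        evictIfNeeded();
    }

    // Evict least recently used entries until the byte budget is respected.
    private void evictIfNeeded() {
        Iterator<Map.Entry<BytesKey, byte[]>> it = map.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            Map.Entry<BytesKey, byte[]> eldest = it.next();
            currentBytes -= eldest.getKey().length() + eldest.getValue().length;
            it.remove();
        }
    }

    // Returns null on a miss; a hit also marks the entry as most recently used.
    public byte[] get(byte[] key) {
        return map.get(new BytesKey(key));
    }

    public long sizeInBytes() { return currentBytes; }

    public int entries() { return map.size(); }

    public static void main(String[] args) {
        // 100-byte budget with 10-byte entries (5-byte key + 5-byte value).
        ByteSizeLRUCache cache = new ByteSizeLRUCache(100);
        for (int i = 0; i < 12; i++) {
            byte[] key = String.format("key%02d", i).getBytes(StandardCharsets.UTF_8);
            byte[] value = String.format("val%02d", i).getBytes(StandardCharsets.UTF_8);
            cache.put(key, value);
        }
        System.out.println(cache.entries());      // prints 10
        System.out.println(cache.sizeInBytes());  // prints 100
        System.out.println(cache.get("key00".getBytes(StandardCharsets.UTF_8)) == null); // prints true
        byte[] hit = cache.get("key11".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(hit, StandardCharsets.UTF_8)); // prints val11
    }
}
```

The `main` method scales the 100MB / 10MB example down to a 100-byte budget with 10-byte entries, so only the 10 most recently used of the 12 entries survive. Because the cached values are plain byte arrays, their size is known exactly from `length`, which is what lets this approach skip a separate object-size measurement (such as jamm) at the cost of a serialisation step on each put and a deserialisation step on each get.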