[ 
https://issues.apache.org/jira/browse/KAFKA-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169189#comment-17169189
 ] 

Guozhang Wang commented on KAFKA-8027:
--------------------------------------

We encountered similar issues in our benchmarks, which are based on recent Kafka 
versions as well. Looking at the profiler graph, there are three big buckets:

1) byte-buffer allocation for concatenating the segmented key from raw key / 
timestamp. ~10%
2) synchronization on the cache layer to access cache to get the iterator. ~20%
3) putting all the range keys into a tree-map (i.e. a putAll will be called) 
before iterating them to achieve thread safety. ~60%
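
To make the three buckets concrete, here is a minimal JDK-only sketch of the pattern, not the actual Kafka Streams code. The class name, method names, and key layout (raw key followed by an 8-byte big-endian timestamp) are assumptions for illustration. It also assumes fixed-length raw keys and non-negative timestamps so that unsigned byte ordering matches (key, timestamp) ordering.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a cache in front of a window store.
class CacheRangeSketch {
    // Lexicographic, unsigned byte ordering of the composite keys (Java 9+).
    static final Comparator<byte[]> ORDER = Arrays::compareUnsigned;

    private final TreeMap<byte[], byte[]> cache = new TreeMap<>(ORDER);

    // Bucket 1: every lookup bound allocates a fresh buffer to concatenate
    // the raw key and the timestamp (assumed layout, for illustration only).
    static byte[] segmentedKey(byte[] rawKey, long timestamp) {
        return ByteBuffer.allocate(rawKey.length + Long.BYTES)
                .put(rawKey)
                .putLong(timestamp)
                .array();
    }

    synchronized void put(byte[] rawKey, long timestamp, byte[] value) {
        cache.put(segmentedKey(rawKey, timestamp), value);
    }

    Iterator<Map.Entry<byte[], byte[]>> range(byte[] rawKey, long from, long to) {
        byte[] lower = segmentedKey(rawKey, from);
        byte[] upper = segmentedKey(rawKey, to);
        TreeMap<byte[], byte[]> snapshot = new TreeMap<>(ORDER);
        // Bucket 2: all readers and writers contend on the same lock.
        synchronized (this) {
            // Bucket 3: copy the whole range with putAll so the caller can
            // iterate without seeing concurrent modifications.
            snapshot.putAll(cache.subMap(lower, true, upper, true));
        }
        return snapshot.entrySet().iterator();
    }

    public static void main(String[] args) {
        CacheRangeSketch c = new CacheRangeSketch();
        byte[] user = "u1".getBytes(StandardCharsets.UTF_8);
        c.put(user, 10L, new byte[] {1});
        c.put(user, 20L, new byte[] {2});
        int n = 0;
        for (Iterator<Map.Entry<byte[], byte[]>> it = c.range(user, 0L, 100L); it.hasNext(); it.next()) {
            n++;
        }
        System.out.println("entries in range: " + n);
    }
}
```

In this sketch the cost of each fetch grows with the number of entries copied into the snapshot, on top of the per-call allocations and the time spent holding the lock.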

Among those, I have some ideas for optimizing 1), and I am still digging into 
how to make 2) / 3) less costly. I will try to prepare a PR, run it through our 
benchmarks, and post the results here.

> Gradual decline in performance of CachingWindowStore provider when number of 
> keys grow
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8027
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8027
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.1.0
>            Reporter: Prashant
>            Priority: Major
>              Labels: interactivequ, kafka-streams
>
> We observed this during a performance test of our streams application, which 
> tracks users' activity and provides a REST interface to query the window state 
> store. We used the default configuration of Materialized, i.e. 
> withCachingEnabled, for storing user behaviour stats in a window state store 
> (CompositeWindowStore with CachingWindowStore as the underlying store, which 
> internally uses RocksDBStore for persistence).
> While querying the window store with store.fetch(key, long, long), it 
> internally tries to fetch the range from ThreadCache, which uses a byte 
> iterator to search for a key in the cache, and on a cache miss it goes to 
> RocksDBStore for the result. So, when the number of keys in the cache becomes 
> large, this ThreadCache search starts taking time (range iterator over all 
> keys), which impacts WindowStore query performance.
>  
> Workaround: If we disable the cache with the switch on the Materialized 
> instance, i.e. withCachingDisabled, the key search is delegated directly to 
> RocksDBStore, which is much faster and completes the search in microseconds, 
> versus milliseconds with CachingWindowStore.
>  
> Stats: With unique users > 0.5M, random search for a key, i.e. UserId:
>  
> withCachingEnabled: 40 < t < 80ms (upper bound increases as unique users 
> grow)
> withCachingDisabled: t < 1ms (almost constant time)



