Tianji,

A couple of things:

- for now, could you use RocksDB without the cache? I've opened a JIRA to 
investigate why it's slower with the cache: 
https://issues.apache.org/jira/browse/KAFKA-4904 

- you can tune RocksDB's performance further by increasing "its" cache (yes, 
RocksDB has a separate cache, and its default size is quite small). Look at 
this question on how to do that with the RocksDBConfigSetter: 
https://groups.google.com/forum/#!topic/confluent-platform/RgkaUy1TUno. This 
might be a bit too much to start with, but it's possible. You'd have to set the 
block cache size, for example as done in the openDB call in RocksDBStore.java: 
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java#L115
There's a rough sketch of such a config setter right after this list.

- in summary, I'd recommend you use RocksDB as-is, since 7 minutes vs. 5 
minutes is a reasonable difference.
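
In case it helps, here is a minimal sketch of what such a config setter could 
look like. The class name and the 100 MB figure are just placeholders I picked, 
not anything from your setup; check the RocksDB javadocs for the exact knobs 
you care about:

    import java.util.Map;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.state.RocksDBConfigSetter;
    import org.rocksdb.BlockBasedTableConfig;
    import org.rocksdb.Options;

    public class BiggerBlockCacheSetter implements RocksDBConfigSetter {
        @Override
        public void setConfig(final String storeName, final Options options,
                              final Map<String, Object> configs) {
            // grow RocksDB's block cache; 100 MB is an arbitrary example value
            final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
            tableConfig.setBlockCacheSize(100 * 1024 * 1024L);
            options.setTableFormatConfig(tableConfig);
        }
    }

    // then register it in your streams configuration:
    // props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, BiggerBlockCacheSetter.class);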

However, the real performance test will come when you actually enable logging, 
right? You'll likely want RocksDB to be backed up to Kafka for fault tolerance.
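For that, I believe logging is on by default for persistent stores, so dropping 
the disableLogging() call from your snippet should be enough. Roughly:

    StateStoreSupplier stateStoreSupplier = Stores.create(storeName)
          .withKeys(stringSerde)
          .withValues(avroSerde)
          .persistent()          // changelog topic to Kafka stays enabled by default
          .build();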

Finally, make sure to use 0.10.2, the latest release.

Thanks
Eno


> On 15 Mar 2017, at 18:14, Tianji Li <skyah...@gmail.com> wrote:
> 
> Hi Eno,
> 
> Rocksdb without caching took around 7 minutes.
> 
> Tianji
> 
> 
> On Wed, Mar 15, 2017 at 9:40 AM, Eno Thereska <eno.there...@gmail.com>
> wrote:
> 
>> Tianji,
>> 
>> Could you provide a third data point, running with RocksDb, but without
>> caching, i.e:
>> 
>>> StateStoreSupplier stateStoreSupplier = Stores.create(storeName)
>>>       .withKeys(stringSerde)
>>>       .withValues(avroSerde)
>>>       .persistent()
>>>       .disableLogging()
>>>       .build();
>> 
>> 
>> Thanks
>> Eno
>> 
>> 
>>> On 15 Mar 2017, at 13:02, Tianji Li <skyah...@gmail.com> wrote:
>>> 
>>> Hi there,
>>> 
>>> It seems that the RocksDB state store is quite slow in my case and I
>> wonder
>>> if I did anything wrong.
>>> 
>>> I have a topic, that I groupBy() and then aggregate() 50 times. That is,
>> I
>>> will create 50 results topics and a lot more changelog and repartition
>>> topics.
>>> 
>>> There are a few things that are weird and here I report one, which is the
>>> State store speed.
>>> 
>>> If I use:
>>> 
>>>     StateStoreSupplier stateStoreSupplier = Stores.create(storeName)
>>>       .withKeys(stringSerde)
>>>       .withValues(avroSerde)
>>>       .inMemory()
>>>       .build();
>>> 
>>> Then processing 1 millions records takes around 5 minutes on my coding
>>> computer.
>>> 
>>> If I use:
>>> 
>>>     StateStoreSupplier stateStoreSupplier = Stores.create(storeName)
>>>       .withKeys(stringSerde)
>>>       .withValues(avroSerde)
>>>       .persistent()
>>>       .disableLogging()
>>>       .enableCaching()
>>>       .build();
>>> 
>>> Processing the same 1 million records takes around 10 minutes.
>>> 
>>> I believe in the first case, changelog is backed up to Kafka and in the
>>> second case, only RocksDB is used.
>>> 
>>> But why is RocksDB so slow?
>>> 
>>> Eventually, I am hoping to do windowed aggregation and it seems I have to
>>> use RocksDB, but given the performance, I am hesitating.
>>> 
>>> Thanks
>>> Tianji
>> 
>> 
