Hi Eno,

Thanks for your help. Much appreciated.
Thanks
Tianji

On Wed, Mar 15, 2017 at 4:29 PM, Eno Thereska <eno.there...@gmail.com> wrote:

> Tianji,
>
> A couple of things:
>
> - for now could you use RocksDB without the cache? I've opened a JIRA to
> verify why it's slower with the cache:
> https://issues.apache.org/jira/browse/KAFKA-4904
>
> - you can tune the RocksDB performance further by increasing "its" cache
> (yes, RocksDB has a separate cache and its size is set to quite small by
> default). Look at this question on how to do that with the
> RocksDBConfigSetter:
> https://groups.google.com/forum/#!topic/confluent-platform/RgkaUy1TUno
> This might be a bit too much to start with, but it's possible. You'd have
> to set the blockCacheSize option, for example as done in the openDb call
> in RocksDBStore.java
> <https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java#L115>
>
> - in summary, I'd recommend you use RocksDB as is since 7 vs 5 is a
> reasonable difference.
>
> However, the real performance test will come when you actually enable
> logging, right? You might want RocksDB to be backed up to Kafka for fault
> tolerance.
>
> Finally, make sure to use 0.10.2, the latest release.
>
> Thanks
> Eno
>
> > On 15 Mar 2017, at 18:14, Tianji Li <skyah...@gmail.com> wrote:
> >
> > Hi Eno,
> >
> > RocksDB without caching took around 7 minutes.
> >
> > Tianji
> >
> > On Wed, Mar 15, 2017 at 9:40 AM, Eno Thereska <eno.there...@gmail.com>
> > wrote:
> >
> >> Tianji,
> >>
> >> Could you provide a third data point, running with RocksDB, but without
> >> caching, i.e.:
> >>
> >>> StateStoreSupplier stateStoreSupplier = Stores.create(storeName)
> >>>     .withKeys(stringSerde)
> >>>     .withValues(avroSerde)
> >>>     .persistent()
> >>>     .disableLogging()
> >>>     .build();
> >>
> >> Thanks
> >> Eno
> >>
> >>> On 15 Mar 2017, at 13:02, Tianji Li <skyah...@gmail.com> wrote:
> >>>
> >>> Hi there,
> >>>
> >>> It seems that the RocksDB state store is quite slow in my case and I
> >>> wonder if I did anything wrong.
> >>>
> >>> I have a topic that I groupBy() and then aggregate() 50 times. That is,
> >>> I will create 50 result topics and a lot more changelog and repartition
> >>> topics.
> >>>
> >>> There are a few things that are weird and here I report one, which is
> >>> the state store speed.
> >>>
> >>> If I use:
> >>>
> >>> StateStoreSupplier stateStoreSupplier = Stores.create(storeName)
> >>>     .withKeys(stringSerde)
> >>>     .withValues(avroSerde)
> >>>     .inMemory()
> >>>     .build();
> >>>
> >>> Then processing 1 million records takes around 5 minutes on my coding
> >>> computer.
> >>>
> >>> If I use:
> >>>
> >>> StateStoreSupplier stateStoreSupplier = Stores.create(storeName)
> >>>     .withKeys(stringSerde)
> >>>     .withValues(avroSerde)
> >>>     .persistent()
> >>>     .disableLogging()
> >>>     .enableCaching()
> >>>     .build();
> >>>
> >>> Processing the same 1 million records takes around 10 minutes.
> >>>
> >>> I believe in the first case, changelog is backed up to Kafka and in the
> >>> second case, only RocksDB is used.
> >>>
> >>> But why is RocksDB so slow?
> >>>
> >>> Eventually, I am hoping to do windowed aggregation and it seems I have
> >>> to use RocksDB, but given the performance, I am hesitating.
> >>>
> >>> Thanks
> >>> Tianji
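
For reference, the three timings reported in this thread (around 5, 7 and 10 minutes for 1 million records) can be turned into rough throughput numbers. The sketch below is only illustrative: the class and method names (StoreThroughput, recordsPerSecond, slowdown) are made up for this example and are not part of Kafka Streams, and the inputs are the approximate figures quoted in the messages, not fresh measurements. Note also that the in-memory run did not call disableLogging() while both RocksDB runs did, so the comparison is not strictly like for like.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: converts the wall-clock timings quoted in the thread into throughput. */
final class StoreThroughput {

    /** Records processed per second for a run lasting the given number of minutes. */
    static double recordsPerSecond(long records, double minutes) {
        return records / (minutes * 60.0);
    }

    /** Ratio of a run's duration to the baseline run's duration. */
    static double slowdown(double minutes, double baselineMinutes) {
        return minutes / baselineMinutes;
    }

    public static void main(String[] args) {
        final long records = 1_000_000L; // record count reported in the thread

        // Approximate timings from the thread ("around N minutes"), in minutes.
        Map<String, Double> timings = new LinkedHashMap<>();
        timings.put("in-memory", 5.0);
        timings.put("RocksDB, no caching", 7.0);
        timings.put("RocksDB, caching", 10.0);

        double baseline = timings.get("in-memory");
        for (Map.Entry<String, Double> e : timings.entrySet()) {
            System.out.printf("%-20s %8.0f records/s  %.1fx in-memory time%n",
                    e.getKey(),
                    recordsPerSecond(records, e.getValue()),
                    slowdown(e.getValue(), baseline));
        }
    }
}
```

This works out to roughly 3,300 records/s for the in-memory store, 2,400 for RocksDB without caching and 1,700 for RocksDB with caching, i.e. about 1.4x and 2x the in-memory run time.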