Hi Eno,

Thanks for your help. Much appreciated.

Thanks
Tianji


On Wed, Mar 15, 2017 at 4:29 PM, Eno Thereska <eno.there...@gmail.com>
wrote:

> Tianji,
>
> A couple of things:
>
> - for now could you use RocksDb without the cache? I've opened a JIRA to
> verify why it's slower with the cache:
> https://issues.apache.org/jira/browse/KAFKA-4904
>
> - you can tune the RocksDb performance further by increasing its own cache
> (yes, RocksDb has a separate cache, and its size is quite small by
> default). Look at this question on how to do that with the
> RocksDBConfigSetter:
> https://groups.google.com/forum/#!topic/confluent-platform/RgkaUy1TUno
> This might be a bit too much to start with, but it's possible. You'd have
> to set the blockCacheSize option, for example as done in the openDb call in
> RocksDBStore.java:
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java#L115
>
> - in summary, I'd recommend you use RocksDb as is since 7 vs 5 is a
> reasonable difference.
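
A minimal sketch of the block cache tuning described in the second point above, assuming the 0.10.2-era RocksDBConfigSetter interface with its single setConfig method. The class name CustomRocksDBConfig and the 64 MB size are illustrative assumptions, not recommended values:

```java
import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

// Hypothetical class name; it needs a public no-argument constructor
// so that Kafka Streams can instantiate it.
public class CustomRocksDBConfig implements RocksDBConfigSetter {

    // Illustrative value only; tune against your own workload and memory budget.
    private static final long BLOCK_CACHE_SIZE_BYTES = 64 * 1024 * 1024L;

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        // Invoked when a RocksDB-backed store is opened.
        final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCacheSize(BLOCK_CACHE_SIZE_BYTES);
        options.setTableFormatConfig(tableConfig);
    }
}
```

The class would be registered in the streams properties with props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CustomRocksDBConfig.class). Installing a fresh BlockBasedTableConfig likely replaces the other table settings that Kafka Streams applies by default (such as the block size), so those may need to be set again here.
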
>
> However, the real performance test will be when you actually enable logging,
> right? You might want RocksDb to be backed up to Kafka for fault tolerance.
>
> Finally, make sure to use 0.10.2, the latest release.
>
> Thanks
> Eno
>
>
> > On 15 Mar 2017, at 18:14, Tianji Li <skyah...@gmail.com> wrote:
> >
> > Hi Eno,
> >
> > Rocksdb without caching took around 7 minutes.
> >
> > Tianji
> >
> >
> > On Wed, Mar 15, 2017 at 9:40 AM, Eno Thereska <eno.there...@gmail.com>
> > wrote:
> >
> >> Tianji,
> >>
> >> Could you provide a third data point, running with RocksDb but without
> >> caching, i.e.:
> >>
> >>> StateStoreSupplier stateStoreSupplier = Stores.create(storeName)
> >>>       .withKeys(stringSerde)
> >>>       .withValues(avroSerde)
> >>>       .persistent()
> >>>       .disableLogging()
> >>>       .build();
> >>
> >>
> >> Thanks
> >> Eno
> >>
> >>
> >>> On 15 Mar 2017, at 13:02, Tianji Li <skyah...@gmail.com> wrote:
> >>>
> >>> Hi there,
> >>>
> >>> It seems that the RocksDB state store is quite slow in my case and I wonder
> >>> if I did anything wrong.
> >>>
> >>> I have a topic that I groupBy() and then aggregate() 50 times. That is, I
> >>> will create 50 result topics and a lot more changelog and repartition
> >>> topics.
> >>>
> >>> There are a few things that are weird, and here I report one, which is the
> >>> state store speed.
> >>>
> >>> If I use:
> >>>
> >>>     StateStoreSupplier stateStoreSupplier = Stores.create(storeName)
> >>>       .withKeys(stringSerde)
> >>>       .withValues(avroSerde)
> >>>       .inMemory()
> >>>       .build();
> >>>
> >>> Then processing 1 million records takes around 5 minutes on my coding
> >>> computer.
> >>>
> >>> If I use:
> >>>
> >>>     StateStoreSupplier stateStoreSupplier = Stores.create(storeName)
> >>>       .withKeys(stringSerde)
> >>>       .withValues(avroSerde)
> >>>       .persistent()
> >>>       .disableLogging()
> >>>       .enableCaching()
> >>>       .build();
> >>>
> >>> Processing the same 1 million records takes around 10 minutes.
> >>>
> >>> I believe in the first case, the changelog is backed up to Kafka and in the
> >>> second case, only RocksDB is used.
> >>>
> >>> But why is RocksDB so slow?
> >>>
> >>> Eventually, I am hoping to do windowed aggregation and it seems I have to
> >>> use RocksDB, but given the performance, I am hesitating.
> >>>
> >>> Thanks
> >>> Tianji
> >>
> >>
>
>
