Hi Frank,

If you are running on a single node then the RocksDB state should be re-used by your app. However, this relies on the app having been cleanly shut down and on the existence of ".checkpoint" files in the state directory for the store, i.e., /tmp/kafka-streams/application-id/0_0/.checkpoint. If the file doesn't exist then the entire state will be restored from the changelog - which could take some time. I suspect this is what is happening?
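As an illustration (not from the thread itself), a minimal sketch of how you could check for that checkpoint file before starting the app; the application id and the 0_0 task directory are placeholders for your own state.dir layout:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports whether a task directory still has its
// .checkpoint file, i.e. whether Streams can reuse the local RocksDB state
// instead of restoring it from the changelog.
public class CheckpointCheck {
    public static void main(String[] args) {
        // Default layout: <state.dir>/<application.id>/<task-id>/.checkpoint
        Path checkpoint =
            Paths.get("/tmp/kafka-streams", "application-id", "0_0", ".checkpoint");
        if (Files.exists(checkpoint)) {
            System.out.println("Checkpoint present; local state can be reused.");
        } else {
            System.out.println("No checkpoint; state will be restored from the changelog.");
        }
    }
}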
As for the RocksDB memory settings: yes, the off-heap memory usage does sneak under the radar. There is a memory management story for Kafka Streams that is yet to be started. This would involve limiting the off-heap memory that RocksDB uses.

Thanks,
Damian

On Fri, 25 Nov 2016 at 21:14 Frank Lyaruu <flya...@gmail.com> wrote:

> I'm running all on a single node, so there is no 'data mobility' involved.
> So if Streams does not use any existing data, I might as well wipe the
> whole RocksDB before starting, right?
>
> As for the RocksDB tuning, I am using a RocksDBConfigSetter to reduce the
> memory usage a bit:
>
> options.setWriteBufferSize(3000000);
> options.setMaxBytesForLevelBase(30000000);
> options.setMaxBytesForLevelMultiplier(3);
>
> I needed to do this as my 16GB machine would die otherwise, but I honestly
> was just reducing values more or less randomly until it wouldn't fall over.
> I have to say this is a big drawback of Rocks: I monitor Java memory usage,
> but this just sneaks under the radar as it is off-heap, and it isn't very
> clear what the implications of different settings are, as I can't say
> something like the Xmx heap setting, meaning: take whatever you need up to
> this maximum. Also, if I get this right, in the long run, as the data set
> changes and grows, I can never be sure it won't take too much memory.
>
> I get the impression I'd be better off with an external store, something I
> can monitor, tune and restart separately.
>
> But I'm getting ahead of myself. I'll wipe the data before I start and see
> if that gets me any stability.
>
> On Fri, Nov 25, 2016 at 4:54 PM, Damian Guy <damian....@gmail.com> wrote:
>
> > Hi Frank,
> >
> > If you have run the app before with the same applicationId, completely
> > shut it down, and then restarted it again, it will need to restore all of
> > the state, which will take some time depending on the amount of data you
> > have. In this case the placement of the partitions doesn't take into
> > account any existing state stores, so it might need to load quite a lot
> > of data if nodes assigned certain partitions don't have that state store
> > (this is something we should look at improving).
> >
> > As for RocksDB tuning - you can provide an implementation of
> > RocksDBConfigSetter via the config
> > StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG. It has a single method:
> >
> > public void setConfig(final String storeName, final Options options,
> >                       final Map<String, Object> configs)
> >
> > In this method you can set various options on the provided Options
> > object. The options that might help in this case are:
> >
> > options.setWriteBufferSize(..) - default in Streams is 32MB
> > options.setMaxWriteBufferNumber(..) - default in Streams is 3
> >
> > However, I'm no expert on RocksDB and I suggest you have a look at
> > https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide for more
> > info.
> >
> > Thanks,
> > Damian
> >
> > On Fri, 25 Nov 2016 at 13:02 Frank Lyaruu <flya...@gmail.com> wrote:
> >
> > > @Damian:
> > >
> > > Yes, it ran before, and it has that 200GB blob worth of RocksDB stuff.
> > >
> > > @Svante: It's on a pretty high-end SAN in a managed private cloud. I'm
> > > unsure what the ultimate storage is, but I doubt there is a performance
> > > problem there.
> > >
> > > On Fri, 25 Nov 2016 at 13:37, Svante Karlsson <svante.karls...@csi.se>
> > > wrote:
> > >
> > > > What kind of disk are you using for the rocksdb store? i.e. spinning
> > > > or SSD?
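For reference, a minimal sketch of the RocksDBConfigSetter described above, wired up via StreamsConfig and using the values Frank quoted; the class name and application id are placeholders:

import java.util.Map;
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

// Applies the buffer/level sizes from the thread above to every store.
public class CustomRocksDBConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        options.setWriteBufferSize(3000000);        // Streams default: 32MB
        options.setMaxWriteBufferNumber(3);         // Streams default: 3
        options.setMaxBytesForLevelBase(30000000);
        options.setMaxBytesForLevelMultiplier(3);
    }
}

And registering it with the application:

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "application-id");
props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CustomRocksDBConfig.class);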
> > > >
> > > > 2016-11-25 12:51 GMT+01:00 Damian Guy <damian....@gmail.com>:
> > > >
> > > > > Hi Frank,
> > > > >
> > > > > Is this on a restart of the application?
> > > > >
> > > > > Thanks,
> > > > > Damian
> > > > >
> > > > > On Fri, 25 Nov 2016 at 11:09 Frank Lyaruu <flya...@gmail.com> wrote:
> > > > >
> > > > > > Hi y'all,
> > > > > >
> > > > > > I have a reasonably simple Kafka Streams application, which merges
> > > > > > about 20 topics a few times. The thing is, some of those topic
> > > > > > datasets are pretty big, about 10M messages. In total I've got
> > > > > > about 200GB worth of state in RocksDB; the largest topic is 38GB.
> > > > > >
> > > > > > I had set MAX_POLL_INTERVAL_MS_CONFIG to one hour to cover the
> > > > > > initialization time, but that does not seem nearly enough. I'm
> > > > > > looking at more than two-hour startup times, and that starts to
> > > > > > be a bit ridiculous.
> > > > > >
> > > > > > Any tips / experiences on how to deal with this case? Move away
> > > > > > from Rocks and use an external data store? Any tips on how to
> > > > > > tune Rocks to be a bit more useful here?
> > > > > >
> > > > > > regards, Frank
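For completeness, a minimal sketch of the max.poll.interval.ms setting Frank mentions, passed through the Streams configuration; the broker address and application id are placeholders:

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class StartupConfig {
    public static Properties buildConfig() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "application-id");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Allow up to one hour between polls so a long state restore
        // doesn't get the consumer kicked out of the group.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 60 * 60 * 1000);
        return props;
    }
}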