On Jun 29, 2016, at 11:49, Eno Thereska <eno.there...@gmail.com> wrote:
> These are internal files to RockDb.

Yeah, that makes sense. However, since Streams is encapsulating/employing RocksDB, in my view it's Streams' responsibility to configure RocksDB well with good defaults and/or at least provide a way for me to configure it. I'd hope that people operating Streams apps wouldn't have to learn much about operating RocksDB; ideally it would be fully or mostly encapsulated.

> Depending on your load in the system I suppose they could contain quite a bit
> of data. How large was the load in the system these past two weeks so we can
> calibrate?

I'm really not sure how to quantify it. I'm fairly new to Kafka and Streams, so I'm not sure what metrics we'd use to describe load, nor how to measure them. (I'd guess maybe throughput in bytes + number of records? I don't know how to measure that… JMX? My topic's current offset is about 1.5 million, so that's approximately how many records the app has processed.)

> Otherwise I'm not sure if 1-2GB is a lot or not (sounds like not that big to
> make the disk full, was there something else as well that ate up space?)

I agree that these specific numbers aren't that much. My concern is that I don't have a mental model for what's going on here, nor for what will happen over longer periods of time — is RocksDB going to continue to generate these big files? Is there some cleanup process built into RocksDB and/or Streams that just hasn't kicked in yet? Is there a config setting I can use to tune this? Do I just need to set up a cron job?

My longstanding impression of Kafka Streams, from when it was first proposed through today, is that one of its goals is to produce apps that are easy to deploy and operate — easy in terms of time and effort, but also in cognitive load. This behavior with RocksDB log files might be a negative factor WRT that goal… of course it might also be a fluke or an error on my side ;)
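
Re the "provide a way for me to configure it" point above: from some digging it looks like Streams has (or is getting) a rocksdb.config.setter config (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG) that hands you the org.rocksdb.Options before each state store is opened. I haven't verified it's in the release I'm running, and the exact option methods may differ across RocksJava versions, so please treat this as an untested sketch (class name is just mine) of roughly what I'd hope to be able to do to bound the LOG growth per store:

    import java.util.Map;
    import org.apache.kafka.streams.state.RocksDBConfigSetter;
    import org.rocksdb.InfoLogLevel;
    import org.rocksdb.Options;

    // Sketch: cap RocksDB's internal LOG growth for every Streams state store.
    public class BoundedRocksDbLogs implements RocksDBConfigSetter {
        @Override
        public void setConfig(final String storeName, final Options options,
                              final Map<String, Object> configs) {
            // Only log warnings and above.
            options.setInfoLogLevel(InfoLogLevel.WARN_LEVEL);
            // Roll the LOG file at ~10 MB and keep at most two old ones around.
            options.setMaxLogFileSize(10 * 1024 * 1024);
            options.setKeepLogFileNum(2);
        }
    }

and then registering it on the app's config:

    props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, BoundedRocksDbLogs.class);

If something along these lines is already supported, a pointer to the right knobs (and some conservative defaults in Streams itself) would go a long way.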
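
And re measuring load: the only concrete thing I can think of is the JMX metrics from the embedded consumer. Assuming the standard consumer MBean/attribute names (kafka.consumer:type=consumer-fetch-manager-metrics with records-consumed-rate / bytes-consumed-rate — I'd double-check them in jconsole), a rough probe run inside the app's JVM might look like this (class is mine, just for illustration):

    import java.lang.management.ManagementFactory;
    import java.util.Set;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    // Sketch: dump the embedded consumers' throughput metrics. The MBeans live
    // in the Streams app's own JVM, so this must run inside the app (or read
    // the same attributes remotely via jconsole/JMX).
    public class ConsumerLoadProbe {
        public static void dump() throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName pattern = new ObjectName(
                "kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*");
            Set<ObjectName> names = server.queryNames(pattern, null);
            for (ObjectName name : names) {
                Object recordsPerSec = server.getAttribute(name, "records-consumed-rate");
                Object bytesPerSec = server.getAttribute(name, "bytes-consumed-rate");
                System.out.println(name.getKeyProperty("client-id")
                        + ": records/s=" + recordsPerSec + ", bytes/s=" + bytesPerSec);
            }
        }
    }

If there's a better, Streams-level way to characterize load for this kind of calibration, I'd love to hear it.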