On Jun 29, 2016, at 11:49, Eno Thereska <eno.there...@gmail.com> wrote:

> These are internal files to RocksDB.

Yeah, that makes sense.

However, since Streams is encapsulating/employing RocksDB, in my view it’s 
Streams’ responsibility to configure RocksDB well with good defaults and/or at 
least provide a way for me to configure it. I’d hope that people operating 
Streams apps wouldn’t have to learn much about operating RocksDB; ideally it 
would be fully or mostly encapsulated.
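
To make that concrete, here's a rough sketch of the kind of hook I'd hope for. Everything in it is hypothetical: the interface name and method signature are my own invention, and I don't know whether the current release already exposes anything along these lines.

    import java.util.Map;
    import org.rocksdb.Options;

    // Hypothetical sketch only: I'm imagining a callback that Streams would invoke
    // once per state store, right before opening the underlying RocksDB instance,
    // so the app can tweak RocksDB without having to operate it directly.
    public interface RocksDbTuningHook {

        // storeName:     name of the Streams state store being opened
        // options:       the RocksDB options Streams is about to use for that store
        // streamsConfig: the app's own Streams configuration, for context
        void apply(String storeName, Options options, Map<String, Object> streamsConfig);
    }

Something like that would let the common case stay fully encapsulated while still giving operators an escape hatch.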

> Depending on your load in the system I suppose they could contain quite a bit 
> of data. How large was the load in the system these past two weeks so we can 
> calibrate? 

I’m really not sure how to quantify it. I’m fairly new to Kafka and Streams, so 
I’m not sure which metrics we’d use to describe load or how to measure them. 
(I’d guess maybe throughput in bytes plus the number of records? I don’t know how 
to measure that… JMX? My topic’s current offset is 1.5 million, so that’s 
approximately how many records the app has processed.)
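
For what it's worth, here's roughly how I'd imagine pulling throughput numbers out over JMX, if that's even the right approach. I'm guessing at the MBean and attribute names (they look like the embedded consumer's fetch metrics to me), and this assumes the app was started with remote JMX enabled on port 9999; please correct me if there's a better way.

    import java.util.Set;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Rough sketch: read the embedded consumer's consumption rates over JMX.
    // Assumes the Streams app was launched with remote JMX enabled, e.g.
    //   -Dcom.sun.management.jmxremote.port=9999
    //   -Dcom.sun.management.jmxremote.authenticate=false
    //   -Dcom.sun.management.jmxremote.ssl=false
    // The MBean/attribute names below are my best guess at the consumer fetch metrics.
    public class ThroughputProbe {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection conn = connector.getMBeanServerConnection();
                ObjectName pattern = new ObjectName(
                        "kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*");
                Set<ObjectName> names = conn.queryNames(pattern, null);
                for (ObjectName name : names) {
                    System.out.println(name
                            + " records-consumed-rate=" + conn.getAttribute(name, "records-consumed-rate")
                            + " bytes-consumed-rate=" + conn.getAttribute(name, "bytes-consumed-rate"));
                }
            }
        }
    }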

> Otherwise I'm not sure if 1-2GB is a lot or not (sounds like not that big to 
> make the disk full, was there something else as well that ate up space?)

I agree that these specific numbers aren’t that much. My concern is that I 
don’t have a mental model for what’s going on here, nor for what will happen 
over longer periods of time: is RocksDB going to keep generating these big 
files? Is there some cleanup process built into RocksDB and/or Streams that 
just hasn’t kicked in yet? Is there a config setting I can use to tune this? 
Do I just need to set up a cron job?
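
In case it helps the discussion, these are the RocksDB (RocksJava) options that look relevant to me for bounding the log files. This is only a sketch under the assumption that RocksDB's own LOG files are what's eating the disk (which I haven't confirmed), and I don't know whether Streams passes options like these through to its stores.

    import org.rocksdb.InfoLogLevel;
    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    // Sketch of RocksJava options that appear (to me) to govern RocksDB's own LOG files.
    // Assumption: those files are the space hogs; I haven't verified that, nor whether
    // Streams offers a way to apply options like these to its state stores.
    public class RocksDbLogTuning {

        static {
            RocksDB.loadLibrary(); // load the native library before touching Options
        }

        // RocksDBException is declared defensively; some RocksJava versions throw it from these setters.
        public static Options boundedLogOptions() throws RocksDBException {
            Options options = new Options();
            options.setCreateIfMissing(true);
            options.setMaxLogFileSize(10 * 1024 * 1024);      // roll the info LOG at ~10 MB
            options.setKeepLogFileNum(5);                      // keep at most 5 rolled LOG files around
            options.setInfoLogLevel(InfoLogLevel.WARN_LEVEL);  // and log less in the first place
            return options;
        }
    }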

My longstanding impression of Kafka Streams, from when it was first proposed 
through today, is that one of its goals is to produce apps that are easy to 
deploy and operate: easy in terms of time and effort, but also in cognitive 
load. This behavior with the RocksDB log files might be a negative factor with 
respect to that goal… of course, it might also be a fluke or an error on my side ;)
