Hi,

During a catch-up job that might require reprocessing hundreds of millions
of records, I wanted to tune the RocksDB configuration so that it is
optimized for bulk writes. According to the documentation here
<https://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html#task-opts>,
setting stores.store-name.container.write.buffer.size.bytes would set the
size of the memtable, and also "determines the size of RocksDB's segment
files". For one job, I set this property to 268435456 (256 MB) and verified
in the task log that the configuration was picked up correctly. However,
the task still ended up creating hundreds of *2.3 MB* SST files, eventually
leading to ulimit issues. There were 4 tasks running in each container, so
I would have expected SST file sizes of about 64 MB, but that was not the
case.
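
To spell out the arithmetic behind that expectation, here is a minimal
sketch. It assumes the container-level write buffer is divided evenly among
the tasks in the container; the function name is purely illustrative and is
not part of Samza or RocksDB.

```python
# Sketch of the per-task write buffer I expected, assuming an even split
# of stores.store-name.container.write.buffer.size.bytes across tasks.
CONTAINER_WRITE_BUFFER_BYTES = 268435456  # value set in the job config (256 MB)
TASKS_PER_CONTAINER = 4                   # tasks running in each container


def per_task_buffer_bytes(container_bytes: int, tasks: int) -> int:
    """Share of the container-level write buffer that one task would get."""
    return container_bytes // tasks


expected = per_task_buffer_bytes(CONTAINER_WRITE_BUFFER_BYTES, TASKS_PER_CONTAINER)
print(expected, expected / (1024 * 1024))  # bytes, then MB
```

Under that assumption each task gets 67108864 bytes (64 MB), which is far
from the 2.3 MB files I am actually seeing.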

Is my understanding of this configuration wrong? How do I control the size
of the SST files produced by RocksDB?

Thanks,

KN.
