Hi Kishore,

I am not sure what you mean by ulimit issues; could you explain a little
more?

I am also not sure whether users can control the size of SST files
directly, since each SST file corresponds to one sorted run. My
understanding is that the SST file size depends on how many memtables get
flushed. In your case, if only one memtable is flushed (64MB), the raw SST
file size will be (64MB + index block size). The index block is stored in
the SST file and is used to locate data at get time. But since Samza applies
RocksDB compression (Snappy) by default, the actual SST file size would be
roughly (64MB + index block size) * Snappy compression ratio.
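To make the arithmetic concrete, here is a small sketch. It assumes the
256MB container-level write buffer is split evenly across the 4 tasks
(which is where the 64MB per memtable comes from), and it treats the
compression ratio as compressed size / raw size. The 1MB index and the 0.5
Snappy ratio are made-up placeholder values, not measurements, and the
function names are just illustrative.

```python
MB = 1024 * 1024

def per_task_memtable_bytes(container_buffer_bytes: int, tasks_per_container: int) -> int:
    """Assumption: the container-level write buffer is split evenly across tasks."""
    return container_buffer_bytes // tasks_per_container

def estimated_sst_bytes(memtable_bytes: int, index_bytes: int, compression_ratio: float) -> float:
    """Rough SST size for one flushed memtable: (memtable + index) * ratio,
    where ratio = compressed size / raw size."""
    return (memtable_bytes + index_bytes) * compression_ratio

# 268435456 bytes (256MB) shared by 4 tasks -> 64MB memtable per task.
memtable = per_task_memtable_bytes(268435456, 4)
print(memtable // MB)  # 64

# Hypothetical inputs: 1MB index block, 0.5 Snappy ratio (placeholders only).
estimate = estimated_sst_bytes(memtable, 1 * MB, 0.5)
print(round(estimate / MB, 1))  # 32.5
```

Unless your data compresses extremely well, a single 64MB flush should give
a file far larger than 2.3MB, so the small files probably come from
somewhere else.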

However, RocksDB compaction can also create SST files. I wrote a simple
benchmark to mimic what you describe, and I also observed lots of 2-3MB
files being created. I am not very familiar with the RocksDB compaction
process, but if you look at your Samza RocksDB log in the "state" directory,
you can find the "table_file_creation" events, which correspond to SST file
creation. In my case, the small SST files were created by compaction. This
may be why you see so many small SST files.
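If you want to check this on your own job instead of reading the log by
eye, a rough sketch like the following can tally the "table_file_creation"
events by source. It assumes RocksDB writes structured events as
'... EVENT_LOG_v1 {json}' lines with "event", "job" and "file_size" fields,
and that each created file shares its "job" id with an earlier
"flush_started" or "compaction_started" event. Please verify this against
the LOG format of your RocksDB version. The function name and the sample
lines below are hypothetical.

```python
import json
from collections import Counter

def table_file_sources(log_lines):
    """Count SST files (and their total bytes) created by flush vs compaction.

    Assumption: event lines look like '... EVENT_LOG_v1 {json}' and a
    table_file_creation event carries the same "job" id as the
    flush_started / compaction_started event that produced it.
    """
    marker = "EVENT_LOG_v1"
    job_kind = {}       # job id -> "flush" or "compaction"
    counts = Counter()  # number of files per source
    sizes = Counter()   # total bytes per source
    for line in log_lines:
        pos = line.find(marker)
        if pos < 0:
            continue  # not a structured event line
        try:
            event = json.loads(line[pos + len(marker):])
        except json.JSONDecodeError:
            continue  # skip truncated or malformed event lines
        kind, job = event.get("event"), event.get("job")
        if kind == "flush_started":
            job_kind[job] = "flush"
        elif kind == "compaction_started":
            job_kind[job] = "compaction"
        elif kind == "table_file_creation":
            source = job_kind.get(job, "unknown")
            counts[source] += 1
            sizes[source] += event.get("file_size", 0)
    return counts, sizes

# Synthetic lines for illustration only, not taken from a real LOG file.
sample = [
    'EVENT_LOG_v1 {"job": 3, "event": "flush_started"}',
    'EVENT_LOG_v1 {"job": 3, "event": "table_file_creation", "file_size": 30000000}',
    'EVENT_LOG_v1 {"job": 5, "event": "compaction_started"}',
    'EVENT_LOG_v1 {"job": 5, "event": "table_file_creation", "file_size": 2400000}',
    'EVENT_LOG_v1 {"job": 5, "event": "table_file_creation", "file_size": 2300000}',
]
counts, sizes = table_file_sources(sample)
print(counts["compaction"], counts["flush"])  # 2 1
```

On a real job you would pass the lines of the store's RocksDB LOG file
(any iterable of strings works, including an open file object).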

HTH,
-Tao

On Wed, Dec 16, 2015 at 5:06 AM, Kishore N C <kishor...@gmail.com> wrote:

> Hi,
>
> During a catch-up job that might require reprocessing of 100s of millions
> of records, I wanted to tweak RocksDB configuration to ensure that it's
> optimized for bulk writes. According to the documentation here
> <https://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html#task-opts>,
> setting stores.store-name.container.write.buffer.size.bytes would set the
> size of the memtable, and also "determines the size of RocksDB's segment
> files". For a job, I went ahead and set this property to 268435456 (256MB),
> and verified that the configuration was correctly picked-up and displayed
> in the task log. However, the task still ended up creating hundreds of
> *2.3 MB* SST files, eventually leading to ulimit issues. There were 4 tasks
> running in each container, so I would have expected SST file sizes of
> 64 MB, but that was not the case.
>
> Is my understanding of this configuration wrong? How do I control the size
> of the SST files produced by RocksDB?
>
> Thanks,
>
> KN.
>
