Hi Kishore,

Could you open a JIRA for this small-SST-files issue? It would be good to track it so that we don't forget it.
Thanks!
-Yi

On Thu, Dec 17, 2015 at 4:16 AM, Kishore N C <kishor...@gmail.com> wrote:

> Hi Tao,
>
> > I am not sure what you mean by ulimit issues
>
> When so many small SST files are created, I run into the limit on the
> maximum number of open files (ulimit -n).
>
> I dug into RocksDB's (plethora of) options today and identified the option
> that causes the 2.3 MB sizes: target_file_size_base
> <https://github.com/facebook/rocksdb/blob/167fb919a55e8dc5d12d4debe7965208029e3505/include/rocksdb/options.h#L396>
>
> It sets the target file size for compaction. Since it isn't exposed as a
> configuration by Samza, I had to create another
> RocksDbKeyValueStorageEngineFactory and set the required options on the
> RocksDB handle directly, like this:
>
> class RocksDbBulkKeyValueStorageEngineFactory[K, V]
>     extends BaseKeyValueStorageEngineFactory[K, V] {
>
>   /**
>    * A KeyValueStore instance optimized for the bulk write and read use case.
>    */
>   override def getKVStore(storeName: String,
>                           storeDir: File,
>                           registry: MetricsRegistry,
>                           changeLogSystemStreamPartition: SystemStreamPartition,
>                           containerContext: SamzaContainerContext): KeyValueStore[Array[Byte], Array[Byte]] = {
>     val storageConfig = containerContext.config.subset("stores." + storeName + ".", true)
>     val rocksDbMetrics = new KeyValueStoreMetrics(storeName, registry)
>     val rocksDbOptions = RocksDbKeyValueStore.options(storageConfig, containerContext)
>     val rocksDbWriteOptions = new WriteOptions().setDisableWAL(true)
>
>     rocksDbOptions.setTargetFileSizeBase(Integer.MAX_VALUE)
>     rocksDbOptions.setMaxBytesForLevelBase(Integer.MAX_VALUE)
>     rocksDbOptions.setSourceCompactionFactor(Integer.MAX_VALUE)
>     rocksDbOptions.setLevelZeroSlowdownWritesTrigger(-1) // no slowdown at all
>
>     val rocksDb = new RocksDbKeyValueStore(storeDir, rocksDbOptions, rocksDbWriteOptions, rocksDbMetrics)
>     rocksDb
>   }
> }
>
> This produced large SST files, which is great for bulk writes and for
> reads during joins.
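As an aside, the fd-exhaustion symptom Kishore describes can be spot-checked with a few lines of stdlib Python. The store path below is a placeholder (a real Samza state path looks like state/<store-name>/<task-name>), and the small SST files are simulated for illustration:

```python
import glob
import os
import resource

# Placeholder for a real Samza state directory.
store_dir = "/tmp/samza-sst-demo"
os.makedirs(store_dir, exist_ok=True)

# Simulate a handful of small SST files for illustration.
for i in range(3):
    open(os.path.join(store_dir, "%06d.sst" % i), "w").close()

# Compare the SST file count against the per-process open-file soft limit.
sst_count = len(glob.glob(os.path.join(store_dir, "*.sst")))
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
print("SST files: %d, open-file soft limit: %d" % (sst_count, soft_limit))
```

When the first number approaches the second, reads start failing with "too many open files", which matches the behaviour reported in this thread.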
> Those are useful configurations, and they should probably be exposed via
> Samza's RocksDB configuration.
>
> Thanks,
>
> KN.
>
> On Thu, Dec 17, 2015 at 2:53 AM, Tao Feng <fengta...@gmail.com> wrote:
>
> > Hi Kishore,
> >
> > I am not sure what you mean by ulimit issues; could you explain a little
> > bit?
> >
> > I am also not sure whether the user can control the size of SST files,
> > since each SST file corresponds to one sorted run. My understanding is
> > that the SST file size depends on how many memtables get flushed. In
> > your case, if only one memtable is flushed (64 MB), the raw SST file
> > size will be (64 MB + index size). The index is used to locate data at
> > read time. But since Samza applies RocksDB compression (snappy) by
> > default, the actual SST file size would be (64 MB + index size) * snappy
> > compression ratio.
> >
> > RocksDB compaction can create SST files as well. I wrote a simple
> > benchmark to mimic what you describe, and I also observe lots of 2-3 MB
> > files being created. I am not very familiar with the RocksDB compaction
> > process, but if you take a look at your Samza RocksDB log in the "state"
> > directory, you will find "table_file_creation" events, which correspond
> > to SST file creation. For the small SST files in my case, creation was
> > triggered by compaction. This may be why you see many small SST files.
> >
> > HTH,
> > -Tao
> >
> > On Wed, Dec 16, 2015 at 5:06 AM, Kishore N C <kishor...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > During a catch-up job that might require reprocessing hundreds of
> > > millions of records, I wanted to tweak the RocksDB configuration to
> > > ensure that it's optimized for bulk writes.
> > > According to the documentation here
> > > <https://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html#task-opts>,
> > > setting stores.store-name.container.write.buffer.size.bytes sets the
> > > size of the memtable and also "determines the size of RocksDB's
> > > segment files". For one job, I set this property to 268435456 (256 MB)
> > > and verified that the configuration was correctly picked up and
> > > displayed in the task log. However, the task still ended up creating
> > > hundreds of 2.3 MB SST files, eventually leading to ulimit issues.
> > > There were 4 tasks running in each container, so I would have expected
> > > SST files of 64 MB each, but that was not to be.
> > >
> > > Is my understanding of this configuration wrong? How do I control the
> > > size of the SST files produced by RocksDB?
> > >
> > > Thanks,
> > >
> > > KN.
> >
> --
> It is our choices that show what we truly are,
> far more than our abilities.
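Pulling the numbers from the thread together: Tao's back-of-the-envelope estimate of a flush-produced SST's size can be written out explicitly. The index overhead and snappy compression ratio below are assumed illustrative values, not measurements from this job:

```python
# Estimated size of an SST file produced by a memtable flush, following
# Tao's description: (memtable size + index size) * compression ratio.
memtable_bytes = 64 * 1024 * 1024  # 64 MB write buffer, as in the job above
index_bytes = 2 * 1024 * 1024      # assumed index block overhead
snappy_ratio = 0.5                 # assumed snappy compression ratio

expected_bytes = int((memtable_bytes + index_bytes) * snappy_ratio)
print("expected flushed SST size ~ %d MB" % (expected_bytes // (1024 * 1024)))
```

Even with an aggressive compression ratio, a flush-produced file should be tens of megabytes, an order of magnitude larger than the 2.3 MB files observed. That is consistent with Tao's conclusion that the small files come from compaction (whose output size is governed by target_file_size_base) rather than from memtable flushes.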