Hi, Kishore,

Could you open a JIRA for this small SST files issue? It would be good to
track it so that we don't forget about it.

Thanks!

-Yi

On Thu, Dec 17, 2015 at 4:16 AM, Kishore N C <kishor...@gmail.com> wrote:

> Hi Tao,
>
> > I am not sure what you mean by ulimit issues
>
> When so many small SST files are created, I run into the limit on the
> maximum number of open file descriptors (ulimit -n).
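>
> For example, a quick way to see how many SST files a store currently has on
> disk (a sketch; the state path is illustrative):
>
> import java.io.File
> // Each SST file that RocksDB keeps open consumes a file descriptor.
> val files = Option(new File("state/my-store").listFiles())
>   .getOrElse(Array.empty[File])
> val sstCount = files.count(_.getName.endsWith(".sst"))
> println(s"SST files on disk: $sstCount")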
>
> I dug into RocksDB's (plethora of) options today and identified the option
> that causes the 2.3 MB sizes: target_file_size_base
> <https://github.com/facebook/rocksdb/blob/167fb919a55e8dc5d12d4debe7965208029e3505/include/rocksdb/options.h#L396>
>
> That sets the target file size for compaction. Since that's not available
> as a configuration from Samza, I had to create another
> RocksDbKeyValueStorageEngineFactory and set the required options on the
> RocksDB handle directly like this:
>
> import java.io.File
>
> import org.apache.samza.container.SamzaContainerContext
> import org.apache.samza.metrics.MetricsRegistry
> import org.apache.samza.storage.kv._
> import org.apache.samza.system.SystemStreamPartition
> import org.rocksdb.WriteOptions
>
> class RocksDbBulkKeyValueStorageEngineFactory[K, V] extends
>     BaseKeyValueStorageEngineFactory[K, V] {
>   /**
>    * A KeyValueStore instance optimized for the bulk write and read use case.
>    */
>   override def getKVStore(storeName: String,
>                           storeDir: File,
>                           registry: MetricsRegistry,
>                           changeLogSystemStreamPartition: SystemStreamPartition,
>                           containerContext: SamzaContainerContext):
>       KeyValueStore[Array[Byte], Array[Byte]] = {
>     val storageConfig = containerContext.config.subset("stores." + storeName + ".", true)
>     val rocksDbMetrics = new KeyValueStoreMetrics(storeName, registry)
>     val rocksDbOptions = RocksDbKeyValueStore.options(storageConfig, containerContext)
>     val rocksDbWriteOptions = new WriteOptions().setDisableWAL(true)
>
>     // Push compaction thresholds up so output isn't split into small SST files.
>     rocksDbOptions.setTargetFileSizeBase(Integer.MAX_VALUE)
>     rocksDbOptions.setMaxBytesForLevelBase(Integer.MAX_VALUE)
>     rocksDbOptions.setSourceCompactionFactor(Integer.MAX_VALUE)
>     rocksDbOptions.setLevelZeroSlowdownWritesTrigger(-1) // no write slowdown at all
>
>     val rocksDb = new RocksDbKeyValueStore(storeDir, rocksDbOptions,
>       rocksDbWriteOptions, rocksDbMetrics)
>     rocksDb
>   }
> }
>
> This produced large SST files, which is great for bulk writes and reads
> during joins.
>
> These are useful configurations and should probably be exposed via Samza's
> RocksDB configuration.
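>
> For illustration, here is a rough sketch of how such settings might be read
> from the store's config subset in a factory like the one above. The key
> names below are hypothetical, not actual Samza options:
>
> // Hypothetical config keys, shown only to illustrate the idea.
> val targetFileSize = storageConfig.getInt(
>   "rocksdb.compaction.target.file.size.bytes", 67108864)
> rocksDbOptions.setTargetFileSizeBase(targetFileSize)
> val levelBase = storageConfig.getInt(
>   "rocksdb.compaction.max.bytes.for.level.base", 268435456)
> rocksDbOptions.setMaxBytesForLevelBase(levelBase)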
>
> Thanks,
>
> KN.
>
>
> On Thu, Dec 17, 2015 at 2:53 AM, Tao Feng <fengta...@gmail.com> wrote:
>
> > Hi Kishore,
> >
> > I am not sure what you mean by ulimit issues. Could you explain a little
> > bit?
> >
> > And I am not sure whether users can control the size of SST files, as each
> > SST file corresponds to one sorted run. My understanding is that the SST
> > file size depends on how many memtables get flushed. In your case, if only
> > one memtable is flushed (64 MB), the raw SST file size will be (64 MB +
> > index block size); the index block is used to locate data at read time.
> > But since Samza applies RocksDB compression (Snappy) by default, the
> > actual SST file size would be (64 MB + index block size) * Snappy
> > compression ratio.
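> >
> > As a rough worked example (the index size and compression ratio below are
> > assumed figures, purely for illustration):
> >
> > val memtableBytes = 64L * 1024 * 1024 // one flushed 64 MB memtable
> > val indexBytes = 1L * 1024 * 1024     // assumed index block size
> > val snappyRatio = 0.5                 // assumed Snappy compression ratio
> > val expectedSstBytes = ((memtableBytes + indexBytes) * snappyRatio).toLong
> > // ~32.5 MB per flushed SST file under these assumptions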
> >
> > But RocksDB compaction can create SST files as well. I just wrote a simple
> > benchmark to mimic what you describe, and I also observed lots of 2-3 MB
> > files being created. I am not very familiar with the RocksDB compaction
> > process, but if you take a look at your Samza RocksDB log in the "state"
> > directory, you will find "table_file_creation" events, which correspond to
> > SST file creation. In my case, those small SST files were created by
> > compaction. This may be the reason why you see many small SST files.
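> >
> > A quick way to count those events (a sketch; the LOG path is an assumption
> > and depends on where your store's state directory lives):
> >
> > import scala.io.Source
> > // Count "table_file_creation" events in the RocksDB info log.
> > val creations = Source.fromFile("state/my-store/LOG")
> >   .getLines()
> >   .count(_.contains("table_file_creation"))
> > println(s"SST files created: $creations")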
> >
> > HTH,
> > -Tao
> >
> > On Wed, Dec 16, 2015 at 5:06 AM, Kishore N C <kishor...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > During a catch-up job that might require reprocessing hundreds of millions
> > > of records, I wanted to tweak the RocksDB configuration to ensure that
> > > it's optimized for bulk writes. According to the documentation here
> > > <https://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html#task-opts>,
> > > setting stores.store-name.container.write.buffer.size.bytes sets the size
> > > of the memtable and also "determines the size of RocksDB's segment files".
> > > For a job, I went ahead and set this property to 268435456 (256 MB), and
> > > verified that the configuration was correctly picked up and displayed in
> > > the task log. However, the task still ended up creating hundreds of *2.3
> > > MB* SST files, eventually leading to ulimit issues. There were 4 tasks
> > > running in each container, so I would have expected SST file sizes of
> > > 64 MB, but that was not to be.
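> > >
> > > For reference, the store property was set like this (the store name is
> > > illustrative):
> > >
> > > stores.my-store.container.write.buffer.size.bytes=268435456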
> > >
> > > Is my understanding of this configuration wrong? How do I control the size
> > > of the SST files produced by RocksDB?
> > >
> > > Thanks,
> > >
> > > KN.
> > >
> >
>
>
>
> --
> It is our choices that show what we truly are,
> far more than our abilities.
>
