Is it smart enough to coordinate with the other partitions to ensure that no more than 25% (just a plug number) of the partitions are compacting at the same time? It seems to me there's a possibility of a performance drop if you hit the perfect storm of too many shards compacting at once.
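
For concreteness, here is how I read the selection logic described below, as a rough Python sketch and not bitcask's actual Erlang code. FileStats, files_to_merge and run_merge_queue are names I made up for illustration; the numbers are the values from the quoted app.config excerpt, and the one-at-a-time loop only stands in for the merge queue Justin mentions.

```python
from collections import deque
from dataclasses import dataclass

# Values copied from the quoted app.config excerpt.
FRAG_MERGE_TRIGGER = 60               # >= 60% fragmentation
DEAD_BYTES_MERGE_TRIGGER = 536870912  # dead bytes > 512 MB
FRAG_THRESHOLD = 40                   # >= 40% fragmentation
DEAD_BYTES_THRESHOLD = 134217728      # dead bytes > 128 MB
SMALL_FILE_THRESHOLD = 10485760       # file is < 10 MB


@dataclass
class FileStats:
    """Hypothetical per-file summary; not bitcask's real data structure."""
    name: str
    fragmentation: int  # percent of the file that is stale
    dead_bytes: int
    total_bytes: int


def files_to_merge(files):
    """Return the files to compact, or [] if no file exceeds a trigger."""
    triggered = any(
        f.fragmentation >= FRAG_MERGE_TRIGGER
        or f.dead_bytes > DEAD_BYTES_MERGE_TRIGGER
        for f in files
    )
    if not triggered:
        return []
    # Once triggered, every file over ANY threshold is included.
    return [
        f for f in files
        if f.fragmentation >= FRAG_THRESHOLD
        or f.dead_bytes > DEAD_BYTES_THRESHOLD
        or f.total_bytes < SMALL_FILE_THRESHOLD
    ]


def run_merge_queue(partitions):
    """Examine queued partitions one at a time, in request order."""
    queue = deque(partitions.items())
    merged = []
    while queue:
        partition_id, files = queue.popleft()
        selected = files_to_merge(files)
        if selected:
            merged.append((partition_id, [f.name for f in selected]))
    return merged
```

If that reading is right, the queue serializes merges within whatever scope it covers. The quoted message doesn't say whether that scope is a single node or the whole cluster, which is really what I'm asking about.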
-J

On Fri, Jun 11, 2010 at 4:54 AM, Justin Sheehy <jus...@basho.com> wrote:
> Hi, Germain.
>
> On Fri, Jun 11, 2010 at 11:07 AM, Germain Maurice
> <germain.maur...@linkfluence.net> wrote:
>
>> Because of its append-only nature, stale data are created, so, how does
>> Bitcask to remove stale data ?
>
> An excellent question, and one that we haven't yet written enough about.
>
>> With CouchDB the compaction process on our data never succeed, too much
>> data.
>> I really don't like to have to launch manually this kind of process.
>
> Bitcask's merging (compaction) process is automated and very tunable.
> These parameters are the most relevant in your bitcask section of
> app.config:
>
> (see the whole thing at http://hg.basho.com/bitcask/src/tip/ebin/bitcask.app)
>
> %% Merge trigger variables. Files exceeding ANY of these
> %% values will cause bitcask:needs_merge/1 to return true.
> %%
> {frag_merge_trigger, 60}, % >= 60% fragmentation
> {dead_bytes_merge_trigger, 536870912}, % Dead bytes > 512 MB
>
> %% Merge thresholds. Files exceeding ANY of these values
> %% will be included in the list of files marked for merging
> %% by bitcask:needs_merge/1.
> %%
> {frag_threshold, 40}, % >= 40% fragmentation
> {dead_bytes_threshold, 134217728}, % Dead bytes > 128 MB
> {small_file_threshold, 10485760}, % File is < 10 MB
>
> Every few minutes, the Riak storage backend for a given partition will
> send a message to bitcask, requesting that it queue up a possible
> merge job. (only one partition will be in the merge process at once
> as a result of that queue) The bitcask application will examine that
> partition when that request reaches the front of the queue. If any of
> the trigger values have been exceeded, then all of the files in that
> partition which exceed any threshold values will be run through
> compaction.
>
> This allows you a great deal of flexibility in your demands, and also
> provides reasonable amortization of the cost since each partition is
> processed independently.
>
> -Justin
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com