Is it smart enough to coordinate with the other partitions to ensure
that no more than 25% (just a plug number) of the partitions are
compacting at the same time? It seems to me there's the possibility
of a performance drop if you had the perfect storm of too many shards
compacting at the same time.
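
To make the concern concrete, here is a minimal Python sketch of capping
how many partitions merge at once. It has nothing to do with Riak's actual
code: MergeScheduler, run, and the 64-partition count are made up for
illustration. A cap of 1 corresponds to the single merge queue described in
the quoted message; a cap of 25% of the partitions corresponds to the idea
above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class MergeScheduler:
    """Hypothetical scheduler capping how many partitions merge at once.

    max_concurrent=1 models a single merge queue; a larger cap models
    "no more than N% of partitions". This is not Riak's or bitcask's API.
    """

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0  # highest number of merges observed running together

    def merge(self, partition: int) -> int:
        with self._slots:  # blocks until a merge slot is free
            with self._lock:
                self._running += 1
                self.peak = max(self.peak, self._running)
            time.sleep(0.01)  # stand-in for the actual compaction work
            with self._lock:
                self._running -= 1
        return partition


def run(num_partitions: int, max_fraction: float) -> int:
    """Merge every partition with at most max_fraction of them active at once.

    Returns the peak number of concurrent merges that was observed.
    """
    cap = max(1, int(num_partitions * max_fraction))  # always allow at least one
    scheduler = MergeScheduler(cap)
    # Every partition asks to merge at the same time (the "perfect storm").
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        list(pool.map(scheduler.merge, range(num_partitions)))
    return scheduler.peak
```

With run(64, 0.25) the peak never exceeds 16 merges, even though all 64
partitions ask at once; run(8, 0.0) falls back to one merge at a time, which
is what a single queue gives you.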

-J

On Fri, Jun 11, 2010 at 4:54 AM, Justin Sheehy <jus...@basho.com> wrote:
> Hi, Germain.
>
> On Fri, Jun 11, 2010 at 11:07 AM, Germain Maurice
> <germain.maur...@linkfluence.net> wrote:
>
>> Because of its append-only nature, stale data are created, so how does
>> Bitcask remove stale data?
>
> An excellent question, and one that we haven't yet written enough about.
>
>> With CouchDB, the compaction process never succeeds on our data; there is
>> too much data.
>> I really don't like having to launch this kind of process manually.
>
> Bitcask's merging (compaction) process is automated and very tunable.
> These parameters are the most relevant in your bitcask section of
> app.config:
>
> (see the whole thing at http://hg.basho.com/bitcask/src/tip/ebin/bitcask.app)
>
> %% Merge trigger variables. Files exceeding ANY of these
> %% values will cause bitcask:needs_merge/1 to return true.
> %%
> {frag_merge_trigger, 60},              % >= 60% fragmentation
> {dead_bytes_merge_trigger, 536870912}, % Dead bytes > 512 MB
>
> %% Merge thresholds. Files exceeding ANY of these values
> %% will be included in the list of files marked for merging
> %% by bitcask:needs_merge/1.
> %%
> {frag_threshold, 40},                  % >= 40% fragmentation
> {dead_bytes_threshold, 134217728},     % Dead bytes > 128 MB
> {small_file_threshold, 10485760},      % File is < 10 MB
>
> Every few minutes, the Riak storage backend for a given partition will
> send a message to bitcask, requesting that it queue up a possible
> merge job. (As a result of that queue, only one partition will be in the
> merge process at a time.) The bitcask application will examine that
> partition when that request reaches the front of the queue.  If any of
> the trigger values have been exceeded, then all of the files in that
> partition which exceed any threshold values will be run through
> compaction.
>
> This allows you a great deal of flexibility in your demands, and also
> provides reasonable amortization of the cost since each partition is
> processed independently.
>
> -Justin
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

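A minimal Python sketch of the trigger/threshold rule as described in the
quoted config comments: nothing is merged unless some file crosses a
trigger, and once one does, every file crossing any threshold is included.
FileStats, needs_merge, and the definition of fragmentation as dead bytes
over total bytes are assumptions for illustration only; bitcask's real
needs_merge/1 is Erlang and keeps these statistics its own way.

```python
from dataclasses import dataclass
from typing import List

MB = 1024 * 1024

# Values taken from the bitcask app.config excerpt quoted above.
FRAG_MERGE_TRIGGER = 60            # percent fragmentation
DEAD_BYTES_MERGE_TRIGGER = 512 * MB
FRAG_THRESHOLD = 40                # percent fragmentation
DEAD_BYTES_THRESHOLD = 128 * MB
SMALL_FILE_THRESHOLD = 10 * MB


@dataclass
class FileStats:
    """Hypothetical per-file summary; not bitcask's internal record."""
    name: str
    total_bytes: int
    dead_bytes: int

    @property
    def fragmentation(self) -> float:
        # Assumption: fragmentation = dead bytes as a percentage of file size.
        if self.total_bytes == 0:
            return 0.0
        return 100.0 * self.dead_bytes / self.total_bytes


def needs_merge(files: List[FileStats]) -> List[str]:
    """Return the names of files to merge, or [] if no trigger is hit."""
    # Step 1: does ANY file exceed ANY trigger?
    triggered = any(
        f.fragmentation >= FRAG_MERGE_TRIGGER
        or f.dead_bytes > DEAD_BYTES_MERGE_TRIGGER
        for f in files
    )
    if not triggered:
        return []
    # Step 2: include every file that exceeds ANY threshold.
    return [
        f.name
        for f in files
        if f.fragmentation >= FRAG_THRESHOLD
        or f.dead_bytes > DEAD_BYTES_THRESHOLD
        or f.total_bytes < SMALL_FILE_THRESHOLD
    ]
```

For example, with file1 at 70% fragmentation (a trigger), file2 at 45%,
file3 at 5% with 50 MB dead, and a 5 MB file4, the merge picks file1, file2,
and file4. Drop file1 and nothing crosses a trigger, so nothing is merged.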