Justin -

My current bitcask settings are:

 %% Bitcask Config
 {bitcask, [
             {data_root, "/var/lib/riaksearch/bitcask" },
             {dead_bytes_merge_trigger, 10242880 },
             {dead_bytes_threshold, 5242880 },
             {expiry_secs, 86400}
           ]},

My understanding is that with these settings the data should auto-expire after one day. Also, each bitcask file in .../riaksearch/bitcask/xxx/*.data should be merged once it has 10M of "dead" or expired data in it, right?
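
As a quick sanity check on the units, the values above can be converted with a few lines of Python. This is only a sketch: it assumes the byte settings are meant in binary megabytes (MiB), and the helper name `to_mib` is made up for illustration.

```python
MIB = 1024 * 1024  # bytes per mebibyte (assumption: sizes are meant in MiB)

# Values copied from the bitcask config above.
settings = {
    "dead_bytes_merge_trigger": 10242880,
    "dead_bytes_threshold": 5242880,
}
expiry_secs = 86400


def to_mib(n_bytes):
    """Convert a byte count to mebibytes (hypothetical helper)."""
    return n_bytes / MIB


for name, value in settings.items():
    print(f"{name}: {to_mib(value):.2f} MiB")
print(f"expiry_secs: {expiry_secs / 3600:.0f} hours")
```

So the trigger is a little under 10 MiB (about 9.77), the threshold is exactly 5 MiB, and the expiry is 24 hours. If exactly 10 MiB was intended for the trigger, that would be 10485760 bytes.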

I'm collecting the spritzer twitter stream and loading it into two buckets (one non-indexed bucket holds the full tweet, one indexed bucket holds the tweet string, id, date and username). I used to see about 10 GB of data total, but it's growing and currently at 26GB of data total.

I'm seeing these in the logs:

=INFO REPORT==== 13-Jun-2011::08:28:19 ===
Pid <0.6844.0> compacted 3 segments for 942232 bytes in 4.900694 seconds, 0.18 MB/sec

=INFO REPORT==== 13-Jun-2011::08:29:01 ===
Pid <0.6267.0> compacted 3 segments for 1721790 bytes in 9.690511 seconds, 0.17 MB/sec

=INFO REPORT==== 13-Jun-2011::08:31:23 ===
Pid <0.6924.0> compacted 3 segments for 6988416 bytes in 44.659753 seconds, 0.15 MB/sec

... but I'm not seeing any "merging" related entries.

- Steve

--
Steve Webb - Senior System Administrator for gnip.com
http://twitter.com/GnipWebb

On Wed, 8 Jun 2011, Justin Sheehy wrote:

Hi, Steve.

Check out this page: http://wiki.basho.com/Bitcask-Configuration.html#Disk-Usage-and-Merging-Settings

Basically, a "merge trigger" must be met in order to have the merge process occur. When it does occur, it will affect all existing files that meet a "merge threshold."

One note that is relevant for your specific use: the expiry_secs parameter will cause a given item to disappear from the client API immediately after expiry, and to be cleaned if it is in a file already being merged, but will not currently contribute toward merge triggers or thresholds on its own if not otherwise "dead".
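
To illustrate the trigger/threshold distinction and the expiry caveat, here is a rough Python model. It is a simplified sketch and not Bitcask's actual code: the `DataFile` class and `files_to_merge` function are invented for illustration, only the dead-bytes settings are modeled (fragmentation settings are ignored), and the `>=` comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    """Hypothetical stand-in for one bitcask data file."""
    name: str
    dead_bytes: int     # bytes made dead by overwrites or deletes
    expired_bytes: int  # bytes past expiry_secs but not otherwise dead


def files_to_merge(files, merge_trigger, merge_threshold):
    """Return the names of files a merge would include, or [] if no merge runs.

    Expired-but-not-dead bytes are deliberately ignored, mirroring the
    note above that expiry alone does not count toward triggers or
    thresholds.
    """
    # Step 1: a merge only runs if some file meets the trigger.
    triggered = any(f.dead_bytes >= merge_trigger for f in files)
    if not triggered:
        return []
    # Step 2: once triggered, every file meeting the threshold is included.
    return [f.name for f in files if f.dead_bytes >= merge_threshold]


TRIGGER = 10242880    # dead_bytes_merge_trigger from the config in this thread
THRESHOLD = 5242880   # dead_bytes_threshold from the config in this thread

# Write-once data: lots of expired bytes, nothing overwritten or deleted.
expired_only = [
    DataFile("1.data", dead_bytes=0, expired_bytes=50_000_000),
    DataFile("2.data", dead_bytes=0, expired_bytes=80_000_000),
]

# Some files with genuinely dead bytes.
mixed = [
    DataFile("1.data", dead_bytes=12_000_000, expired_bytes=0),
    DataFile("2.data", dead_bytes=6_000_000, expired_bytes=0),
    DataFile("3.data", dead_bytes=1_000_000, expired_bytes=40_000_000),
]

print(files_to_merge(expired_only, TRIGGER, THRESHOLD))  # []
print(files_to_merge(mixed, TRIGGER, THRESHOLD))         # ['1.data', '2.data']
```

Under this model, data that is written once and never overwritten or deleted accumulates no dead bytes, so no merge is triggered and the expired entries stay on disk even though they are no longer visible through the client API.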

-Justin


On Jun 7, 2011, at 4:29 PM, Steve Webb wrote:

Hello there.


I'm curious - I'm up to about 10GB of storage and I'm guessing that I'll be full in 3-4 more days of ingesting data. I have no idea if/when a merge will run to expire the older data.

I'm loading a 2-node (1GB mem, 20GB storage, vmware VMs) riaksearch cluster with the spritzer twitter feed. I used the bitcask 'expiry_secs' to expire data after 3 days. Q: Is there a method or command to force a merge at any time? Q: Is there a way to run a merge when the storage size reaches a specific threshold?

- Steve

--
Steve Webb - Senior System Administrator for gnip.com
http://twitter.com/GnipWebb

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

