Justin -
My current bitcask settings are:
%% Bitcask Config
{bitcask, [
    {data_root, "/var/lib/riaksearch/bitcask"},
    {dead_bytes_merge_trigger, 10242880},
    {dead_bytes_threshold, 5242880},
    {expiry_secs, 86400}
]},
My understanding of these settings is that the data should auto-expire
after one day. Also, each bitcask file in
.../riaksearch/bitcask/xxx/*.data should be merged once it has about 10M
of "dead" or expired data in it, right?
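As a quick sanity check on those numbers, here is a small Python sketch that converts the configured values into friendlier units. It only does arithmetic on the values quoted above; the variable names are made up for illustration. Note that the trigger works out to roughly 9.77 MiB rather than exactly 10, while the threshold is exactly 5 MiB.

```python
# Values copied from the bitcask config quoted in this message.
settings = {
    "dead_bytes_merge_trigger": 10242880,  # bytes
    "dead_bytes_threshold": 5242880,       # bytes
    "expiry_secs": 86400,                  # seconds
}

MIB = 2**20                      # bytes per mebibyte
SECONDS_PER_DAY = 24 * 60 * 60

trigger_mib = settings["dead_bytes_merge_trigger"] / MIB
threshold_mib = settings["dead_bytes_threshold"] / MIB
expiry_days = settings["expiry_secs"] / SECONDS_PER_DAY

print(f"merge trigger: {trigger_mib:.2f} MiB")
print(f"threshold:     {threshold_mib:.2f} MiB")
print(f"expiry:        {expiry_days:.1f} days")
```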
I'm collecting the spritzer twitter stream and loading it into two buckets
(one non-indexed bucket holds the full tweet, one indexed bucket holds the
tweet string, id, date and username). I used to see about 10 GB of data
total, but it's growing and currently at 26GB of data total.
I'm seeing these in the logs:
=INFO REPORT==== 13-Jun-2011::08:28:19 ===
Pid <0.6844.0> compacted 3 segments for 942232 bytes in 4.900694 seconds,
0.18 MB/sec
=INFO REPORT==== 13-Jun-2011::08:29:01 ===
Pid <0.6267.0> compacted 3 segments for 1721790 bytes in 9.690511 seconds,
0.17 MB/sec
=INFO REPORT==== 13-Jun-2011::08:31:23 ===
Pid <0.6924.0> compacted 3 segments for 6988416 bytes in 44.659753
seconds, 0.15 MB/sec
... but I'm not seeing any "merging" related entries.
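For what it's worth, the rates in those reports can be cross-checked with a few lines of Python. This is only a sketch: the regular expression is written against the report lines quoted above, and the function name is hypothetical. It recomputes bytes / seconds and compares against the logged MB/sec figure, which lines up if "MB" is taken to mean 2^20 bytes.

```python
import re

# Two of the reports quoted above, including the one that wraps mid-sentence.
LOG = """\
Pid <0.6844.0> compacted 3 segments for 942232 bytes in 4.900694 seconds,
0.18 MB/sec
Pid <0.6924.0> compacted 3 segments for 6988416 bytes in 44.659753
seconds, 0.15 MB/sec
"""

PATTERN = re.compile(
    r"compacted (\d+) segments for (\d+) bytes in ([\d.]+) seconds, ([\d.]+) MB/sec"
)

def compaction_rates(text):
    """Return (bytes, recomputed MB/sec, logged MB/sec) for each report."""
    flat = " ".join(text.split())  # undo the mail client's line wrapping
    rows = []
    for _segments, nbytes, secs, reported in PATTERN.findall(flat):
        rate = int(nbytes) / float(secs) / 2**20  # assumes MB = 2**20 bytes
        rows.append((int(nbytes), round(rate, 2), float(reported)))
    return rows

for row in compaction_rates(LOG):
    print(row)
```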
- Steve
--
Steve Webb - Senior System Administrator for gnip.com
http://twitter.com/GnipWebb
On Wed, 8 Jun 2011, Justin Sheehy wrote:
Hi, Steve.
Check out this page:
http://wiki.basho.com/Bitcask-Configuration.html#Disk-Usage-and-Merging-Settings
Basically, a "merge trigger" must be met in order to have the merge
process occur. When it does occur, it will affect all existing files
that meet a "merge threshold."
One note that is relevant for your specific use: the expiry_secs
parameter will cause a given item to disappear from the client API
immediately after expiry, and to be cleaned if it is in a file already
being merged, but will not currently contribute toward merge triggers or
thresholds on its own if not otherwise "dead".
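To make the trigger/threshold distinction concrete, here is a rough Python model of the behaviour described above. It is a sketch, not Bitcask's actual code: the DataFile class, the field names, and the strict ">" comparisons are assumptions for illustration, and it models only the dead-bytes pair of settings. The point it shows is that bytes which are merely expired do not count toward either check.

```python
from dataclasses import dataclass

# Values from the config discussed in this thread.
DEAD_BYTES_MERGE_TRIGGER = 10242880
DEAD_BYTES_THRESHOLD = 5242880

@dataclass
class DataFile:              # hypothetical stand-in for one *.data file
    name: str
    dead_bytes: int          # bytes made dead by overwrites or deletes
    expired_bytes: int       # bytes past expiry_secs but not otherwise dead

def files_to_merge(files):
    """Return names of files a merge would include, or [] if no merge starts."""
    # A merge starts only if some file exceeds the trigger.
    # Expired-only bytes are deliberately ignored here.
    if not any(f.dead_bytes > DEAD_BYTES_MERGE_TRIGGER for f in files):
        return []
    # Once triggered, every file over the (lower) threshold is included.
    return [f.name for f in files if f.dead_bytes > DEAD_BYTES_THRESHOLD]

# Lots of expired data, nothing dead: no merge is triggered.
only_expired = [DataFile("1.data", 0, 50_000_000), DataFile("2.data", 0, 80_000_000)]
print(files_to_merge(only_expired))

# One file over the trigger pulls in another that is over the threshold.
mixed = [
    DataFile("1.data", 12_000_000, 0),
    DataFile("2.data", 6_000_000, 40_000_000),
    DataFile("3.data", 1_000_000, 90_000_000),
]
print(files_to_merge(mixed))
```

In the second case 3.data stays out of the merge despite holding mostly expired data, which matches the note that expired items are cleaned only when their file is already being merged.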
-Justin
On Jun 7, 2011, at 4:29 PM, Steve Webb wrote:
Hello there.
I'm curious - I'm up to about 10GB of storage and I'm guessing that
I'll be full in 3-4 more days of ingesting data. I have no idea
if/when a merge will run to expire the older data.
I'm loading a 2-node (1GB mem, 20GB storage, vmware VMs) riaksearch
cluster with the spritzer twitter feed. I used the bitcask
'expiry_secs' to expire data after 3 days. Q: Is there a method or
command to force a merge at any time? Q: Is there a way to run a merge
when the storage size reaches a specific threshold?
- Steve
--
Steve Webb - Senior System Administrator for gnip.com
http://twitter.com/GnipWebb
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com