Thanks Brian. This is very helpful and informative! My only question is that we do in fact have data files larger than 2GB that haven't merged. As of today, I have a data file at 2.1G. The last merge happened on Jan 20, which was two days ago.
Anything we are missing here?

On Sat, Jan 19, 2013 at 2:01 PM, Brian Sparrow <bspar...@basho.com> wrote:

> Hi Ian,
>
> Q: Why does this happen on a restart only and not at other times, even
> though we have set our merge_window config setting to 'always'?
>
> A: The merge_window setting defaults to always and that simply means that
> bitcask will merge anytime the merge_triggers and thresholds set in the
> app.config are satisfied on a closed bitcask file. A bitcask data file is
> closed when it reaches the max_file_size threshold set in
> app.config (default of 2GB) or the node is stopped. Merging is happening on
> restart because this file is now closed and eligible for merge. You are not
> seeing merging during normal operation because the files are not reaching
> the 2GB max_file_size.
>
> Q: I'm assuming our config values are correct as the restart considers the
> thresholds met to go ahead with a merge.
>
> A: I do not see anything wrong with your configuration with the exception
> of no max_file_size being set. If you want to merge more often, reduce the
> max_file_size to 1GB or lower (the lower the file size, the more merging). A
> good way to estimate the appropriate size is to look at how large your
> .data files are getting in your bitcask data directory and decide from
> there what size threshold you would like to set based on your disk usage
> needs.
>
> Q: Also: Are there any tools out there that I can run on my data directory
> that will tell me the size of the dead byte ratios? Would be nice to see
> what the dead byte 'state' of my data dir is in so I can tell whether
> indeed merge conditions are met.
>
> A: The command you are looking for is riak-admin vnode_status. This will
> list all partitions on the local node as well as the number of keys in each
> partition and a list of closed bitcask data files in the format:
>
> {CLOSE_DATA_FILE_NAME, FRAG_PERCENTAGE, DEAD_BYTES, TOTAL_FILE_SIZE}
>
> Again, a data file must be closed (reached max_file_size or closed by node
> stop) to be reported in this command.
>
> Hope this answers all your questions.
>
> Thanks!
>
> --
> Brian Sparrow
> Customer Service Engineer
> Basho Technologies
>
> On Friday, January 18, 2013 at 5:14 PM, Ian Ha wrote:
>
> Hi,
>
> In our production system, we notice that merges are not taking place. We
> have noticed, however, that when we restart riak (via a 'riak stop' then a
> 'riak start'), then the merges are triggered and our disk usage goes way
> down (which is what we want). We use bitcask.
>
> Why does this happen on a restart only and not at other times, even though
> we have set our merge_window config setting to 'always'?
>
> I'm assuming our config values are correct as the restart considers the
> thresholds met to go ahead with a merge.
>
> Also: Are there any tools out there that I can run on my data directory
> that will tell me the size of the dead byte ratios? Would be nice to see
> what the dead byte 'state' of my data dir is in so I can tell whether
> indeed merge conditions are met.
>
> Our bitcask configs are as follows in app.config:
>
> {bitcask, [
>     {data_root, "/var/lib/riak/bitcask"},
>     {dead_bytes_merge_trigger, 268435456},
>     {dead_bytes_threshold, 134217728},
>     {expiry_secs, 3888000},
>     {frag_merge_trigger, 40},
>     {frag_threshold, 20},
>     {merge_window, always},
>     {small_file_threshold, 10485760},
>     {sync_strategy, none}
> ]},
>
> Thanks!
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
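
For reference, here is a rough sketch of what the quoted suggestion to set max_file_size might look like when applied to the bitcask section posted in the original message. The 1GB value is only the example figure from the quoted reply, not a tuned recommendation, and the size annotations in the comments are plain unit conversions of the values already in the config. Nothing in this thread establishes whether this change would affect the 2.1G file that has not merged.

```erlang
%% Sketch only: the bitcask section from the original message with
%% max_file_size added. The 1GB value is an assumption taken from the
%% "1GB or lower" suggestion in the quoted reply.
{bitcask, [
    {data_root, "/var/lib/riak/bitcask"},
    {max_file_size, 1073741824},            %% 1GB; a data file is closed once it reaches this size
    {dead_bytes_merge_trigger, 268435456},  %% 256MB
    {dead_bytes_threshold, 134217728},      %% 128MB
    {expiry_secs, 3888000},                 %% 45 days
    {frag_merge_trigger, 40},               %% percent
    {frag_threshold, 20},                   %% percent
    {merge_window, always},
    {small_file_threshold, 10485760},       %% 10MB
    {sync_strategy, none}
]},
```

After a change like this and a node restart, the output of riak-admin vnode_status (the command named in the quoted reply) could be compared against the trigger and threshold values above to see which closed files qualify.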