Hi Charl,

> The problem is that even though documents seem to no longer be available (doing a GET on a deleted document returns the expected 404), the disk usage does not seem to be reducing much and has been at ~80% utilisation across all nodes for almost a week.

When you delete a document, a tombstone record is written to bitcask and the reference to the key is removed from memory (which is why you get 404s). The old entry isn't actually removed from disk until the next bitcask merge.
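In the meantime, one way to see whether merges are reclaiming anything is to watch the on-disk size of each partition directory under your bitcask data_root over time (plain du -sh would do the same job). Here's a minimal sketch in Go, since your delete tool is already Go; the only thing it assumes from your setup is the /var/lib/riak/bitcask data_root from the config quoted below:

// bitcask_du.go: print the on-disk size of each bitcask partition directory.
// Illustrative only; the data_root path is taken from the app.config quoted
// later in this thread.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// dirSize sums the sizes of all regular files under root.
func dirSize(root string) (int64, error) {
	var total int64
	err := filepath.Walk(root, func(_ string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if !info.IsDir() {
			total += info.Size()
		}
		return nil
	})
	return total, err
}

func main() {
	dataRoot := "/var/lib/riak/bitcask" // data_root from the posted config
	partitions, err := filepath.Glob(filepath.Join(dataRoot, "*"))
	if err != nil {
		panic(err)
	}
	for _, p := range partitions {
		size, err := dirSize(p)
		if err != nil {
			fmt.Fprintf(os.Stderr, "skipping %s: %v\n", p, err)
			continue
		}
		fmt.Printf("%-60s %8.1f MB\n", p, float64(size)/(1<<20))
	}
}

If the per-partition sizes drop after merge activity, space is being reclaimed; if they stay flat while you keep deleting, the merges aren't firing.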
> At first I thought the large number of deletes being performed might be causing fragmentation of the merge index, so I've been regularly running forced compaction as documented here: https://gist.github.com/rzezeski/3996286.

That merge index is for Riak Search, not bitcask. There are ways of forcing a merge, but let's double-check your settings/logs first.

Can you send me your app.config and a console.log from one of your nodes?

Thanks,
Alex

--
Alex Moore
Sent with Airmail

On September 16, 2013 at 4:43:07 AM, Charl Matthee (ch...@ntrippy.net) wrote:

Hi,

We have an 8-node riak v1.4.0 cluster writing data to bitcask backends.

We've recently started running out of disk across all nodes and so implemented a 30-day sliding-window data retention policy. This policy is enforced by a go app that concurrently deletes documents outside the window.

The problem is that even though documents seem to no longer be available (doing a GET on a deleted document returns the expected 404), the disk usage does not seem to be reducing much and has been at ~80% utilisation across all nodes for almost a week.

At first I thought the large number of deletes being performed might be causing fragmentation of the merge index, so I've been regularly running forced compaction as documented here: https://gist.github.com/rzezeski/3996286. This has helped somewhat, but I suspect it has reached the limits of what can be done, so I wonder whether there is further fragmentation elsewhere that is not being compacted.

Could this be an issue? How can I tell whether merge indexes or something else needs compaction/attention?

Our nodes were initially configured with the default settings for the bitcask backend, but when this all started I switched to the following to try to trigger compaction more frequently:

{bitcask, [
    %% Configure how Bitcask writes data to disk.
    %%   erlang: Erlang's built-in file API
    %%   nif: Direct calls to the POSIX C API
    %%
    %% The NIF mode provides higher throughput for certain
    %% workloads, but has the potential to negatively impact
    %% the Erlang VM, leading to higher worst-case latencies
    %% and possible throughput collapse.
    {io_mode, erlang},

    {data_root, "/var/lib/riak/bitcask"},
    {frag_merge_trigger, 40},             %% trigger merge if fragmentation > 40% (default is 60%)
    {dead_bytes_merge_trigger, 67108864}, %% trigger merge if dead bytes for keys > 64MB (default is 512MB)
    {frag_threshold, 20},                 %% fragmentation >= 20% (default is 40%)
    {dead_bytes_threshold, 67108864}      %% dead bytes for data > 64MB (default is 128MB)
]},

From my observations this change did not make much of a difference.

The data we're inserting is hierarchical JSON that roughly falls into the following size (in bytes) profile:

Max: 10320
Min: 1981
Avg: 3707
Med: 2905

--
Ciao
Charl

"I will either find a way, or make one."
    -- Hannibal
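For reference, a minimal sketch of the kind of concurrent sliding-window delete worker described in Charl's message above. The actual app and its key-selection logic aren't shown in the thread, so everything here is assumed: the Riak HTTP listener on 127.0.0.1:8098 (the default port), a hypothetical bucket name "events", and placeholder keys standing in for documents outside the 30-day window. The DELETE URL is Riak 1.4's standard /buckets/<bucket>/keys/<key> endpoint.

// delete_window.go: illustrative sketch of a concurrent sliding-window delete
// worker against Riak's HTTP API. The bucket name, keys, and endpoint address
// are assumptions, not details from this thread.
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// deleteKey issues an HTTP DELETE against Riak's key URL.
// A 204 means deleted; a 404 means the key was already gone.
func deleteKey(client *http.Client, base, bucket, key string) error {
	url := fmt.Sprintf("%s/buckets/%s/keys/%s", base, bucket, key)
	req, err := http.NewRequest("DELETE", url, nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent && resp.StatusCode != http.StatusNotFound {
		return fmt.Errorf("unexpected status %d for %s/%s", resp.StatusCode, bucket, key)
	}
	return nil
}

func main() {
	base := "http://127.0.0.1:8098"          // assumed Riak HTTP listener
	keys := []string{"doc1", "doc2", "doc3"} // stand-ins for keys outside the 30-day window

	client := &http.Client{}
	work := make(chan string)
	var wg sync.WaitGroup

	// Small worker pool: deletes run concurrently but with bounded parallelism.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for key := range work {
				if err := deleteKey(client, base, "events", key); err != nil {
					fmt.Println("delete failed:", err)
				}
			}
		}()
	}

	for _, k := range keys {
		work <- k
	}
	close(work)
	wg.Wait()
}

Note that each successful DELETE only writes a tombstone and drops the in-memory key reference, per Alex's explanation above; the disk space itself comes back only when bitcask merges the old data files.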
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com