Deleting data from bitcask backend
Hi,

We have an 8-node Riak v1.4.0 cluster writing data to Bitcask backends. We recently started running out of disk across all nodes, so we implemented a 30-day sliding-window data retention policy. The policy is enforced by a Go app that concurrently deletes documents falling outside the window.

The problem is that even though deleted documents no longer seem to be available (a GET on a deleted document returns the expected 404), disk usage does not seem to be reducing much and has sat at ~80% utilisation across all nodes for almost a week.

At first I thought the large volume of deletes might be fragmenting the merge index, so I have been regularly running the forced compaction documented here:

https://gist.github.com/rzezeski/3996286

This has helped somewhat, but I suspect it has reached the limits of what it can do, and I wonder whether there is further fragmentation elsewhere that is not being compacted. Could this be the issue? How can I tell whether the merge indexes, or something else, need compaction or attention?

Our nodes were initially configured with the default settings for the Bitcask backend, but when this all started I switched to the following to try to trigger compaction more frequently:

{bitcask, [
    %% Configure how Bitcask writes data to disk.
    %%   erlang: Erlang's built-in file API
    %%   nif:    Direct calls to the POSIX C API
    %%
    %% The NIF mode provides higher throughput for certain
    %% workloads, but has the potential to negatively impact
    %% the Erlang VM, leading to higher worst-case latencies
    %% and possible throughput collapse.
    {io_mode, erlang},
    {data_root, "/var/lib/riak/bitcask"},
    {frag_merge_trigger, 40},             %% trigger merge if fragmentation > 40% (default 60%)
    {dead_bytes_merge_trigger, 67108864}, %% trigger if dead bytes for keys > 64MB (default 512MB)
    {frag_threshold, 20},                 %% include a file in merge if fragmentation >= 20% (default 40%)
    {dead_bytes_threshold, 67108864}      %% include a file in merge if dead bytes > 64MB (default 128MB)
]},

From my observations this change did not make much of a difference.

The data we're inserting is hierarchical JSON that roughly falls into the following size profile (in bytes):

Max: 10320
Min: 1981
Avg: 3707
Med: 2905

--
Ciao
Charl

"I will either find a way, or make one." -- Hannibal
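For context, the delete pass in the Go app looks roughly like the sketch below. This is trimmed down, not our production code: the host, bucket, and concurrency values are illustrative, and the key enumeration is stubbed out.

package main

import (
    "fmt"
    "log"
    "net/http"
    "time"
)

const (
    riakHost    = "http://10.0.0.1:8098" // illustrative; any node in the cluster
    bucket      = "tweets"               // illustrative bucket name
    concurrency = 16                     // bound on in-flight deletes
)

// keysOlderThan stands in for however the app enumerates documents
// older than the retention window; stubbed here for the sketch.
func keysOlderThan(window time.Duration) []string {
    return []string{"example-key-1", "example-key-2"}
}

// deleteKey issues an HTTP DELETE against Riak's key/value endpoint.
// 204 (deleted) and 404 (already gone) both count as success.
func deleteKey(client *http.Client, key string) error {
    url := fmt.Sprintf("%s/buckets/%s/keys/%s", riakHost, bucket, key)
    req, err := http.NewRequest("DELETE", url, nil)
    if err != nil {
        return err
    }
    resp, err := client.Do(req)
    if err != nil {
        return err
    }
    resp.Body.Close()
    if resp.StatusCode != 204 && resp.StatusCode != 404 {
        return fmt.Errorf("unexpected status %d deleting %s", resp.StatusCode, key)
    }
    return nil
}

func main() {
    client := &http.Client{Timeout: 30 * time.Second}
    sem := make(chan struct{}, concurrency) // semaphore bounding concurrency

    for _, key := range keysOlderThan(30 * 24 * time.Hour) {
        sem <- struct{}{}
        go func(k string) {
            defer func() { <-sem }()
            if err := deleteKey(client, k); err != nil {
                log.Println(err)
            }
        }(key)
    }

    // fill the semaphore to wait for the last in-flight deletes
    for i := 0; i < concurrency; i++ {
        sem <- struct{}{}
    }
}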
Re: Secondary indexes in ruby (using riak-ruby-client)
Hi,

On 17 September 2013 23:43, Wagner Camarao wrote:
> bucket.get_index 'bars_bin', 'foo'
>
> But am failing with:
>
> Zlib::DataError: incorrect header check
>     from /Users/wagner/.rbenv/versions/2.0.0-p195/lib/ruby/2.0.0/net/http/response.rb:357:in `finish'

I think the Zlib error is obscuring what's really happening in the background. What backend are you using? If it is Bitcask then this will not work, and you need to switch to one that supports 2i, like LevelDB:

https://github.com/basho/riak-ruby-client/wiki/Secondary-Indexes#how-secondary-indexes-aka-2i-work

--
Ciao
Charl

"I will either find a way, or make one." -- Hannibal
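If you do switch, the change on the Riak side is in each node's app.config, roughly as follows (the data_root path shown is the usual default and is illustrative; note that existing Bitcask data is not migrated for you, and each node needs a restart):

%% app.config: use the LevelDB backend instead of Bitcask
{riak_kv, [
    {storage_backend, riak_kv_eleveldb_backend}
]},

%% give eleveldb its own data directory
{eleveldb, [
    {data_root, "/var/lib/riak/leveldb"}
]},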
Debugging mapreduce
Hi,

I am trying to run the following mapreduce query across my cluster:

# curl -XPOST http://10.179.229.209:8098/mapred -H "Content-Type: application/json" -d '{"inputs":"tweets", "query":[{"map":{"language":"javascript", "source":"function(value, keyData, arg) {t = JSON.parse(value.values[0].data)[0]; if ((new Date - new Date(t.created_at)) / 1000 > 2592000) return [t.id]; else return []}", "keep":true}}]}'
{"lineno":466,"message":"SyntaxError: syntax error","source":"()"}

The Riak logs only have the following to report:

==> /var/log/riak/crash.log <==
2013-09-24 05:42:51 =ERROR REPORT webmachine error: path="/mapred" "Internal Server Error"

==> /var/log/riak/console.log <==
2013-09-24 05:42:51.272 [error] <0.20367.1441> Webmachine error at path "/mapred" : "Internal Server Error"

==> /var/log/riak/error.log <==
2013-09-24 05:42:51.272 [error] <0.20367.1441> Webmachine error at path "/mapred" : "Internal Server Error"

Is there any way to get some more info on this to debug it further?

I have tried using ejsLog() (from http://docs.basho.com/riak/1.3.2/references/appendices/MapReduce-Implementation/#Debugging-Javascript-Map-Reduce-Phases) to inspect the data in the function body, but that simply gives me:

# curl -XPOST http://10.179.229.209:8098/mapred -H "Content-Type: application/json" -d '{"inputs":"tweets", "query":[{"map":{"language":"javascript", "source":"function(value, keyData, arg) {t = JSON.parse(value.values[0].data)[0]; ejsLog('/tmp/map_reduce.log', JSON.stringify(t)); if ((new Date - new Date(t.created_at)) / 1000 > 2592000) return [t.id]; else return []}", "keep":true}}]}'
{"lineno":1,"message":"SyntaxError: invalid flag after regular expression","source":"JSON.stringify(function(value, keyData, arg) {t = JSON.parse(value.values[0].data)[0]; ejsLog(/tmp/map_reduce.log, JSON.stringify(t)); if ((new Date - new Date(t.created_at)) / 1000 > 2592000) return [t.id]; else return []}({\"bucket\":\"tweets\",\"key\":\"37456"}

I have also tried checking for already-deleted documents, in case that was what was tripping things up, but adding a check for the X-Riak-Deleted header also results in an error:

# curl -XPOST http://10.179.229.209:8098/mapred -H "Content-Type: application/json" -d '{"inputs":"tweets", "query":[{"map":{"language":"javascript", "source":"function(value, keyData, arg) {if (value.values[0].metadata['X-Riak-Deleted'] == 'true') return []; t = JSON.parse(value.values[0].data)[0]; if ((new Date - new Date(t.created_at)) / 1000 > 2592000) return [t.id]; else return []}", "keep":true}}]}'
{"lineno":1,"message":"ReferenceError: X is not defined","source":"unknown"}

--
Ciao
Charl

"I will either find a way, or make one." -- Hannibal
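(Aside: looking at the "source" echoed back in the last two errors, I suspect the inner single quotes are at least part of the problem. The whole -d payload is wrapped in single quotes, so the shell ends the string at the quote before /tmp/map_reduce.log and before X-Riak-Deleted, which would explain Spidermonkey seeing a bare regular expression and a bare identifier X. A variant of the ejsLog attempt with the inner strings escaped as \" inside the JSON, untested beyond the quoting itself:

# curl -XPOST http://10.179.229.209:8098/mapred -H "Content-Type: application/json" -d '{"inputs":"tweets", "query":[{"map":{"language":"javascript", "source":"function(value, keyData, arg) {var t = JSON.parse(value.values[0].data)[0]; ejsLog(\"/tmp/map_reduce.log\", JSON.stringify(t)); if ((new Date - new Date(t.created_at)) / 1000 > 2592000) return [t.id]; else return []}", "keep":true}}]}'

The X-Riak-Deleted check would need the same treatment, i.e. metadata[\"X-Riak-Deleted\"] == \"true\".)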
Re: Debugging mapreduce
Hi,

On 25 September 2013 03:44, Toby Corkindale wrote:
> Have you tried executing your javascript outside of Riak?
> ie. paste the function into the Chrome debugger, then call it with a
> Riak-like data structure.

The problem with this approach is that I need to make some assumptions about what the data passed as input to my function looks like.

> Also, consider wrapping the code in your function with an eval so you can
> catch errors that occur. (Then either ejslog them or return them as results
> of the map phase)

With ejsLog() also not working for me, I am finding it hard to inspect what Riak is passing into my function so that I can debug it elsewhere (like a JS REPL).

--
Ciao
Charl

"I will either find a way, or make one." -- Hannibal
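To make the second suggestion concrete, I could wrap the whole body so that errors come back as results of the map phase instead of an opaque "Internal Server Error". A sketch based on my original function (untested):

function(value, keyData, arg) {
  try {
    // guard against tombstones before parsing
    if (value.values[0].metadata["X-Riak-Deleted"]) return [];
    var t = JSON.parse(value.values[0].data)[0];
    // 2592000 seconds = 30 days
    if ((new Date() - new Date(t.created_at)) / 1000 > 2592000) {
      return [t.id];
    }
    return [];
  } catch (e) {
    // surface the failure and the offending key in the query output
    return [["error", value.bucket, value.key, String(e)]];
  }
}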