Deleting data from bitcask backend

2013-09-16 Thread Charl Matthee
Hi,

We have an 8-node Riak v1.4.0 cluster writing data to bitcask backends.

We've recently started running out of disk across all nodes, so we
implemented a 30-day sliding-window data retention policy. The policy
is enforced by a Go app that concurrently deletes documents falling
outside the window.
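
For context, the per-key delete over Riak's HTTP interface looks roughly like the sketch below (the host and bucket name are placeholders; the real app is in Go). Worth noting: in bitcask a DELETE only writes a tombstone record, and the space is reclaimed later, when the data files are merged.

```shell
# Hypothetical sketch of the retention policy's per-key delete, using
# Riak 1.4's HTTP API. RIAK_HOST and the bucket name are assumptions.
RIAK_HOST="${RIAK_HOST:-127.0.0.1:8098}"

delete_key() {
  # The real call would be:
  #   curl -s -X DELETE "http://$RIAK_HOST/buckets/tweets/keys/$1"
  # Printed here as a dry run:
  echo "DELETE http://$RIAK_HOST/buckets/tweets/keys/$1"
}

delete_key 37456
```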

The problem is that even though the documents are no longer available
(a GET on a deleted document returns the expected 404), disk usage
does not seem to be dropping much and has been stuck at ~80%
utilisation across all nodes for almost a week.

At first I thought the large volume of deletes might be causing
fragmentation of the merge index, so I have been regularly running
forced compaction as documented here:
https://gist.github.com/rzezeski/3996286.

This has helped somewhat, but I suspect it has reached the limits of
what it can do, so I wonder if there is further fragmentation
elsewhere that is not being compacted.

Could this be an issue? How can I tell whether merge indexes or
something else needs compaction/attention?
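
One low-level check is simply watching where the bytes live on each node; a rough sketch, assuming the data_root from the config below:

```shell
# Rough sketch: list the largest bitcask partitions on a node.
# The path assumes the data_root configured below; adjust to your install.
BITCASK_ROOT="${BITCASK_ROOT:-/var/lib/riak/bitcask}"
if [ -d "$BITCASK_ROOT" ]; then
  du -sk "$BITCASK_ROOT"/* | sort -rn | head -10
else
  echo "no bitcask data_root at $BITCASK_ROOT"
fi
```

If merges are reclaiming dead bytes, the per-partition totals should shrink after a merge pass.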

Our nodes were initially configured to run with the default settings
for the bitcask backend but when this all started I switched to the
following to try and see if I can trigger compaction more frequently:

 {bitcask, [
   %% Configure how Bitcask writes data to disk.
   %%   erlang: Erlang's built-in file API
   %%   nif: Direct calls to the POSIX C API
   %%
   %% The NIF mode provides higher throughput for certain
   %% workloads, but has the potential to negatively impact
   %% the Erlang VM, leading to higher worst-case latencies
   %% and possible throughput collapse.
   {io_mode, erlang},

   {data_root, "/var/lib/riak/bitcask"},

   %% Trigger a merge if fragmentation is > 40% (default: 60%)
   {frag_merge_trigger, 40},

   %% Trigger a merge if dead bytes for keys > 64MB (default: 512MB)
   {dead_bytes_merge_trigger, 67108864},

   %% Include a file in a merge if its fragmentation >= 20% (default: 40%)
   {frag_threshold, 20},

   %% Include a file in a merge if its dead bytes > 64MB (default: 128MB)
   {dead_bytes_threshold, 67108864}
 ]},

From my observations this change did not make much of a difference.

The data we're inserting is hierarchical JSON that roughly falls into
the following size profile (in bytes):

Max: 10320
Min: 1981
Avg: 3707
Med: 2905

-- 
Ciao

Charl

"I will either find a way, or make one." -- Hannibal

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Secondary indexes in ruby (using riak-ruby-client)

2013-09-17 Thread Charl Matthee
Hi,

On 17 September 2013 23:43, Wagner Camarao wrote:

> bucket.get_index 'bars_bin', 'foo'
>
> But am failing with:
>
> Zlib::DataError: incorrect header check
> from
> /Users/wagner/.rbenv/versions/2.0.0-p195/lib/ruby/2.0.0/net/http/response.rb:357:in
> `finish'

I think the Zlib error is obscuring what's really happening in the background.

What backend are you using?

If it is bitcask then this will not work; you need to switch to a
backend that supports 2i, such as LevelDB:

https://github.com/basho/riak-ruby-client/wiki/Secondary-Indexes#how-secondary-indexes-aka-2i-work
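
To rule out the Ruby client, the same query can also be tried against Riak's HTTP 2i endpoint directly; a sketch, where the host and bucket are placeholders:

```shell
# Build the 2i query URL for the index used above (bars_bin = "foo").
# RIAK_HOST and BUCKET are assumptions; substitute your own values.
RIAK_HOST="${RIAK_HOST:-127.0.0.1:8098}"
BUCKET="mybucket"
URL="http://$RIAK_HOST/buckets/$BUCKET/index/bars_bin/foo"
echo "$URL"
# curl "$URL"
# A 2i-capable backend (leveldb) returns {"keys":[...]}; bitcask fails
# with an explicit indexes-not-supported error rather than a Zlib one.
```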


-- 
Ciao

Charl

"I will either find a way, or make one." -- Hannibal



Debugging mapreduce

2013-09-23 Thread Charl Matthee
Hi,

I am trying to run the following mapreduce query across my cluster:

# curl -XPOST http://10.179.229.209:8098/mapred -H "Content-Type:
application/json" -d '{"inputs":"tweets",
"query":[{"map":{"language":"javascript", "source":"function(value,
keyData, arg) {t = JSON.parse(value.values[0].data)[0]; if ((new Date
- new Date(t.created_at)) / 1000 > 2592000) return [t.id]; else return
[]}", "keep":true}}]}'
{"lineno":466,"message":"SyntaxError: syntax error","source":"()"}

The riak logs only have the following to report:

==> /var/log/riak/crash.log <==
2013-09-24 05:42:51 =ERROR REPORT
webmachine error: path="/mapred"
"Internal Server Error"

==> /var/log/riak/console.log <==
2013-09-24 05:42:51.272 [error] <0.20367.1441> Webmachine error at
path "/mapred" : "Internal Server Error"

==> /var/log/riak/error.log <==
2013-09-24 05:42:51.272 [error] <0.20367.1441> Webmachine error at
path "/mapred" : "Internal Server Error"

Is there any way to get some more info on this to debug it further?

I have tried using ejsLog() (from
http://docs.basho.com/riak/1.3.2/references/appendices/MapReduce-Implementation/#Debugging-Javascript-Map-Reduce-Phases)
to inspect the data in the function body but that simply gives me:

# curl -XPOST http://10.179.229.209:8098/mapred -H "Content-Type:
application/json" -d '{"inputs":"tweets",
"query":[{"map":{"language":"javascript", "source":"function(value,
keyData, arg) {t = JSON.parse(value.values[0].data)[0];
ejsLog('/tmp/map_reduce.log', JSON.stringify(t)); if ((new Date - new
Date(t.created_at)) / 1000 > 2592000) return [t.id]; else return []}",
"keep":true}}]}'
{"lineno":1,"message":"SyntaxError: invalid flag after regular
expression","source":"JSON.stringify(function(value, keyData, arg) {t
= JSON.parse(value.values[0].data)[0]; ejsLog(/tmp/map_reduce.log,
JSON.stringify(t)); if ((new Date - new Date(t.created_at)) / 1000 >
2592000) return [t.id]; else return
[]}({\"bucket\":\"tweets\",\"key\":\"37456"}
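
That "invalid flag after regular expression" error suggests the single quotes around '/tmp/map_reduce.log' are being consumed by the shell (the whole -d payload is itself single-quoted), so the JavaScript engine sees a bare /tmp/... and parses it as a regex. One way to sidestep the quoting entirely is to put the payload in a file; a sketch with the same query:

```shell
# Write the mapreduce payload to a file so the shell never touches the
# JavaScript quotes, then POST it with curl's @file form.
cat > /tmp/mapred-query.json <<'EOF'
{"inputs":"tweets",
 "query":[{"map":{"language":"javascript",
   "source":"function(value, keyData, arg) {var t = JSON.parse(value.values[0].data)[0]; ejsLog('/tmp/map_reduce.log', JSON.stringify(t)); if ((new Date - new Date(t.created_at)) / 1000 > 2592000) return [t.id]; else return []}",
   "keep":true}}]}
EOF
# curl -XPOST http://10.179.229.209:8098/mapred \
#      -H "Content-Type: application/json" --data @/tmp/mapred-query.json
```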

I have also tried checking for already-deleted documents in case that
was what was tripping things up, but adding a check for the
X-Riak-Deleted header also results in an error:

# curl -XPOST http://10.179.229.209:8098/mapred -H "Content-Type:
application/json" -d '{"inputs":"tweets",
"query":[{"map":{"language":"javascript", "source":"function(value,
keyData, arg) {if (value.values[0].metadata['X-Riak-Deleted'] ==
'true') return []; t = JSON.parse(value.values[0].data)[0]; if ((new
Date - new Date(t.created_at)) / 1000 > 2592000) return [t.id]; else
return []}", "keep":true}}]}'
{"lineno":1,"message":"ReferenceError: X is not defined","source":"unknown"}
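
The "ReferenceError: X is not defined" points at the same shell-quoting collision: the inner single quotes around 'X-Riak-Deleted' close the outer ones, so the name reaches JavaScript unquoted. Escaped double quotes inside the JSON string avoid this; a sketch, again using a payload file:

```shell
# Same query with the metadata key in escaped double quotes, written
# to a file to keep the shell out of the quoting.
cat > /tmp/mapred-deleted.json <<'EOF'
{"inputs":"tweets",
 "query":[{"map":{"language":"javascript",
   "source":"function(value, keyData, arg) {if (value.values[0].metadata[\"X-Riak-Deleted\"] == \"true\") return []; var t = JSON.parse(value.values[0].data)[0]; if ((new Date - new Date(t.created_at)) / 1000 > 2592000) return [t.id]; else return []}",
   "keep":true}}]}
EOF
# curl -XPOST http://10.179.229.209:8098/mapred \
#      -H "Content-Type: application/json" --data @/tmp/mapred-deleted.json
```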


-- 
Ciao

Charl

"I will either find a way, or make one." -- Hannibal



Re: Debugging mapreduce

2013-09-24 Thread Charl Matthee
Hi,

On 25 September 2013 03:44, Toby Corkindale wrote:

> Have you tried executing your javascript outside of Riak?
> ie. paste the function into the Chrome debugger, then call it with a
> Riak-like data structure.

The problem with this approach is that I need to make assumptions
about what the data passed as input to my function looks like.

> Also, consider wrapping the code in your function with an eval so you can
> catch errors that occur. (Then either ejslog them or return them as results
> of the map phase)

With ejsLog() also not working for me, I am finding it hard to
inspect what Riak is passing into my function so I can debug it
elsewhere (like a JS REPL).
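
One workaround, assuming ejsLog() stays broken, is a throwaway map phase that just returns its serialized input, so the structure Riak passes in comes back as the mapreduce result itself; a hypothetical probe:

```shell
# Hypothetical probe: a map phase that returns its own input as JSON,
# making the value/keyData structure visible in the HTTP response.
cat > /tmp/mapred-probe.json <<'EOF'
{"inputs":"tweets",
 "query":[{"map":{"language":"javascript",
   "source":"function(value, keyData, arg) { return [JSON.stringify({value: value, keyData: keyData})]; }",
   "keep":true}}]}
EOF
# curl -XPOST http://10.179.229.209:8098/mapred \
#      -H "Content-Type: application/json" --data @/tmp/mapred-probe.json
```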

-- 
Ciao

Charl

"I will either find a way, or make one." -- Hannibal
