Re: [ANN] Ruby Client v1.1.1 Release

2013-04-03 Thread Sean McKibben
I'm wondering if we could get a 1.1.2 version bump pretty soon. Not being able 
to do 2i over PBC with 1.1.1 is rather painful and I kind of need a released 
version to send it to production.

Thanks,
Sean McKibben

On Jan 10, 2013, at 2:05 PM, Sean Cribbs  wrote:

> Hey riak-users,
> 
> Today we released the Ruby Riak Client (riak-client gem), version
> 1.1.1. The only change from version 1.1.0 was a fix for older
> patchlevels of Ruby 1.8.7 (before p315) that had a bug in Net::HTTP.
> We encountered this bug when testing the upcoming Riak 1.3 release on
> Ubuntu 10.04LTS, which has a maximum Ruby version of 1.8.7p249. If you
> are on one of those old versions, it is definitely recommended to
> upgrade to a later patchlevel or Ubuntu release and avoid this bug
> altogether.
> 
> Cheers,
> 
> -- 
> Sean Cribbs 
> Software Engineer
> Basho Technologies, Inc.
> http://basho.com/
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: [ANN] Ruby Client v1.1.1 Release

2013-04-03 Thread Sean McKibben
IIRC it was fixed in 1.1.0 but came back in 1.1.1.

I did a simple test just now and sanitized it for this gist:
https://gist.github.com/graphex/5305274

I also have to store items with '?' in their keys, which has issues over HTTP 
(noted in https://github.com/basho/riak-ruby-client/issues/80).

So I have to keep two different clients around at all times: one PBC so I can write 
'?' in keys, and one HTTP so I can get 2i queries back…
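
Roughly, the two-client arrangement looks like this (just a sketch; the constructor 
options and the get_index call are how I understand riak-client 1.x, so treat them 
as assumptions):

require 'riak'

# One client speaking Protocol Buffers (so '?' in keys writes cleanly)...
pbc  = Riak::Client.new(:protocol => 'pbc',  :pb_port   => 8087)
# ...and one speaking HTTP, the only transport where 2i works for me on 1.1.1.
http = Riak::Client.new(:protocol => 'http', :http_port => 8098)

obj = pbc.bucket('mybucket').new('key?with?question-marks')
obj.content_type = 'application/json'
obj.data = { 'field' => 'value' }
obj.store

keys = http.get_index('mybucket', 'test_bin', 'myval')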

Sean

On Apr 3, 2013, at 2:51 PM, Sean Cribbs  wrote:

> Can you clarify the problem you're having? That feature was merged in
> July, long before the 1.1.0 release:
> https://github.com/basho/riak-ruby-client/commit/4fe52756d7df6ee494bfbc40552ec017f3ff4da4
> 
> On Wed, Apr 3, 2013 at 3:35 PM, Sean McKibben  wrote:
>> I'm wondering if we could get a 1.1.2 version bump pretty soon. Not being 
>> able to do 2i over PBC with 1.1.1 is rather painful and I kind of need a 
>> released version to send it to production.
>> 
>> Thanks,
>> Sean McKibben
>> 
>> On Jan 10, 2013, at 2:05 PM, Sean Cribbs  wrote:
>> 
>>> Hey riak-users,
>>> 
>>> Today we released the Ruby Riak Client (riak-client gem), version
>>> 1.1.1. The only change from version 1.1.0 was a fix for older
>>> patchlevels of Ruby 1.8.7 (before p315) that had a bug in Net::HTTP.
>>> We encountered this bug when testing the upcoming Riak 1.3 release on
>>> Ubuntu 10.04LTS, which has a maximum Ruby version of 1.8.7p249. If you
>>> are on one of those old versions, it is definitely recommended to
>>> upgrade to a later patchlevel or Ubuntu release and avoid this bug
>>> altogether.
>>> 
>>> Cheers,
>>> 
>>> --
>>> Sean Cribbs 
>>> Software Engineer
>>> Basho Technologies, Inc.
>>> http://basho.com/
>>> 
>>> ___
>>> riak-users mailing list
>>> riak-users@lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> 
> 
> 
> 
> -- 
> Sean Cribbs 
> Software Engineer
> Basho Technologies, Inc.
> http://basho.com/


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


2i timeouts in 1.4

2013-07-26 Thread Sean McKibben
We just upgraded to 1.4 and are having a big problem with some of our larger 2i 
queries. We have a few key queries that take longer than 60 seconds (usually 
about 110 seconds) to execute, but after going to 1.4 we can't seem to get 
around a 60 second timeout.

I've tried:
curl -H "X-Riak-Timeout: 26" 
"http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?x-riak-timeout=26";
 -i

But I always get
HTTP/1.1 500 Internal Server Error
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.0 (never breaks eye contact)
Date: Fri, 26 Jul 2013 21:41:28 GMT
Content-Type: text/html
Content-Length: 265
Connection: close

500 Internal Server Error
Internal Server Error
The server encountered an error while processing this request:
{error,{error,timeout}}
mochiweb+webmachine web server

Right at the 60 second mark. What can I set to give my secondary index queries 
more time??

This is causing major problems for us :(

Sean

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: 2i timeouts in 1.4

2013-07-26 Thread Sean McKibben
I should have mentioned that I also tried:
curl -H "X-Riak-Timeout: 26" 
"http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?timeout=26"; -i
but still receive the 500 error below exactly at the 60 second mark. Is this a 
bug?

Secondary to getting this working at all: is this documented anywhere? And is there 
any way to set this timeout using the Ruby Riak client?

Streaming may well work, but I'm going to have to make a number of changes on the 
client side to deal with the results.
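
For what it's worth, going around the client with plain Net::HTTP at least lets me 
set both the query parameter and a generous client-side read timeout (a sketch; 
whether the server actually honours the timeout parameter is exactly what I'm 
asking about):

require 'net/http'
require 'json'
require 'uri'

# 2i query with an explicit server-side timeout parameter (value in
# milliseconds is an assumption) and a long client-side read timeout.
uri = URI('http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval')
uri.query = URI.encode_www_form('timeout' => 260_000)

http = Net::HTTP.new(uri.host, uri.port)
http.read_timeout = 300   # seconds, so Ruby doesn't give up before Riak does

res  = http.request(Net::HTTP::Get.new(uri.request_uri))
keys = JSON.parse(res.body)['keys'] if res.is_a?(Net::HTTPSuccess)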

Sean

On Jul 26, 2013, at 3:53 PM, Brian Roach  wrote:

> Sean -
> 
> The timeout isn't via a header, it's a query param -> &timeout=
> 
> You can also use stream=true to stream the results.
> 
> - Roach
> 
> Sent from my iPhone
> 
> On Jul 26, 2013, at 3:43 PM, Sean McKibben  wrote:
> 
>> We just upgraded to 1.4 and are having a big problem with some of our larger 
>> 2i queries. We have a few key queries that takes longer than 60 seconds 
>> (usually about 110 seconds) to execute, but after going to 1.4 we can't seem 
>> to get around a 60 second timeout.
>> 
>> I've tried:
>> curl -H "X-Riak-Timeout: 26" 
>> "http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?x-riak-timeout=26";
>>  -i
>> 
>> But I always get
>> HTTP/1.1 500 Internal Server Error
>> Vary: Accept-Encoding
>> Server: MochiWeb/1.1 WebMachine/1.10.0 (never breaks eye contact)
>> Date: Fri, 26 Jul 2013 21:41:28 GMT
>> Content-Type: text/html
>> Content-Length: 265
>> Connection: close
>> 
>> 500 Internal Server 
>> ErrorInternal Server ErrorThe server 
>> encountered an error while processing this 
>> request:{error,{error,timeout}}mochiweb+webmachine
>>  web server
>> 
>> Right at the 60 second mark. What can I set to give my secondary index 
>> queries more time??
>> 
>> This is causing major problems for us :(
>> 
>> Sean
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: 2i timeouts in 1.4

2013-07-26 Thread Sean McKibben
Thank you for looking into this. This is a major problem for our production 
cluster, and we're in a bit of a bind right now trying to figure out a workaround 
in the interim. It sounds like a MapReduce job might handle the timeout properly, 
so hopefully we can do that in the meantime; something like the sketch below.
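
(A sketch only: Riak::MapReduce#index taking 2i inputs is my understanding of the 
Ruby client, and whether the job honours a longer timeout is exactly what we'd be 
testing.)

require 'riak'

client = Riak::Client.new(:http_port => 8098)

# Use the 2i match as the MapReduce input, then just keep the keys.
mr = Riak::MapReduce.new(client)
mr.index('mybucket', 'test_bin', 'myval')
mr.map("function(v){ return [v.key]; }", :keep => true)

keys = mr.run.flatten
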
If there is any way we can have a hotfix ASAP though, that would be preferable. 
It would certainly not be a problem for us to edit a value in the config file (and 
given the lack of support in the Ruby client for the timeout setting, the ability 
to edit the global default would be preferred).
In the Ruby client I had to monkeypatch it like this just to submit the timeout 
value, which is not ideal:

module Riak
  class Client
    class HTTPBackend
      # Same as the stock get_index, except equality queries get a
      # hard-coded 'timeout' query parameter tacked onto the path.
      def get_index(bucket, index, query)
        bucket = bucket.name if Bucket === bucket
        path = case query
               when Range
                 raise ArgumentError, t('invalid_index_query', :value => query.inspect) unless String === query.begin || Integer === query.end
                 index_range_path(bucket, index, query.begin, query.end)
               when String, Integer
                 index_eq_path(bucket, index, query, 'timeout' => '26')
               else
                 raise ArgumentError, t('invalid_index_query', :value => query.inspect)
               end
        response = get(200, path)
        JSON.parse(response[:body])['keys']
      end
    end
  end
end

Thanks for the update,
Sean



On Jul 26, 2013, at 4:49 PM, Russell Brown  wrote:

> Hi Sean,
> I'm very sorry to say that you've found a featurebug.
> 
> There was a fix put in here https://github.com/basho/riak_core/pull/332
> 
> But that means that the default timeout of 60 seconds is now honoured. In the 
> past it was not.
> 
> As far as I can see the 2i endpoint never accepted a timeout argument, and it 
> still does not.
> 
> The fix would be to add the timeout to the 2i API endpoints, and I'll do that 
> straight away.
> 
> In the meantime, I wonder if streaming the results would help, or if you'd 
> still hit the overall timeout?
> 
> Very sorry that you've run into this. Let me know if streaming helps, I've 
> raised an issue here[1] if you want to track this bug
> 
> Cheers
> 
> Russell
> 
> [1] https://github.com/basho/riak_kv/issues/610
> 
> 
> On 26 Jul 2013, at 17:59, Sean McKibben  wrote:
> 
>> I should have mentioned that I also tried:
>> curl -H "X-Riak-Timeout: 26" 
>> "http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?timeout=26"; 
>> -i
>> but still receive the 500 error below exactly at the 60 second mark. Is this 
>> a bug?
>> 
>> Secondary to getting this working at all, is this documented anywhere? and 
>> any way to set this timeout using the ruby riak client?
>> 
>> Stream may well work, but I'm going to have to make a number of changes on 
>> the client side to deal with the results.
>> 
>> Sean
>> 
>> On Jul 26, 2013, at 3:53 PM, Brian Roach  wrote:
>> 
>>> Sean -
>>> 
>>> The timeout isn't via a header, it's a query param -> &timeout=
>>> 
>>> You can also use stream=true to stream the results.
>>> 
>>> - Roach
>>> 
>>> Sent from my iPhone
>>> 
>>> On Jul 26, 2013, at 3:43 PM, Sean McKibben  wrote:
>>> 
>>>> We just upgraded to 1.4 and are having a big problem with some of our 
>>>> larger 2i queries. We have a few key queries that takes longer than 60 
>>>> seconds (usually about 110 seconds) to execute, but after going to 1.4 we 
>>>> can't seem to get around a 60 second timeout.
>>>> 
>>>> I've tried:
>>>> curl -H "X-Riak-Timeout: 26" 
>>>> "http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?x-riak-timeout=26";
>>>>  -i
>>>> 
>>>> But I always get
>>>> HTTP/1.1 500 Internal Server Error
>>>> Vary: Accept-Encoding
>>>> Server: MochiWeb/1.1 WebMachine/1.10.0 (never breaks eye contact)
>>>> Date: Fri, 26 Jul 2013 21:41:28 GMT
>>>> Content-Type: text/html
>>>> Content-Length: 265
>>>> Connection: close
>>>> 
>>>> 500 Internal Server 
>>>> ErrorInternal Server ErrorThe server 
>>>> encountered an error while processing this 
>>>> request:{error,{error,timeout}}mochiweb+webmachine
>>>>  web server
>>>> 
>>>> Right at the 60 second mark. What can I set to give my secondary index 
>>>> queries more time??
>>>> 
>>>> This is causing major problems for us :(
>>>> 
>>>> Sean
>>>> ___
>>>> riak-users mailing list
>>>> riak-users@lists.basho.com
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> 
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: 2i timeouts in 1.4

2013-07-26 Thread Sean McKibben
So when I try to use pagination, it doesn't seem to be picking up my continuation. 
I'm having trouble parsing the JSON I get back using stream=true (and there is still 
a timeout), so I went to just using pagination. Perhaps I'm doing it wrong (likely, 
it has been a long day), but Riak seems to be ignoring my continuation:

(pardon the sanitization)
curl 'http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?max_results=5'
{"keys":["1","2","3","4","5"],"continuation":"g20AAABAMDAwMDE1ZWVjMmNiZjY3Y2Y4YmU3ZTVkMWNiZTVjM2ZkYjg2YWU0MGIwNzNjMTE3NDYyZjEzMTNlMDQ5YmI2ZQ=="}

curl 'http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?max_results=5&continuation=g20AAABAMDAwMDE1ZWVjMmNiZjY3Y2Y4YmU3ZTVkMWNiZTVjM2ZkYjg2YWU0MGIwNzNjMTE3NDYyZjEzMTNlMDQ5YmI2ZQ=='
{"keys":["1","2","3","4","5"],"continuation":"g20AAABAMDAwMDE1ZWVjMmNiZjY3Y2Y4YmU3ZTVkMWNiZTVjM2ZkYjg2YWU0MGIwNzNjMTE3NDYyZjEzMTNlMDQ5YmI2ZQ=="}

The same keys and continuation value are returned regardless of whether my 
request contains a continuation value. I've tried swapping the order of 
max_results and continuation without any luck. I also made sure that my 
continuation value was URL-encoded. Hopefully I'm not missing something obvious 
here. Well, come to think of it, hopefully I am missing something obvious!
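
For reference, this is the loop I'm ultimately trying to drive (plain Net::HTTP, 
names sanitized as above; a sketch of the intent rather than something that 
currently works against my cluster, given the behaviour described above):

require 'net/http'
require 'json'
require 'uri'

base         = 'http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval'
all_keys     = []
continuation = nil

loop do
  params = { 'max_results' => 5 }
  params['continuation'] = continuation if continuation  # encode_www_form URL-encodes it

  uri = URI(base)
  uri.query = URI.encode_www_form(params)

  page = JSON.parse(Net::HTTP.get(uri))
  all_keys.concat(page.fetch('keys', []))

  continuation = page['continuation']
  break if continuation.nil? || page['keys'].to_a.empty?
end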

Sean

On Jul 26, 2013, at 6:43 PM, Russell Brown  wrote:

> For a work around you could use streaming and pagination.
> 
> Request smaller pages of data (i.e. sub 60 seconds worth) and use streaming 
> to get the results to your client sooner.
> 
> In HTTP this would look like
> 
> http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?max_results=1&stream=true
> 
> your results will include a continuation like
> 
>{"continuation":"g2gCYgAAFXttBDU0OTk="}
> 
> and you can use that to get the next N results. Breaking your query up that 
> way should duck the timeout.
> 
> Furthermore, adding &stream=true will mean the first results is received very 
> rapidly.
> 
> I don't think the Ruby client is up to date for the new 2i features, but you 
> could monkeypatch as before.
> 
> Cheers
> 
> Russell
> 
> On 26 Jul 2013, at 19:00, Sean McKibben  wrote:
> 
>> Thank you for looking in to this. This is a major problem for our production 
>> cluster, and we're in a bit of a bind right now trying to figure out a 
>> workaround in the interim. It sounds like maybe a mapreduce might handle the 
>> timeout properly, so hopefully we can do that in the meantime.
>> If there is any way we can have a hotfix ASAP though, that would be 
>> preferable. Certainly would not be a problem for us to edit a value in the 
>> config file (and given the lack of support in the ruby client for the 
>> timeout setting, the ability to edit the global default would be preferred).
>> In the ruby client i had to monkeypatch it like this to even submit the 
>> timeout value, which is not ideal:
>> 
>> module Riak
>> class Client
>>   class HTTPBackend
>> def get_index(bucket, index, query)
>>   bucket = bucket.name if Bucket === bucket
>>   path = case query
>>  when Range
>>raise ArgumentError, t('invalid_index_query', :value => 
>> query.inspect) unless String === query.begin || Integer === query.end
>>index_range_path(bucket, index, query.begin, query.end)
>>  when String, Integer
>>index_eq_path(bucket, index, query, 'timeout' => '26')
>>  else
>>raise ArgumentError, t('invalid_index_query', :value => 
>> query.inspect)
>>  end
>>   response = get(200, path)
>>   JSON.parse(response[:body])['keys']
>> end
>>   end
>> end
>> end
>> 
>> Thanks for the update,
>> Sean
>> 
>> 
>> 
>> On Jul 26, 2013, at 4:49 PM, Russell Brown  wrote:
>> 
>>> Hi Sean,
>>> I'm very sorry to say that you've found a featurebug.
>>> 
>>> There was a fix put in here https://github.com/basho/riak_core/pull/332
>>> 
>>> But that means that the default timeout of 60 seconds is now honoured. In 
>>> the past it was not.
>>> 
>>> As far as I can see the 2i endpoint never accepted a timeout argument, and 
>>> it still does not.
>>> 
>>> The fix would be to add the timeout to the 2i API endpoints, and I'll do that 
>>> straight away.

Re: Keys that won't disappear from indexes

2013-12-04 Thread Sean McKibben
This same thing is happening to me: both the $bucket index and my own custom 
indexes are returning keys that have been deleted, and I can’t remove them.
I am hoping there is a way to fix this, as it is causing significant problems 
for us in production. It seems to be happening with some frequency, and every 
once in a while an index will just go bad completely and either return subsets 
of what it should return (even with a healthy cluster), or keys that have been 
deleted.

I had a case yesterday where $bucket was returning 6 keys that my custom 
all-inclusive index wasn’t returning. They all produced 404s when I tried to 
retrieve a value. I was hoping read repair would repair the index when a 404 
occurred, or at least AAE might pick it up, but as it stands now, is there any 
way corrupted indexes like this can ever get back to normal?
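
(For reference, below is a sketch of the write-then-delete workaround Evan suggests 
in the quoted message; bucket, index, and key names are placeholders.)

require 'riak'

client     = Riak::Client.new
bucket     = client.bucket('mybucket')
stale_keys = ['1', '2']   # keys the index still returns but that 404 on fetch

stale_keys.each do |key|
  obj = bucket.new(key)
  obj.content_type = 'application/json'
  obj.data = { 'reindex' => true }
  obj.indexes['test_bin'] << 'myval'   # re-create the same index entry
  obj.store
  bucket.delete(key)                   # deleting again should also drop the index entry
end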

Sean


On Nov 4, 2013, at 9:44 AM, Evan Vigil-McClanahan  wrote:

> Hi Toby.
> 
> It's possible, since they're stored separately, that the objects were
> deleted but the indices were left in place because of some error (e.g.
> the operation failed for some reason between the object removal and
> the index removal).  One of the things on the feature list for the
> next release is AAE of index values, which should take care of this
> case.  This is really rare, but not unknown.  It'd be interesting to
> know how you ended up with so many.
> 
> In the mean time, the only way I can think of to get rid of them
> (other than deleting them from the console, which would require taking
> nodes down and a lot of manual effort), would be to write another
> value that would have the same index, then delete it, which should
> normally succeed.
> 
> I'll ask around to see if there is anything that might work better.
> 
> On Sun, Nov 3, 2013 at 7:40 PM, Toby Corkindale
>  wrote:
>> On 01/11/13 14:04, Toby Corkindale wrote:
>>> 
>>> Hi,
>>> I have around 5000 keys which just won't die.
>>> No matter how many times I delete them, they still show up in the 2i
>>> $bucket=_ index.
>>> 
>>> Actually attempting to retrieve the keys results in a not-found - even
>>> 
>>> if I've requested that tombstones be returned.
>>> 
>>> I'm interested to know what is going on here?
>> 
>> 
>> Anyone?
>> 
>> Should I report this as a bug against 1.4.2?
>> 
>> 
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Single node causing cluster to be extremely slow (leveldb)

2014-01-09 Thread Sean McKibben
We have a 5-node cluster using eLevelDB (1.4.2) and 2i, and this afternoon it 
started responding extremely slowly. CPU on member 4 was extremely high and we 
restarted that process, but it didn’t help. We temporarily shut down member 4 
and cluster speed returned to normal, but as soon as we boot member 4 back up, 
the cluster performance goes to shit.

We’ve run into this before, but back then we were able to just start over with a 
fresh set of data after wiping the machines, as it was before we migrated to this 
bare-metal cluster. Now it is causing some pretty significant issues and we’re not 
sure what we can do to get it back to normal; many of our queues are filling up and 
we’ll probably have to take node 4 offline again just so we can provide a regular 
quality of service.

We’ve turned off AAE on node 4 but it hasn’t helped. We have some transfers 
that need to happen but they are going very slowly.

'riak-admin top' on node 4 reports this:
Load:  cpu   610    Memory:  total      503852    binary    231544
       procs 804             processes  179850    code       11588
       runq  134             atom           533    ets         4581

Pid             Name or Initial Func    Time    Reds      Memory     MsgQ  Current Function
--------------------------------------------------------------------------------------------
<6175.29048.3>  proc_lib:init_p/5       '-'     462231    51356760   0     mochijson2:json_bin_is_safe/1
<6175.12281.6>  proc_lib:init_p/5       '-'     307183    64195856   1     gen_fsm:loop/7
<6175.1581.5>   proc_lib:init_p/5       '-'     286143    41085600   0     mochijson2:json_bin_is_safe/1
<6175.6659.0>   proc_lib:init_p/5       '-'     281845    13752      0     sext:decode_binary/3
<6175..0>       proc_lib:init_p/5       '-'     209113    21648      0     sext:decode_binary/3
<6175.12219.6>  proc_lib:init_p/5       '-'     168832    16829200   0     riak_client:wait_for_query_results/4
<6175.8403.0>   proc_lib:init_p/5       '-'     13        13880      1     eleveldb:iterator_move/2
<6175.8813.0>   proc_lib:init_p/5       '-'     119548    9000       1     eleveldb:iterator/3
<6175.8411.0>   proc_lib:init_p/5       '-'     115759    34472      0     riak_kv_vnode:'-result_fun_ack/2-fun-0-'
<6175.5679.0>   proc_lib:init_p/5       '-'     109577    8952       0     riak_kv_vnode:'-result_fun_ack/2-fun-0-'
Output server crashed: connection_lost

Based on that, is there anything anyone can think to do to try to bring 
performance back into the land of usability? Does this appear to be something 
that may have been resolved in 1.4.6 or 1.4.7?

The only thing we can think of at this point might be to remove or force-remove the 
member and join in a new, freshly built one, but last time we attempted that (on a 
different cluster) our secondary indexes got irreparably damaged and only regained 
consistency when we copied every individual key to (this) new cluster! Not a good 
experience :( but I’m hopeful that 1.4.6 may have addressed some of our issues.

Any help is appreciated.

Thank you,
Sean McKibben


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Single node causing cluster to be extremely slow (leveldb)

2014-01-10 Thread Sean McKibben
We need all the results right away anyway, so we don't paginate; once
we get to 1.4.6+, being able to skip sorting ought to return some
speed to us (and maybe we will leave +S at 6:6). With our small ring
size and SSDs we see 3M keys returning in about 120 seconds. While that
case isn't rare, there are only a handful of queries we run that
return over 1M keys. It will be interesting to compare the speed of unordered
result sets in 1.4.6.

So far, we did run into one case for a few days where some servers had
gotten some 2i corruption and were returning subsets. We had to make
multiple simultaneous requests and union the result sets to
compensate. Luckily we completely migrated (manually) to a new cluster
soon after, which resolved the issue. The additions to 1.4.6 seem like
they will be very helpful, should we encounter something similar
again.

I realize we use 2i in an atypical way, but by using meaningful keys
it is the fastest solution we've come across for retrieval of a set of
high churn, tag-indexed keys that won't fit in RAM. We do hope that
Yokozuna may replace 2i for us in a more horizontally-scalable way
with 2.0, but we haven't yet tested with that.

Thanks,
Sean

> On Jan 10, 2014, at 7:09 AM, Matthew Von-Maszewski  wrote:
>
> Sean,
>
> Also you mentioned concern about +S 6:6.  2i queries in 1.4 added "sorting".  
> Another heavy 2i user noticed that the sorting need more CPU for Erlang.  
> They were happier after removing the +S.
>
> And finally, those 2i queries that return "millions of results" … how long do 
> those queries take to execute?
>
> Matthew
>
>> On Jan 9, 2014, at 9:33 PM, Sean McKibben  wrote:
>>
>> We have a 5 node cluster using elevelDB (1.4.2) and 2i, and this afternoon 
>> it started responding extremely slowly. CPU on member 4 was extremely high 
>> and we restarted that process, but it didn’t help. We temporarily shut down 
>> member 4 and cluster speed returned to normal, but as soon as we boot member 
>> 4 back up, the cluster performance goes to shit.
>>
>> We’ve run in to this before but were able to just start with a fresh set of 
>> data after wiping machines as it was before we migrated to this bare-metal 
>> cluster. Now it is causing some pretty significant issues and we’re not sure 
>> what we can do to get it back to normal, many of our queues are filling up 
>> and we’ll probably have to take node 4 off again just so we can provide a 
>> regular quality of service.
>>
>> We’ve turned off AAE on node 4 but it hasn’t helped. We have some transfers 
>> that need to happen but they are going very slowly.
>>
>> 'riak-admin top’ on node 4 reports this:
>> Load:  cpu   610   Memory:  total  503852binary 
>> 231544
>>   procs 804processes  179850code
>> 11588
>>   runq  134atom  533ets  
>> 4581
>>
>> Pid Name or Initial Func Time   Reds Memory  
>>  MsgQ Current Function
>> ---
>> <6175.29048.3>  proc_lib:init_p/5 '-' 462231   51356760  
>> 0 mochijson2:json_bin_is_safe/1
>> <6175.12281.6>  proc_lib:init_p/5 '-' 307183   64195856  
>> 1 gen_fsm:loop/7
>> <6175.1581.5>   proc_lib:init_p/5 '-' 286143   41085600  
>> 0 mochijson2:json_bin_is_safe/1
>> <6175.6659.0>   proc_lib:init_p/5 '-' 281845  13752  
>> 0 sext:decode_binary/3
>> <6175..0>   proc_lib:init_p/5 '-' 209113  21648  
>> 0 sext:decode_binary/3
>> <6175.12219.6>  proc_lib:init_p/5 '-' 168832   16829200  
>> 0 riak_client:wait_for_query_results/4
>> <6175.8403.0>   proc_lib:init_p/5 '-' 13  13880  
>> 1 eleveldb:iterator_move/2
>> <6175.8813.0>   proc_lib:init_p/5 '-' 119548   9000  
>> 1 eleveldb:iterator/3
>> <6175.8411.0>   proc_lib:init_p/5 '-' 115759  34472  
>> 0 riak_kv_vnode:'-result_fun_ack/2-fun-0-'
>> <6175.5679.0>   proc_lib:init_p/5 '-' 109577   8952  
>> 0 riak_kv_vnode:'-result_fun_ack/2-fun-0-'
>> Output server crashed: connection_lost
>>
>> B

Re: Single node causing cluster to be extremely slow (leveldb)

2014-01-10 Thread Sean McKibben
Excellent and informative explanation, thank you very much. We’re very happy 
that our adjustments have returned the cluster to its normal operating 
parameters. Also glad that Riak 2 will be handling this stuff programmatically, 
as prior to your spreadsheet and explanation it was pure voodoo for us. I think 
the automation will significantly decrease the number of animal sacrifices 
needed to appease the levelDB gods! :)

Sean McKibben
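
(For anyone following along at home, Matthew's file-cache arithmetic from the 
message quoted below, written out as a quick sanity check; the numbers and the 
rounding choices are all his.)

keys_per_file  = 75_000
keys_per_block = 10                              # block_size / value size = 4096 / 496, rounded
index_keys     = keys_per_file / keys_per_block  # 7,500 keys in each file's index
index_bytes    = index_keys * 100                # ~750,000 bytes of keys in the index
bloom_bytes    = keys_per_file * 2               # ~150,000 bytes of bloom filter
per_file       = index_bytes + bloom_bytes       # ~900,000 bytes of metadata per file
files          = 1_479
needed         = per_file * files                # ~1,331,100,000 bytes of file cache needed

# max_open_files is counted in 4 MB units in 1.4, so this is the bare minimum;
# Matthew's 400-425 suggestion leaves headroom above it.
puts (needed / (4.0 * 1024 * 1024)).ceil         # => 318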


On Jan 10, 2014, at 9:18 AM, Matthew Von-Maszewski  wrote:

> Attached is the spreadsheet I used for deriving the cache_size and 
> max_open_files.  The general guidelines of the spreadsheet are:
> 
> vnode count:  ring size divided by (number of nodes minus one)
> write_buf_min/max:  don't touch … you will screw up my leveldb tuning
> cache_size:  8Mbytes is hard minimum
> max_open_files:  this is NOT a file count in 1.4.  It is 4Mbytes times the 
> value.  File cache is meta-data size based, not file count.
> 
> lower cache_size and raise max_open_files as necessary to keep "remaining" 
> close to zero AND cover your total file metadata size
> 
> What is file metadata size? I looked at one vnode's LOG file for rough 
> estimates:
> 
> - Your total file count was 1,479 in one vnode
> - You typically hit the 75,000 key limit
> - Key count (75,000) divided into a typical file size is 496 bytes … used 496 
> as average value size
> - Block_size is 4096.  496 value size goes into block size about 10 times (no 
> need for fractions since block_size is a threshold, not fixed value)
> - 75,000 total keys in file, 10 keys per block … that means 7,500 keys in 
> file's index … 100 bytes per key is 750,000 bytes of keys in index.
> - bloom filter is 2 bytes per key (all 75,000 keys) or 150,000 bytes
> - metadata loaded into file cache is therefore 750,000 + 150,000 bytes per 
> file or 900,000 bytes.
> - 900,000 bytes per file times 1,479 files is 1,331,100,000 bytes of file 
> cache needed …
> 
> Your original 315 max_open_files is 1,279,262,720 in size (315 * 4Mbytes) … 
> file cache is thrashing since 1,279,262,720 is less than 1,331,100,000.
> 
> I told you 425 as a max_open_files setting, spreadsheet has 400 as more 
> conservative number.
> 
> Matthew
> 
> 
> 
> On Jan 10, 2014, at 9:41 AM, Martin May  wrote:
> 
>> Hi Matthew,
>> 
>> We applied this change to node 4, started it up, and it seems much happier 
>> (no crazy CPU). We’re going to keep an eye on it for a little while, and 
>> then apply this setting to all the other nodes as well.
>> 
>> Is there anything we can do to prevent this scenario in the future, or 
>> should the settings you suggested take care of that?
>> 
>> Thanks,
>> Martin
>> 
>> On Jan 10, 2014, at 6:42 AM, Matthew Von-Maszewski  
>> wrote:
>> 
>>> Sean,
>>> 
>>> I did some math based upon the app.config and LOG files.  I am guessing 
>>> that you are starting to thrash your file cache.
>>> 
>>> This theory should be easy to prove / disprove.  On that one node, change 
>>> the cache_size and max_open_files to:
>>> 
>>> cache_size 68435456
>>> max_open_files 425
>>> 
>>> If I am correct, the node should come up and not cause problems.  We are 
>>> trading block cache space for file cache space.  A miss in the file cache 
>>> is far more costly than a miss in the block cache.
>>> 
>>> Let me know how this works for you.  It is possible that we might want to 
>>> talk about raising your block size slightly to reduce file cache overhead.
>>> 
>>> Matthew
>>> 
>>> On Jan 9, 2014, at 9:33 PM, Sean McKibben  wrote:
>>> 
>>>> We have a 5 node cluster using elevelDB (1.4.2) and 2i, and this afternoon 
>>>> it started responding extremely slowly. CPU on member 4 was extremely high 
>>>> and we restarted that process, but it didn’t help. We temporarily shut 
>>>> down member 4 and cluster speed returned to normal, but as soon as we boot 
>>>> member 4 back up, the cluster performance goes to shit.
>>>> 
>>>> We’ve run in to this before but were able to just start with a fresh set 
>>>> of data after wiping machines as it was before we migrated to this 
>>>> bare-metal cluster. Now it is causing some pretty significant issues and 
>>>> we’re not sure what we can do to get it back to normal, many of our queues 
>>>> are filling up and we’ll probably have to take node 4 off again just so we 
>>>> can provide a regular quality of service.
>>>&

Re: Riak Search and Yokozuna Backup Strategy

2014-01-21 Thread Sean McKibben
+1 LevelDB backup information is important to us


On Jan 20, 2014, at 4:38 PM, Elias Levy  wrote:

> Anyone from Basho care to comment?
> 
> 
> On Thu, Jan 16, 2014 at 10:19 AM, Elias Levy  
> wrote:
> 
> Also, while LevelDB appears to be largely an append only format, the 
> documentation currently does not recommend live backups, presumably because 
> there are some issues that can crop up if restoring a DB that was not cleanly 
> shutdown.  
> 
> I am guessing those issues are the ones documented as edge cases here: 
> https://github.com/basho/leveldb/wiki/repair-notes
> 
> That said, it looks like as of 1.4 those are largely cleared up, at least 
> from what I gather from that page, and that one must only ensure that data is 
> copied in a certain order and that you run the LevelDB repair algorithm when 
> retiring the files.  
> 
> So is the backup documentation on LevelDB still correct?  Will Basho will 
> enable hot backups on LevelDB backends any time soon?
> 
> 
> 
> On Thu, Jan 16, 2014 at 10:05 AM, Elias Levy  
> wrote:
> How well does Riak Search play with backups?  Can you backup the Riak Search 
> data without bringing the node down?
> 
> The Riak documentation backup page is completely silent on Riak Search and 
> its merge_index backend.
> 
> And looking forward, what is the backup strategy for Yokozuna?  Will it make 
> use of Solr's Replication Handler, or something more lower level?  Will the 
> node need to be offline to backup it up?
> 
> 
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Ruby eventmachine async client

2012-03-15 Thread Sean McKibben
Given Mathias Meyer's talk at Scotland Ruby Conference about EventMachine 
programming (while wearing a Riak t-shirt!), I was hoping to see a little bit 
more in terms of EventMachine clients and Riak.

Has anyone used EventMachine and/or em-synchrony with Riak who could give me 
some advice? I'm using the riak-client gem at this point and going the route of 
trying to wrap my workflow in fibers so the Riak client plays nicely with it.

Am I better off just using an HTTP client like EM::HttpRequest or 
EM::Synchrony::Multi, or is there some good way to use riak-client or 
Ripple with EventMachine that requires less manual intervention?
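
For concreteness, the sort of thing I'm experimenting with looks roughly like this; 
EM.defer just pushes the blocking riak-client call onto EventMachine's thread pool 
(bucket and key are made up):

require 'eventmachine'
require 'riak'

EM.run do
  client = Riak::Client.new(:http_port => 8098)

  # The blocking riak-client call runs on EM's thread pool, not the reactor thread.
  fetch    = proc { client.bucket('users').get('some_key') }
  finished = proc { |robject| puts robject.data.inspect; EM.stop }

  EM.defer(fetch, finished)
end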

Sorry if this has been covered somewhere else but I haven't had much luck 
finding anyone else using EM with Riak.

Thanks,
Sean McKibben
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com