Re: [ANN] Ruby Client v1.1.1 Release
I'm wondering if we could get a 1.1.2 version bump pretty soon. Not being able to do 2i over PBC with 1.1.1 is rather painful and I kind of need a released version to send it to production.

Thanks,
Sean McKibben

On Jan 10, 2013, at 2:05 PM, Sean Cribbs wrote:
> Hey riak-users,
>
> Today we released the Ruby Riak Client (riak-client gem), version
> 1.1.1. The only change from version 1.1.0 was a fix for older
> patchlevels of Ruby 1.8.7 (before p315) that had a bug in Net::HTTP.
> We encountered this bug when testing the upcoming Riak 1.3 release on
> Ubuntu 10.04 LTS, which has a maximum Ruby version of 1.8.7p249. If you
> are on one of those old versions, it is definitely recommended to
> upgrade to a later patchlevel or Ubuntu release and avoid this bug
> altogether.
>
> Cheers,
>
> --
> Sean Cribbs
> Software Engineer
> Basho Technologies, Inc.
> http://basho.com/

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: [ANN] Ruby Client v1.1.1 Release
IIRC it was fixed in 1.1.0 but came back in 1.1.1. I did a simple test just now and sanitized it for this gist: https://gist.github.com/graphex/5305274

I also have to store items with ? in their keys, which has issues over HTTP (noted in https://github.com/basho/riak-ruby-client/issues/80 ), so I have to have two different clients at all times: one PBC so I can write ? in keys, and one HTTP so I can get 2i queries back…

Sean

On Apr 3, 2013, at 2:51 PM, Sean Cribbs wrote:
> Can you clarify the problem you're having? That feature was merged in
> July, long before the 1.1.0 release:
> https://github.com/basho/riak-ruby-client/commit/4fe52756d7df6ee494bfbc40552ec017f3ff4da4
>
> On Wed, Apr 3, 2013 at 3:35 PM, Sean McKibben wrote:
>> I'm wondering if we could get a 1.1.2 version bump pretty soon. Not being
>> able to do 2i over PBC with 1.1.1 is rather painful and I kind of need a
>> released version to send it to production.
>>
>> Thanks,
>> Sean McKibben
>>
>> On Jan 10, 2013, at 2:05 PM, Sean Cribbs wrote:
>>
>>> Hey riak-users,
>>>
>>> Today we released the Ruby Riak Client (riak-client gem), version
>>> 1.1.1. The only change from version 1.1.0 was a fix for older
>>> patchlevels of Ruby 1.8.7 (before p315) that had a bug in Net::HTTP.
>>> We encountered this bug when testing the upcoming Riak 1.3 release on
>>> Ubuntu 10.04 LTS, which has a maximum Ruby version of 1.8.7p249. If you
>>> are on one of those old versions, it is definitely recommended to
>>> upgrade to a later patchlevel or Ubuntu release and avoid this bug
>>> altogether.
>>>
>>> Cheers,
>>>
>>> --
>>> Sean Cribbs
>>> Software Engineer
>>> Basho Technologies, Inc.
>>> http://basho.com/
>
> --
> Sean Cribbs
> Software Engineer
> Basho Technologies, Inc.
> http://basho.com/

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
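The `?`-in-keys problem above comes down to URL escaping: over HTTP the key travels in the request path, so an unescaped `?` starts the query string and truncates the key, while PBC sends keys as length-prefixed binaries and is immune. A minimal sketch of the escaping a client needs (the bucket/key path layout is the standard Riak HTTP API; the helper name is my own):

```ruby
require 'cgi'

# Build the HTTP resource path for a Riak object, escaping the key so
# characters like '?' survive as part of the key instead of being
# parsed as the start of the query string.
def riak_object_path(bucket, key)
  "/buckets/#{CGI.escape(bucket)}/keys/#{CGI.escape(key)}"
end

riak_object_path('mybucket', 'what?key')
# => "/buckets/mybucket/keys/what%3Fkey"
```

If the client library skips this escaping (the issue linked above), any key containing `?` is silently stored or fetched under the wrong name over HTTP.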
2i timeouts in 1.4
We just upgraded to 1.4 and are having a big problem with some of our larger 2i queries. We have a few key queries that take longer than 60 seconds (usually about 110 seconds) to execute, but after going to 1.4 we can't seem to get around a 60 second timeout.

I've tried:
curl -H "X-Riak-Timeout: 26" "http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?x-riak-timeout=26" -i

But I always get:
HTTP/1.1 500 Internal Server Error
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.0 (never breaks eye contact)
Date: Fri, 26 Jul 2013 21:41:28 GMT
Content-Type: text/html
Content-Length: 265
Connection: close

500 Internal Server Error: The server encountered an error while processing this request: {error,{error,timeout}} (mochiweb+webmachine web server)

Right at the 60 second mark. What can I set to give my secondary index queries more time?? This is causing major problems for us :(

Sean
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: 2i timeouts in 1.4
I should have mentioned that I also tried:
curl -H "X-Riak-Timeout: 26" "http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?timeout=26" -i

but still receive the 500 error below, exactly at the 60 second mark. Is this a bug?

Secondary to getting this working at all: is this documented anywhere? And is there any way to set this timeout using the Ruby Riak client?

Stream may well work, but I'm going to have to make a number of changes on the client side to deal with the results.

Sean

On Jul 26, 2013, at 3:53 PM, Brian Roach wrote:
> Sean -
>
> The timeout isn't via a header, it's a query param -> &timeout=
>
> You can also use stream=true to stream the results.
>
> - Roach
>
> Sent from my iPhone
>
> On Jul 26, 2013, at 3:43 PM, Sean McKibben wrote:
>
>> We just upgraded to 1.4 and are having a big problem with some of our larger
>> 2i queries. We have a few key queries that take longer than 60 seconds
>> (usually about 110 seconds) to execute, but after going to 1.4 we can't seem
>> to get around a 60 second timeout.
>>
>> I've tried:
>> curl -H "X-Riak-Timeout: 26" "http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?x-riak-timeout=26" -i
>>
>> But I always get
>> HTTP/1.1 500 Internal Server Error
>> Vary: Accept-Encoding
>> Server: MochiWeb/1.1 WebMachine/1.10.0 (never breaks eye contact)
>> Date: Fri, 26 Jul 2013 21:41:28 GMT
>> Content-Type: text/html
>> Content-Length: 265
>> Connection: close
>>
>> 500 Internal Server Error: The server encountered an error while processing
>> this request: {error,{error,timeout}} (mochiweb+webmachine web server)
>>
>> Right at the 60 second mark. What can I set to give my secondary index
>> queries more time??
>>
>> This is causing major problems for us :(
>>
>> Sean
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
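Per Brian's reply, the timeout belongs in the query string rather than in an `X-Riak-Timeout` header. A small sketch of assembling the 2i URL that way (the helper name is mine, and I'm assuming the value is interpreted as milliseconds — verify the unit against your Riak version's documentation):

```ruby
require 'uri'

# Build a secondary-index query URL with an explicit timeout.
# The timeout goes in the query string (?timeout=...), not in an
# X-Riak-Timeout header. Millisecond unit is an assumption here.
def index_query_url(host, bucket, index, value, timeout_ms)
  query = URI.encode_www_form(timeout: timeout_ms)
  "http://#{host}/buckets/#{bucket}/index/#{index}/#{value}?#{query}"
end

index_query_url('127.0.0.1:8098', 'mybucket', 'test_bin', 'myval', 120_000)
# => "http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?timeout=120000"
```

As the rest of the thread shows, on the Riak version in question the 2i endpoint ignored this parameter entirely, so a correctly built URL can still hit the 60 second default.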
Re: 2i timeouts in 1.4
Thank you for looking into this. This is a major problem for our production cluster, and we're in a bit of a bind right now trying to figure out a workaround in the interim. It sounds like a mapreduce might handle the timeout properly, so hopefully we can do that in the meantime.

If there is any way we can have a hotfix ASAP though, that would be preferable. It certainly would not be a problem for us to edit a value in the config file (and given the lack of support in the Ruby client for the timeout setting, the ability to edit the global default would be preferred). In the Ruby client I had to monkeypatch it like this to even submit the timeout value, which is not ideal:

module Riak
  class Client
    class HTTPBackend
      def get_index(bucket, index, query)
        bucket = bucket.name if Bucket === bucket
        path = case query
               when Range
                 raise ArgumentError, t('invalid_index_query', :value => query.inspect) unless String === query.begin || Integer === query.end
                 index_range_path(bucket, index, query.begin, query.end)
               when String, Integer
                 index_eq_path(bucket, index, query, 'timeout' => '26')
               else
                 raise ArgumentError, t('invalid_index_query', :value => query.inspect)
               end
        response = get(200, path)
        JSON.parse(response[:body])['keys']
      end
    end
  end
end

Thanks for the update,
Sean

On Jul 26, 2013, at 4:49 PM, Russell Brown wrote:
> Hi Sean,
> I'm very sorry to say that you've found a featurebug.
>
> There was a fix put in here: https://github.com/basho/riak_core/pull/332
>
> But that means that the default timeout of 60 seconds is now honoured. In the
> past it was not.
>
> As far as I can see the 2i endpoint never accepted a timeout argument, and it
> still does not.
>
> The fix would be to add the timeout to the 2i API endpoints, and I'll do that
> straight away.
>
> In the meantime, I wonder if streaming the results would help, or if you'd
> still hit the overall timeout?
>
> Very sorry that you've run into this.
Let me know if streaming helps, I've > raised an issue here[1] if you want to track this bug > > Cheers > > Russell > > [1] https://github.com/basho/riak_kv/issues/610 > > > On 26 Jul 2013, at 17:59, Sean McKibben wrote: > >> I should have mentioned that I also tried: >> curl -H "X-Riak-Timeout: 26" >> "http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?timeout=26"; >> -i >> but still receive the 500 error below exactly at the 60 second mark. Is this >> a bug? >> >> Secondary to getting this working at all, is this documented anywhere? and >> any way to set this timeout using the ruby riak client? >> >> Stream may well work, but I'm going to have to make a number of changes on >> the client side to deal with the results. >> >> Sean >> >> On Jul 26, 2013, at 3:53 PM, Brian Roach wrote: >> >>> Sean - >>> >>> The timeout isn't via a header, it's a query param -> &timeout= >>> >>> You can also use stream=true to stream the results. >>> >>> - Roach >>> >>> Sent from my iPhone >>> >>> On Jul 26, 2013, at 3:43 PM, Sean McKibben wrote: >>> >>>> We just upgraded to 1.4 and are having a big problem with some of our >>>> larger 2i queries. We have a few key queries that takes longer than 60 >>>> seconds (usually about 110 seconds) to execute, but after going to 1.4 we >>>> can't seem to get around a 60 second timeout. 
>>>> >>>> I've tried: >>>> curl -H "X-Riak-Timeout: 26" >>>> "http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?x-riak-timeout=26"; >>>> -i >>>> >>>> But I always get >>>> HTTP/1.1 500 Internal Server Error >>>> Vary: Accept-Encoding >>>> Server: MochiWeb/1.1 WebMachine/1.10.0 (never breaks eye contact) >>>> Date: Fri, 26 Jul 2013 21:41:28 GMT >>>> Content-Type: text/html >>>> Content-Length: 265 >>>> Connection: close >>>> >>>> 500 Internal Server >>>> ErrorInternal Server ErrorThe server >>>> encountered an error while processing this >>>> request:{error,{error,timeout}}mochiweb+webmachine >>>> web server >>>> >>>> Right at the 60 second mark. What can I set to give my secondary index >>>> queries more time?? >>>> >>>> This is causing major problems for us :( >>>> >>>> Sean >>>> ___ >>>> riak-users mailing list >>>> riak-users@lists.basho.com >>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com >> >> ___ >> riak-users mailing list >> riak-users@lists.basho.com >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
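Streaming, as suggested above, shifts work to the client: instead of one JSON body you receive a sequence of JSON fragments, each carrying a batch of keys. A sketch of the client-side accumulation step (this assumes the transport layer has already split the stream into individual JSON objects — the on-the-wire framing is transport-specific, so treat the fragment shapes here as illustrative):

```ruby
require 'json'

# Fold a stream of 2i result fragments into a single key list.
# Each fragment is assumed to be a JSON object that may contain a
# "keys" array (and, with pagination, a "continuation" token).
def accumulate_keys(fragments)
  fragments.each_with_object([]) do |fragment, keys|
    keys.concat(JSON.parse(fragment).fetch('keys', []))
  end
end

chunks = ['{"keys":["1","2"]}', '{"keys":["3"]}', '{"continuation":"g2gC"}']
accumulate_keys(chunks)
# => ["1", "2", "3"]
```

Fragments without a `keys` member (such as a trailing continuation object) are skipped, which is why `fetch` supplies an empty-array default.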
Re: 2i timeouts in 1.4
So when I try to use pagination, it doesn't seem to be picking up my continuation. I'm having trouble parsing the JSON I get back using stream=true (and there is still a timeout), so I went to just using pagination. Perhaps I'm doing it wrong (likely, it has been a long day), but Riak seems to be ignoring my continuation (pardon the sanitization):

curl 'http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?max_results=5'
{"keys":["1","2","3","4","5"],"continuation":"g20AAABAMDAwMDE1ZWVjMmNiZjY3Y2Y4YmU3ZTVkMWNiZTVjM2ZkYjg2YWU0MGIwNzNjMTE3NDYyZjEzMTNlMDQ5YmI2ZQ=="}

curl 'http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?max_results=5&continuation=g20AAABAMDAwMDE1ZWVjMmNiZjY3Y2Y4YmU3ZTVkMWNiZTVjM2ZkYjg2YWU0MGIwNzNjMTE3NDYyZjEzMTNlMDQ5YmI2ZQ=='
{"keys":["1","2","3","4","5"],"continuation":"g20AAABAMDAwMDE1ZWVjMmNiZjY3Y2Y4YmU3ZTVkMWNiZTVjM2ZkYjg2YWU0MGIwNzNjMTE3NDYyZjEzMTNlMDQ5YmI2ZQ=="}

The same keys and continuation value are returned regardless of whether my request contains a continuation value. I've tried swapping the order of max_results and continuation without any luck. I also made sure that my continuation value was URL encoded. Hopefully I'm not missing something obvious here. Well, come to think of it, hopefully I am missing something obvious!

Sean

On Jul 26, 2013, at 6:43 PM, Russell Brown wrote:
> For a workaround you could use streaming and pagination.
>
> Request smaller pages of data (i.e. sub 60 seconds worth) and use streaming
> to get the results to your client sooner.
>
> In HTTP this would look like
>
> http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?max_results=1&stream=true
>
> your results will include a continuation like
>
> {"continuation":"g2gCYgAAFXttBDU0OTk="}
>
> and you can use that to get the next N results. Breaking your query up that
> way should duck the timeout.
>
> Furthermore, adding &stream=true will mean the first results are received very
> rapidly.
> > I don't think the Ruby client is up to date for the new 2i features, but you > could monkeypatch as before. > > Cheers > > Russell > > On 26 Jul 2013, at 19:00, Sean McKibben wrote: > >> Thank you for looking in to this. This is a major problem for our production >> cluster, and we're in a bit of a bind right now trying to figure out a >> workaround in the interim. It sounds like maybe a mapreduce might handle the >> timeout properly, so hopefully we can do that in the meantime. >> If there is any way we can have a hotfix ASAP though, that would be >> preferable. Certainly would not be a problem for us to edit a value in the >> config file (and given the lack of support in the ruby client for the >> timeout setting, the ability to edit the global default would be preferred). >> In the ruby client i had to monkeypatch it like this to even submit the >> timeout value, which is not ideal: >> >> module Riak >> class Client >> class HTTPBackend >> def get_index(bucket, index, query) >> bucket = bucket.name if Bucket === bucket >> path = case query >> when Range >>raise ArgumentError, t('invalid_index_query', :value => >> query.inspect) unless String === query.begin || Integer === query.end >>index_range_path(bucket, index, query.begin, query.end) >> when String, Integer >>index_eq_path(bucket, index, query, 'timeout' => '26') >> else >>raise ArgumentError, t('invalid_index_query', :value => >> query.inspect) >> end >> response = get(200, path) >> JSON.parse(response[:body])['keys'] >> end >> end >> end >> end >> >> Thanks for the update, >> Sean >> >> >> >> On Jul 26, 2013, at 4:49 PM, Russell Brown wrote: >> >>> Hi Sean, >>> I'm very sorry to say that you've found a featurebug. >>> >>> There was a fix put in here https://github.com/basho/riak_core/pull/332 >>> >>> But that means that the default timeout of 60 seconds is now honoured. In >>> the past it was not. >>> >>> As far as I can see the 2i endpoint never accepted a timeout argument, and >>> it still does not. 
>>> >>> The fix would be to add the timeout to the 2i API endpoints, and I
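One easy way to lose a continuation is encoding: the token is Base64 and typically ends in `=`, which must be percent-encoded in a query string or it can be mangled in transit. A sketch of building the follow-up page request (parameter names are the ones used in the thread; the helper is my own):

```ruby
require 'uri'

# Build the next-page 2i request from a previous response's
# continuation token. URI.encode_www_form percent-encodes the
# trailing '=' padding of the Base64 token for us.
def next_page_url(base, max_results, continuation)
  params = URI.encode_www_form(max_results: max_results, continuation: continuation)
  "#{base}?#{params}"
end

next_page_url('http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval',
              5, 'g2gCYgAAFXttBDU0OTk=')
# => "http://127.0.0.1:8098/buckets/mybucket/index/test_bin/myval?max_results=5&continuation=g2gCYgAAFXttBDU0OTk%3D"
```

If the server still returns the first page when given a correctly encoded continuation, as reported above, that points at a server-side bug rather than an encoding problem on the client.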
Re: Keys that won't disappear from indexes
This same thing is happening to me, where both the $bucket index and my own custom indexes are returning keys that have been deleted, and I can’t remove them. I am hoping there is a way to fix this, as it is causing significant problems for us in production. It seems to be happening with some frequency, and every once in a while an index will just go bad completely and either return subsets of what it should return (even with a healthy cluster), or keys that have been deleted.

I had a case yesterday where $bucket was returning 6 keys that my custom all-inclusive index wasn’t returning. They all produced 404s when I tried to retrieve a value. I was hoping read repair would repair the index when a 404 occurred, or at least AAE might pick it up, but as it stands now, is there any way corrupted indexes like this can ever get back to normal?

Sean

On Nov 4, 2013, at 9:44 AM, Evan Vigil-McClanahan wrote:
> Hi Toby.
>
> It's possible, since they're stored separately, that the objects were
> deleted but the indices were left in place because of some error (e.g.
> the operation failed for some reason between the object removal and
> the index removal). One of the things on the feature list for the
> next release is AAE of index values, which should take care of this
> case. This is really rare, but not unknown. It'd be interesting to
> know how you ended up with so many.
>
> In the mean time, the only way I can think of to get rid of them
> (other than deleting them from the console, which would require taking
> nodes down and a lot of manual effort), would be to write another
> value that would have the same index, then delete it, which should
> normally succeed.
>
> I'll ask around to see if there is anything that might work better.
>
> On Sun, Nov 3, 2013 at 7:40 PM, Toby Corkindale wrote:
>> On 01/11/13 14:04, Toby Corkindale wrote:
>>>
>>> Hi,
>>> I have around 5000 keys which just won't die.
>>> No matter how many times I delete them, they still show up in the 2i >>> $bucket=_ index. >>> >>> Actually attempting to retrieve the keys results in a not-found - even >>> >>> if I've requested that tombstones be returned. >>> >>> I'm interested to know what is going on here? >> >> >> Anyone? >> >> Should I report this as a bug against 1.4.2? >> >> >> ___ >> riak-users mailing list >> riak-users@lists.basho.com >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Single node causing cluster to be extremely slow (leveldb)
We have a 5 node cluster using eLevelDB (1.4.2) and 2i, and this afternoon it started responding extremely slowly. CPU on member 4 was extremely high and we restarted that process, but it didn’t help. We temporarily shut down member 4 and cluster speed returned to normal, but as soon as we boot member 4 back up, the cluster performance goes to shit.

We’ve run into this before but were able to just start with a fresh set of data after wiping machines, as it was before we migrated to this bare-metal cluster. Now it is causing some pretty significant issues and we’re not sure what we can do to get it back to normal; many of our queues are filling up and we’ll probably have to take node 4 off again just so we can provide a regular quality of service.

We’ve turned off AAE on node 4 but it hasn’t helped. We have some transfers that need to happen but they are going very slowly.

'riak-admin top’ on node 4 reports this:

Load:  cpu    610    Memory:  total      503852    binary  231544
       procs  804             processes  179850    code     11588
       runq   134             atom          533    ets       4581

Pid             Name or Initial Func   Time  Reds     Memory    MsgQ  Current Function
---
<6175.29048.3>  proc_lib:init_p/5      '-'   462231   51356760  0     mochijson2:json_bin_is_safe/1
<6175.12281.6>  proc_lib:init_p/5      '-'   307183   64195856  1     gen_fsm:loop/7
<6175.1581.5>   proc_lib:init_p/5      '-'   286143   41085600  0     mochijson2:json_bin_is_safe/1
<6175.6659.0>   proc_lib:init_p/5      '-'   281845   13752     0     sext:decode_binary/3
<6175..0>       proc_lib:init_p/5      '-'   209113   21648     0     sext:decode_binary/3
<6175.12219.6>  proc_lib:init_p/5      '-'   168832   16829200  0     riak_client:wait_for_query_results/4
<6175.8403.0>   proc_lib:init_p/5      '-'   13       13880     1     eleveldb:iterator_move/2
<6175.8813.0>   proc_lib:init_p/5      '-'   119548   9000      1     eleveldb:iterator/3
<6175.8411.0>   proc_lib:init_p/5      '-'   115759   34472     0     riak_kv_vnode:'-result_fun_ack/2-fun-0-'
<6175.5679.0>   proc_lib:init_p/5      '-'   109577   8952      0     riak_kv_vnode:'-result_fun_ack/2-fun-0-'
Output server crashed: connection_lost

Based on that, is there anything anyone can think to do to try to bring performance back into the land of usability? Does this appear to be something that may have been resolved in 1.4.6 or 1.4.7?

The only thing we can think of at this point might be to remove or force-remove the member and join in a new freshly built one, but last time we attempted that (on a different cluster) our secondary indexes got irreparably damaged and only regained consistency when we copied every individual key to (this) new cluster! Not a good experience :( but I’m hopeful that 1.4.6 may have addressed some of our issues.

Any help is appreciated.

Thank you,
Sean McKibben
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Single node causing cluster to be extremely slow (leveldb)
We need all the results right away anyway, so we don't paginate, so once we get to 1.4.6+, being able to skip sorting ought to return some speed to us (and maybe we will leave +S at 6:6). With our small ring size and SSDs we see 3M keys returning in about 120 sec. While that case isn't rare, there are only a handful of queries we run that return over 1M. It will be interesting to compare the speed of unordered result sets in 1.4.6.

So far, we did run into one case for a few days where some servers had gotten some 2i corruption and were returning subsets. We had to make multiple simultaneous requests and union the result sets to compensate. Luckily we completely migrated (manually) to a new cluster soon after, which resolved the issue. The additions to 1.4.6 seem like they will be very helpful, should we encounter something similar again.

I realize we use 2i in an atypical way, but by using meaningful keys it is the fastest solution we've come across for retrieval of a set of high-churn, tag-indexed keys that won't fit in RAM. We do hope that Yokozuna may replace 2i for us in a more horizontally-scalable way with 2.0, but we haven't yet tested with that.

Thanks,
Sean

> On Jan 10, 2014, at 7:09 AM, Matthew Von-Maszewski wrote:
>
> Sean,
>
> Also you mentioned concern about +S 6:6. 2i queries in 1.4 added "sorting".
> Another heavy 2i user noticed that the sorting needed more CPU for Erlang.
> They were happier after removing the +S.
>
> And finally, those 2i queries that return "millions of results" … how long do
> those queries take to execute?
>
> Matthew
>
>> On Jan 9, 2014, at 9:33 PM, Sean McKibben wrote:
>>
>> We have a 5 node cluster using eLevelDB (1.4.2) and 2i, and this afternoon
>> it started responding extremely slowly. CPU on member 4 was extremely high
>> and we restarted that process, but it didn’t help.
We temporarily shut down >> member 4 and cluster speed returned to normal, but as soon as we boot member >> 4 back up, the cluster performance goes to shit. >> >> We’ve run in to this before but were able to just start with a fresh set of >> data after wiping machines as it was before we migrated to this bare-metal >> cluster. Now it is causing some pretty significant issues and we’re not sure >> what we can do to get it back to normal, many of our queues are filling up >> and we’ll probably have to take node 4 off again just so we can provide a >> regular quality of service. >> >> We’ve turned off AAE on node 4 but it hasn’t helped. We have some transfers >> that need to happen but they are going very slowly. >> >> 'riak-admin top’ on node 4 reports this: >> Load: cpu 610 Memory: total 503852binary >> 231544 >> procs 804processes 179850code >> 11588 >> runq 134atom 533ets >> 4581 >> >> Pid Name or Initial Func Time Reds Memory >> MsgQ Current Function >> --- >> <6175.29048.3> proc_lib:init_p/5 '-' 462231 51356760 >> 0 mochijson2:json_bin_is_safe/1 >> <6175.12281.6> proc_lib:init_p/5 '-' 307183 64195856 >> 1 gen_fsm:loop/7 >> <6175.1581.5> proc_lib:init_p/5 '-' 286143 41085600 >> 0 mochijson2:json_bin_is_safe/1 >> <6175.6659.0> proc_lib:init_p/5 '-' 281845 13752 >> 0 sext:decode_binary/3 >> <6175..0> proc_lib:init_p/5 '-' 209113 21648 >> 0 sext:decode_binary/3 >> <6175.12219.6> proc_lib:init_p/5 '-' 168832 16829200 >> 0 riak_client:wait_for_query_results/4 >> <6175.8403.0> proc_lib:init_p/5 '-' 13 13880 >> 1 eleveldb:iterator_move/2 >> <6175.8813.0> proc_lib:init_p/5 '-' 119548 9000 >> 1 eleveldb:iterator/3 >> <6175.8411.0> proc_lib:init_p/5 '-' 115759 34472 >> 0 riak_kv_vnode:'-result_fun_ack/2-fun-0-' >> <6175.5679.0> proc_lib:init_p/5 '-' 109577 8952 >> 0 riak_kv_vnode:'-result_fun_ack/2-fun-0-' >> Output server crashed: connection_lost >> >> B
Re: Single node causing cluster to be extremely slow (leveldb)
Excellent and informative explanation, thank you very much. We’re very happy that our adjustments have returned the cluster to its normal operating parameters. Also glad that Riak 2 will be handling this stuff programmatically, as prior to your spreadsheet and explanation it was pure voodoo for us. I think the automation will significantly decrease the number of animal sacrifices needed to appease the levelDB gods! :)

Sean McKibben

On Jan 10, 2014, at 9:18 AM, Matthew Von-Maszewski wrote:
> Attached is the spreadsheet I used for deriving the cache_size and
> max_open_files. The general guidelines of the spreadsheet are:
>
> vnode count: ring size divided by (number of nodes minus one)
> write_buf_min/max: don't touch … you will screw up my leveldb tuning
> cache_size: 8 Mbytes is the hard minimum
> max_open_files: this is NOT a file count in 1.4. It is 4 Mbytes times the
> value. File cache is metadata-size based, not file-count based.
>
> Lower cache_size and raise max_open_files as necessary to keep "remaining"
> close to zero AND cover your total file metadata size.
>
> What is file metadata size? I looked at one vnode's LOG file for rough
> estimates:
>
> - Your total file count was 1,479 in one vnode
> - You typically hit the 75,000 key limit
> - Key count (75,000) divided into a typical file size is 496 bytes … used 496
> as average value size
> - Block_size is 4096. 496 value size goes into block size about 10 times (no
> need for fractions since block_size is a threshold, not a fixed value)
> - 75,000 total keys in file, 10 keys per block … that means 7,500 keys in
> the file's index … 100 bytes per key is 750,000 bytes of keys in the index.
> - bloom filter is 2 bytes per key (all 75,000 keys) or 150,000 bytes
> - metadata loaded into file cache is therefore 750,000 + 150,000 bytes per
> file, or 900,000 bytes.
> - 900,000 bytes per file times 1,479 files is 1,331,100,000 bytes of file > cache needed … > > Your original 315 max_open_files is 1,279,262,720 in size (315 * 4Mbytes) … > file cache is thrashing since 1,279,262,720 is less than 1,331,100,000. > > I told you 425 as a max_open_files setting, spreadsheet has 400 as more > conservative number. > > Matthew > > > > On Jan 10, 2014, at 9:41 AM, Martin May wrote: > >> Hi Matthew, >> >> We applied this change to node 4, started it up, and it seems much happier >> (no crazy CPU). We’re going to keep an eye on it for a little while, and >> then apply this setting to all the other nodes as well. >> >> Is there anything we can do to prevent this scenario in the future, or >> should the settings you suggested take care of that? >> >> Thanks, >> Martin >> >> On Jan 10, 2014, at 6:42 AM, Matthew Von-Maszewski >> wrote: >> >>> Sean, >>> >>> I did some math based upon the app.config and LOG files. I am guessing >>> that you are starting to thrash your file cache. >>> >>> This theory should be easy to prove / disprove. On that one node, change >>> the cache_size and max_open_files to: >>> >>> cache_size 68435456 >>> max_open_files 425 >>> >>> If I am correct, the node should come up and not cause problems. We are >>> trading block cache space for file cache space. A miss in the file cache >>> is far more costly than a miss in the block cache. >>> >>> Let me know how this works for you. It is possible that we might want to >>> talk about raising your block size slightly to reduce file cache overhead. >>> >>> Matthew >>> >>> On Jan 9, 2014, at 9:33 PM, Sean McKibben wrote: >>> >>>> We have a 5 node cluster using elevelDB (1.4.2) and 2i, and this afternoon >>>> it started responding extremely slowly. CPU on member 4 was extremely high >>>> and we restarted that process, but it didn’t help. 
We temporarily shut >>>> down member 4 and cluster speed returned to normal, but as soon as we boot >>>> member 4 back up, the cluster performance goes to shit. >>>> >>>> We’ve run in to this before but were able to just start with a fresh set >>>> of data after wiping machines as it was before we migrated to this >>>> bare-metal cluster. Now it is causing some pretty significant issues and >>>> we’re not sure what we can do to get it back to normal, many of our queues >>>> are filling up and we’ll probably have to take node 4 off again just so we >>>> can provide a regular quality of service. >>>&
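Matthew's file-cache arithmetic above can be checked mechanically. The sketch below just reproduces his estimate with the values from this thread (the 4 MB-per-unit reading of max_open_files is specific to Riak 1.4's leveldb, per his explanation):

```ruby
# Estimate leveldb file-cache demand vs. capacity, following the
# per-vnode math from the thread. In Riak 1.4, max_open_files is
# ~4 MB units of metadata cache, not a file count.
MB = 1024 * 1024

keys_per_file = 75_000
index_keys    = keys_per_file / 10     # ~10 keys per 4096-byte block => 7,500 index keys
index_bytes   = index_keys * 100       # ~100 bytes per index key
bloom_bytes   = keys_per_file * 2      # bloom filter: 2 bytes per key
per_file      = index_bytes + bloom_bytes   # 900,000 bytes of metadata per file
file_count    = 1_479
needed        = per_file * file_count       # 1,331,100,000 bytes for one vnode

capacity = ->(max_open_files) { max_open_files * 4 * MB }

needed                       # => 1331100000
capacity.call(425) > needed  # => true (the suggested setting covers the metadata)
```

With 4 MiB units, a setting in the low 300s falls just short of the ~1.33 GB of metadata, which matches the thrashing diagnosis, while 425 leaves comfortable headroom.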
Re: Riak Search and Yokozuna Backup Strategy
+1 LevelDB backup information is important to us On Jan 20, 2014, at 4:38 PM, Elias Levy wrote: > Anyone from Basho care to comment? > > > On Thu, Jan 16, 2014 at 10:19 AM, Elias Levy > wrote: > > Also, while LevelDB appears to be largely an append only format, the > documentation currently does not recommend live backups, presumably because > there are some issues that can crop up if restoring a DB that was not cleanly > shutdown. > > I am guessing those issues are the ones documented as edge cases here: > https://github.com/basho/leveldb/wiki/repair-notes > > That said, it looks like as of 1.4 those are largely cleared up, at least > from what I gather from that page, and that one must only ensure that data is > copied in a certain order and that you run the LevelDB repair algorithm when > retiring the files. > > So is the backup documentation on LevelDB still correct? Will Basho will > enable hot backups on LevelDB backends any time soon? > > > > On Thu, Jan 16, 2014 at 10:05 AM, Elias Levy > wrote: > How well does Riak Search play with backups? Can you backup the Riak Search > data without bringing the node down? > > The Riak documentation backup page is completely silent on Riak Search and > its merge_index backend. > > And looking forward, what is the backup strategy for Yokozuna? Will it make > use of Solr's Replication Handler, or something more lower level? Will the > node need to be offline to backup it up? > > > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Ruby eventmachine async client
Given Mathias Meyer's talk at Scotland Ruby Conference about EventMachine programming (while wearing a Riak t-shirt!), I was hoping to see a little bit more in terms of EventMachine clients and Riak. Has anyone used EventMachine and/or em-synchrony with Riak who could give me some advice?

I'm using ruby-riak-client at this point and going the route of trying to wrap my workflow in fibers so the Riak client plays nice with it. Am I better off just using an HTTP client like EM::HttpRequest or EM::Synchrony::Multi, or is there some good way to use ruby-riak-client or Ripple with EventMachine that requires less manual intervention?

Sorry if this has been covered somewhere else, but I haven't had much luck finding anyone else using EM with Riak.

Thanks,
Sean McKibben
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com