Actually I had never seen this error before, and I don't see it anymore now (maybe it was due to the migration to 1.2.0-rc1). The problem is difficult to describe because there are different errors every time I run tests. Here is a list of them (the errors appear during uncorrelated Riak Search queries):

{{nocatch,stream_timeout},[{riak_search_op_utils,gather_stream_results,4}]}

and, correlated with other errors:

{{badmatch,{error,emfile}},[{mi_segment,iterate_by_keyinfo,7},{mi_server,'-lookup/8-lc$^1/1-1-',4},{mi_server,'-lookup/8-lc$^1/1-1-',4},{mi_server,lookup,8}]}
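As an aside, 'emfile' means the OS-level file-descriptor limit was hit. A minimal sketch (an illustration, not Riak-specific) of reading the limits a process actually runs with; it is worth checking this way because limits set in a shell profile do not always reach daemonized processes:

```python
# Read the file-descriptor limits of the current process.
# An 'emfile' from the Erlang VM means the soft limit below was exhausted.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft FD limit: {soft}")
print(f"hard FD limit: {hard}")
```

The authoritative check for the Riak node itself is the `riak attach` / `os:cmd("ulimit -n").` approach quoted later in this thread.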
After the migration to 1.2.0-rc1 I saw this (I never saw this error before, which is why it was not in the first mail):

{{badfun,#Fun<riak_search_client.9.8393097>},[{mi_server,iterate,6},{mi_server,lookup,8}]}

But the main error (the one that appears most often) is:

{error,{throw,{timeout,range_loop},[{riak_search_backend,collect_info_response,3},{riak_search_op_term,info,3},{riak_search_op_term,preplan,2},{riak_search_op,'-preplan/2-lc$^0/1-0-',2},{riak_search_op_intersection,preplan,2},{riak_search_op,'-preplan/2-lc$^0/1-0-',2},{riak_search_op,'-preplan/2-lc$^0/1-0-',2},{riak_search_op_intersection,preplan,2}]}}

My Riak cluster has 5 nodes; they all now use the 1.2.0-rc1 version of Riak, with the default configuration in app.config. The ulimit is 2048 on all nodes. To avoid errors during indexing, I had added to vm.args:

-env ERL_MAX_ETS_TABLES 50000

On each node there is approximately:
- 53G of merge_index data
- 26G of bitcask data

There are around two hundred different Riak Search indexes.

The errors began after indexing many documents into Riak Search (there were 3 nodes at the time): one node reached its disk capacity, so I had to add 2 nodes and restart the indexing, which succeeded, but the errors described above started appearing on some random search queries.

Thank you very much for your answer; I will try the repair command on every partition tonight.

Regards.

2012/7/18 Ryan Zezeski <rzeze...@basho.com>

> The `badfun` is a new error. That wasn't in your original email. I'm not
> sure why you are seeing that. Are all your Riak nodes using 1.2.0-rc1?
> Can you give me more information on your cluster setup? Are there any
> other errors in your logs? The more information, the more I can help.
>
> The repair "command" is not actually available from the command line yet.
> You need to attach to the Riak console to access it. The APIs are
> `riak_kv_vnode:repair(PartitionNumber)` and
> `riak_search_vnode:repair(PartitionNumber)`.
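For anyone following along: the `repair(PartitionNumber)` calls quoted above take a partition's starting index in Riak's 160-bit hash space. A sketch of how those numbers are derived, assuming the default `ring_creation_size` of 64 (the real list should be read from the node's own ring, e.g. via `riak_core_ring_manager:get_my_ring()` in an attached console):

```python
# Riak splits the 160-bit hash space into ring_creation_size equal
# partitions; each partition is identified by its starting index,
# which is the value repair/1 expects.
# Assumption: the default ring_creation_size of 64 -- verify on your cluster.
RING_SIZE = 64
HASH_SPACE = 2 ** 160

partition_ids = [i * (HASH_SPACE // RING_SIZE) for i in range(RING_SIZE)]

print(partition_ids[0])   # 0
print(partition_ids[1])   # 22835963083295358096932575511191922182123945984
```

Each of these values would then be passed to the quoted APIs from an attached console, e.g. `riak_search_vnode:repair(22835963083295358096932575511191922182123945984).`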
>
> On Wed, Jul 18, 2012 at 1:02 PM, Arnaud Wetzel <arnaud.wet...@gmail.com> wrote:
>
>> Ryan,
>> Increasing "ulimit -n" (the current value is 4096; I have tested from 1024 to
>> 200000) does not change anything; always the same errors:
>> {timeout,range_loop}
>> lookup/range failure:
>> {{badfun,#Fun<riak_search_client.9.8393097>},[{mi_server,iterate,6},{mi_server,lookup,8}]}
>>
>> I cannot find the "repair" command that you mention in your email (on
>> riak-1.2.0-rc1). Is it a function directly in an Erlang module, not yet
>> accessible via riak-admin?
>>
>> Thank you very much.
>>
>> --
>> Arnaud Wetzel
>> KBRW Ad-Venture
>> 13 rue st Anastase, 75003 Paris
>>
>> 2012/7/16 Ryan Zezeski <rzeze...@basho.com>
>>
>>> Arnaud,
>>>
>>> The 'stream_timeout' and 'emfile' should be correlated. Whenever you
>>> see the 'emfile' you should see a corresponding timeout: the index server
>>> errors cause the result collector to time out later. First, adjust your
>>> file descriptor limit and then go from there.
>>>
>>> For the 1.2 release a "repair" command has been added to rebuild KV or
>>> index data for a given partition. In releases before that you must reindex
>>> all your data. You don't have to worry about removing the current indexes,
>>> as merge_index will garbage-collect them for you as it merges. As I said,
>>> first I would fix the 'emfile' issue and then see if further action is
>>> needed.
>>>
>>> -Z
>>>
>>> P.S. If you want to be absolutely sure what your FD limit is in Riak, you
>>> can `riak attach` and then `os:cmd("ulimit -n").` Make sure to use Ctrl-D
>>> to exit from the Riak shell.
>>>
>>> On Mon, Jul 16, 2012 at 5:21 AM, Arnaud Wetzel <arnaud.wet...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> On Friday evening one of our Riak nodes reached its disk space limit
>>>> during indexing in riak-search. Then, after adding some nodes, some
>>>> requests fail, and it is impossible to find any correlation between the
>>>> requests that fail and those that succeed.
>>>> The errors are:
>>>>
>>>> {{nocatch,stream_timeout},[{riak_search_op_utils,gather_stream_results,4}]}
>>>> {timeout,range_loop}
>>>>
>>>> and sometimes (not always):
>>>>
>>>> {{badmatch,{error,emfile}},[{mi_segment,iterate_by_keyinfo,7},{mi_server,'-lookup/8-lc$^1/1-1-',4},{mi_server,'-lookup/8-lc$^1/1-1-',4},{mi_server,lookup,8}]}
>>>>
>>>> Has anyone else experienced these errors? Is it possible that they
>>>> come from the disk-over-capacity error? How can I try to repair the
>>>> merge_index data? If that is not possible, what is the right way to
>>>> delete all the indexes entirely (only the indexes, keeping the Riak data)?
>>>>
>>>> Thank you very much.
>>>>
>>>> Regards.
>>>>
>>>> --
>>>> Arnaud Wetzel
>>>> KBRW Ad-Venture
>>>> 13 rue st Anastase, 75003 Paris
>>>>
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> riak-users@lists.basho.com
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com