Actually, I had never seen this error before, and I don't see it anymore now
(maybe it was because of the migration to 1.2.0-rc1). The problem is
difficult to describe because different errors appear every time I run
tests. Here is a list of them (the errors appear during unrelated riak
search queries):
 {{nocatch,stream_timeout},[{riak_search_op_utils,gather_stream_results,4}]}
(correlated with other errors)
{{badmatch,{error,emfile}},[{mi_segment,iterate_by_keyinfo,7},{mi_server,'-lookup/8-lc$^1/1-1-',4},{mi_server,'-lookup/8-lc$^1/1-1-',4},{mi_server,lookup,8}]}

After the migration to 1.2.0-rc1 I saw this (I had never seen this error
before, which is why it was not in my first mail):
{{badfun,#Fun<riak_search_client.9.8393097>},[{mi_server,iterate,6},{mi_server,lookup,8}]}

But the main error (the one that appears most often) is:
{error,{throw,{timeout,range_loop},[{riak_search_backend,collect_info_response,3},{riak_search_op_term,info,3},{riak_search_op_term,preplan,2},{riak_search_op,'-preplan/2-lc$^0/1-0-',2},{riak_search_op_intersection,preplan,2},{riak_search_op,'-preplan/2-lc$^0/1-0-',2},{riak_search_op,'-preplan/2-lc$^0/1-0-',2},{riak_search_op_intersection,preplan,2}]}}

My Riak cluster has 5 nodes; they now all run the 1.2.0-rc1 version of
Riak, with the default configuration in app.config. The ulimit is 2048 on all
nodes.
To avoid errors during indexing, I added this to vm.args:
-env ERL_MAX_ETS_TABLES 50000
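To check how close a node actually gets to that ETS limit, I count the live
tables from the console (a sketch; run it via `riak attach`, then detach with
Ctrl-D):

```erlang
%% From `riak attach`: count the ETS tables currently allocated,
%% to compare against ERL_MAX_ETS_TABLES
length(ets:all()).
```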

On each node there is approximately:
- 53G of merge_index data
- 26G of bitcask data
There are around two hundred different riak search indexes.

The errors began after indexing many documents into riak search (there were
3 nodes at the time). One node reached its disk capacity, so I had to add 2
nodes and then restart the indexing, which succeeded, but the errors
described above started appearing on some random search queries.

Thank you very much for your answer, I will try the repair command on every
partition tonight.
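In case it is useful to others, this is roughly what I intend to run from
`riak attach` on each node (a sketch built on the API names you gave; I am
assuming `riak_core_ring:my_indices/1` returns the partitions owned by the
local node):

```erlang
%% Sketch: repair the search index of every partition owned by this node.
%% Run from `riak attach`; detach with Ctrl-D when done.
{ok, Ring} = riak_core_ring_manager:get_my_ring(),
Partitions = riak_core_ring:my_indices(Ring),
[riak_search_vnode:repair(P) || P <- Partitions].
```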

Regards.

2012/7/18 Ryan Zezeski <rzeze...@basho.com>

> The `badfun` is a new error.  That wasn't in your original email.  I'm not
> sure why you are seeing that.  Are all your Riak nodes using 1.2.0-rc1?
> Can you give me more information on your cluster setup?  Are there any
> other errors in your logs?  The more information you give, the more I can help.
>
> The repair "command" is not actually available from the command line yet.
>  You need to attach to the Riak console to access it.  The APIs are
> `riak_kv_vnode:repair(PartitionNumber)` and
> `riak_search_vnode:repair(PartitionNumber)`.
>
>
> On Wed, Jul 18, 2012 at 1:02 PM, Arnaud Wetzel <arnaud.wet...@gmail.com> wrote:
>
>> Ryan,
>> Increasing "ulimit -n" (current value is 4096, I have tested values from 1024 to
>> 200000) does not change anything; I always get the same errors:
>> {timeout,range_loop}
>> lookup/range failure:
>> {{badfun,#Fun<riak_search_client.9.8393097>},[{mi_server,iterate,6},{mi_server,lookup,8}]}
>>
>> I cannot find the "repair" command that you mention in your email (on
>> riak 1.2.0-rc1). Is it a function in an Erlang module that is not yet
>> accessible through riak-admin?
>>
>>  Thank you very much.
>>
>> --
>> Arnaud Wetzel
>> KBRW Ad-Venture
>>  13 rue st Anastase, 75003 Paris
>>
>> 2012/7/16 Ryan Zezeski <rzeze...@basho.com>
>>
>>> Arnaud,
>>>
>>> The 'stream_timeout' and 'emfile' should be correlated.  Whenever you
>>> see the 'emfile' you should see a corresponding timeout: the index server
>>> errors cause the result collector to time out later.  First, adjust your
>>> file descriptor limit and then go from there.
>>>
>>> For the 1.2 release a "repair" command has been added to rebuild KV or
>>> index data for a given partition.  In earlier releases you must reindex
>>> all your data.  You don't have to worry about removing the current indexes,
>>> as merge_index will garbage collect them for you as it merges.  As I said,
>>> I would fix the 'emfile' issue first and then see if further action is
>>> needed.
>>>
>>> -Z
>>>
>>> P.S. If you want to be absolutely sure what your FD limit is in Riak you
>>> can `riak attach` and then `os:cmd("ulimit -n").`  Make sure to use Ctrl-D
>>> to exit from the Riak shell.
>>>
>>> On Mon, Jul 16, 2012 at 5:21 AM, Arnaud Wetzel
>>> <arnaud.wet...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> Friday evening one of our Riak nodes reached its disk space limit
>>>> during indexing in riak-search.  Then, after adding some nodes, some requests
>>>> fail, and it is impossible to find any correlation between the requests that
>>>> error and those that succeed.
>>>> The errors are :
>>>>
>>>> {{nocatch,stream_timeout},[{riak_search_op_utils,gather_stream_results,4}]}
>>>> {timeout,range_loop}
>>>>
>>>> and sometimes (not always) :
>>>>
>>>> {{badmatch,{error,emfile}},[{mi_segment,iterate_by_keyinfo,7},{mi_server,'-lookup/8-lc$^1/1-1-',4},{mi_server,'-lookup/8-lc$^1/1-1-',4},{mi_server,lookup,8}]}
>>>>
>>>> So, has anyone else experienced these errors?  Is it possible that they
>>>> come from the disk-full error?  How can I try to repair merge_index
>>>> data?  If that is not possible, what is the right process for deleting all
>>>> the indexes entirely (only the indexes, keeping the riak data)?
>>>>
>>>> Thank you very much.
>>>>
>>>> Regards.
>>>>
>>>> --
>>>> Arnaud Wetzel
>>>> KBRW Ad-Venture
>>>> 13 rue st Anastase, 75003 Paris
>>>>
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> riak-users@lists.basho.com
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>
>>>>
>>>
>>
>