Yes, we have seen `yz_events` crashes. There have been none today, for
example, but there were quite a few yesterday.

It's a 5-node cluster with 14GB of RAM in each node; the Solr JVM is set to
8GB on each.

I've not seen any corrupted data, but we could be looking in the wrong
place? Our buckets are set to allow_mult false and last_write_wins true, so
we don't expect any siblings.
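
To double-check the two things asked about in the quoted message (failed
extractions flagged with _yz_err, and duplicates of one object with different
_yz_vtag values), something like the sketch below could be run over the
documents returned by a Solr query. It is only an illustration: the function
name is made up, it assumes the results are already parsed into dicts, and it
assumes _yz_rt, _yz_rb and _yz_rk are the bucket type, bucket and key fields
of the default schema.

```python
from collections import defaultdict


def find_suspect_docs(docs):
    """Return (keys_with_multiple_vtags, keys_with_extraction_errors).

    `docs` is a list of dicts, one per Solr result document.
    Field names other than _yz_vtag and _yz_err are assumptions.
    """
    vtags = defaultdict(set)
    errored = []
    for doc in docs:
        # Identify the Riak object this Solr document was built from.
        key = (doc.get("_yz_rt"), doc.get("_yz_rb"), doc.get("_yz_rk"))
        if "_yz_vtag" in doc:
            vtags[key].add(doc["_yz_vtag"])
        if "_yz_err" in doc:
            errored.append(key)
    # More than one distinct vtag for the same key suggests siblings.
    siblings = {k: sorted(v) for k, v in vtags.items() if len(v) > 1}
    return siblings, errored
```

If both return values come back empty for a representative sample, that would
support what we expect from allow_mult false and last_write_wins true.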

We also had a health check done by yourselves. The main change, which we
have not yet applied, is switching AAE from active to passive and setting the
Erlang buffer to 128MB.

*Jason Ryan*
VP Engineering

Real Time, Online Identity Verification

On 7 March 2015 at 15:54, Zeeshan Lakhani <zlakh...@basho.com> wrote:

> Hello Jason,
> I initially thought that your issues were similar to what we’ve
> found here,
> https://github.com/basho/yokozuna/issues/442#issuecomment-77233636, but
> seeing {error, retry_later} and the 500s seem to place the problems more on
> the Solr end. Just to make sure, are you seeing `yz_events` crashes in your
> logs?
> Can you tell me how much RAM each node has, and whether you have
> adjusted/increased the search.solr.jvm_options max heap size at all (more info on
> issues/factors w/ Solr performance here ->
> http://docs.basho.com/riak/latest/ops/advanced/configs/search/#Solr-for-Operators
> )?
> What kind of issues are you getting in your solr logs? Anything related to
> bad/corrupted data (which will index into _yz_err fields when dealing with
> failed extractions) or possible sibling explosion (duplicates of the same
> object with different _yz_vtag fields)?
> Thanks.
> Zeeshan Lakhani
> programmer |
> software engineer at @basho |
> org. member/founder of @papers_we_love | paperswelove.org
> twitter => @zeeshanlakhani
> On Mar 7, 2015, at 5:43 AM, Jason Ryan <jason.r...@trustev.com> wrote:
> Hi all,
> We're having real trouble with Riak Search.
> We are seeing an awful lot of errors, which leads to a lot of Solr logging
> and disk IO reaching 95%+, which in turn is causing lots of issues.
> - We consistently see errors around Riak trying to create indexes which
> already exist - only a restart of each Riak node stops this for a period of
> time - indexes are only ever created manually, not by software.
> - We see lots of errors around failing to index objects - details of what
> appears in the log are:
> 2015-03-07 10:30:26.871 [error] <0.2538.0>@yz_kv:index:215 failed to index
> object
> {{<<"Production">>,<<"Grains.Domain.Case">>},<<"455203890918dfc6fd3c7da49dd6adb00300000043a46a51">>}
> with error {"Failed to index docs",{error,retry_later}} because
> [{yz_solr,index,3,[{file,"src/yz_solr.erl"},{line,192}]},{yz_kv,index,7,[{file,"src/yz_kv.erl"},{line,267}]},{yz_kv,index,3,[{file,"src/yz_kv.erl"},{line,202}]},{riak_kv_vnode,actual_put,6,[{file,"src/riak_kv_vnode.erl"},{line,1418}]},{riak_kv_vnode,perform_put,3,[{file,"src/riak_kv_vnode.erl"},{line,1406}]},{riak_kv_vnode,do_put,7,[{file,"src/riak_kv_vnode.erl"},{line,1201}]},{riak_kv_vnode,handle_command,3,[{file,"src/riak_kv_vnode.erl"},{line,486}]},{riak_core_vnode,vnode_command,3,[{file,"src/riak_core_vnode.erl"},{line,345}]}]
> - We are also starting to see 500's being returned for search queries -
> the response looks like this:
> <html><head><title>500 Internal Server
> Error</title></head><body><h1>Internal Server Error</h1>The server
> encountered an error while processing this request:<br><pre>{error,
>     {throw,
>         {"Failed to search",
>          "http://localhost:8093/internal_solr/sessions/select",
>          {error,retry_later}},
>         [{yz_solr,search,3,[{file,"src/yz_solr.erl"},{line,278}]},
>          {yz_wm_search,search,2,[{file,"src/yz_wm_search.erl"},{line,129}]},
>          {webmachine_resource,resource_call,3,
>              [{file,"src/webmachine_resource.erl"},{line,186}]},
>          {webmachine_resource,do,3,
>              [{file,"src/webmachine_resource.erl"},{line,142}]},
>          {webmachine_decision_core,resource_call,1,
>              [{file,"src/webmachine_decision_core.erl"},{line,48}]},
>          {webmachine_decision_core,decision,1,
>              [{file,"src/webmachine_decision_core.erl"},{line,558}]},
>          {webmachine_decision_core,handle_request,2,
>              [{file,"src/webmachine_decision_core.erl"},{line,33}]},
>          {webmachine_mochiweb,loop,2,
>              [{file,"src/webmachine_mochiweb.erl"},{line,74}]}]}}</pre><P><HR><ADDRESS>mochiweb+webmachine
> web server</ADDRESS></body></html>
> * Connection #0 to host left intact
> Could anyone point us in the right direction on where to look and
> debug? This is becoming a huge issue for us.
> Thanks,
> Jason
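
To see which buckets the "failed to index object" errors quoted above are
concentrated in, a small tally over the Riak error log might help. This is a
rough sketch, not a supported tool: the function name is hypothetical, the
regular expression is modelled only on the single log line shown in this
thread (and assumes each entry is on one line in the actual log file), and
reading the first two binaries as bucket type and bucket is an assumption.

```python
import re
from collections import Counter

# Modelled on the one log line quoted in this thread; other variants
# of the message may need a looser pattern.
FAILED_INDEX = re.compile(
    r'failed to index\s+object\s+'
    r'\{\{<<"(?P<bucket_type>[^"]*)">>,<<"(?P<bucket>[^"]*)">>\},'
    r'<<"(?P<key>[^"]*)">>\}\s+'
    r'with error\s+\{"Failed to index docs",\{error,(?P<reason>\w+)\}\}'
)


def count_index_failures(lines):
    """Count index failures per (bucket_type, bucket, reason)."""
    counts = Counter()
    for line in lines:
        match = FAILED_INDEX.search(line)
        if match:
            counts[(match["bucket_type"], match["bucket"], match["reason"])] += 1
    return counts
```

It could be fed an open log file directly, e.g.
`count_index_failures(open(path))`, where `path` is wherever the node's error
log lives; `most_common()` on the result then lists the worst offenders first.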
