Re: NoHttpResponseException error between leader and replica

Mark Miller Fri, 17 Jun 2016 06:55:10 -0700

Correction: you can't do it well enough *without* built-in support.

On Fri, Jun 17, 2016 at 9:53 AM Mark Miller <[email protected]> wrote:


> No, you can't do it well enough with built-in support. You can follow that
> ticket and see that is how I started. If it's a big enough issue, we should
> backport that to 6x and deal with the back compat breaks.
>
> On Fri, Jun 17, 2016 at 8:11 AM Varun Thacker <[email protected]>
> wrote:
>
>> Hi Mark,
>>
>> So for the 6.x line, do you think we should add a background thread that
>> closes idle and expired connections? Or do you have any other
>> recommendations?
>>
>> On Fri, Jun 17, 2016 at 5:25 PM, Mark Miller <[email protected]>
>> wrote:
>>
>>> Ah, forgot to mention, it's only on 7x.
>>>
>>> Mark
>>>
>>> On Fri, Jun 17, 2016 at 5:06 AM Varun Thacker <
>>> [email protected]> wrote:
>>>
>>>> bq. It's now part of HttpClient.
>>>>
>>>> Were you referring to line 230 of HttpClientUtil on master?
>>>>
>>>> cm.setValidateAfterInactivity(Integer.getInteger(VALIDATE_AFTER_INACTIVITY,
>>>> VALIDATE_AFTER_INACTIVITY_DEFAULT));
>>>>
>>>> On Fri, Jun 17, 2016 at 12:13 PM, Varun Thacker <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Mark,
>>>>>
>>>>> We were running Solr 5.4.1 on a 4 node machine with a 2-shard, 2-replica
>>>>> collection.
>>>>> The test data is roughly 30M large documents. Indexing is done via
>>>>> map-reduce, with 80 parallel reducers each sending batches of 500
>>>>> documents to Solr at a time.
>>>>>
>>>>> In this setup, almost all runs hit the NoHttpResponseException between
>>>>> leader and replica once.
>>>>>
>>>>> "It's now part of HttpClient." - Sorry, I didn't quite follow. What's
>>>>> part of HttpClient?
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 17, 2016 at 6:51 AM, Mark Miller <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I'm sorry, you say it's easy to reproduce, but can you explain
>>>>>> roughly what you are doing to reproduce it?
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>> On Thu, Jun 16, 2016 at 9:20 PM Mark Miller <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> That's already how things work. It's now part of HttpClient. There
>>>>>>> are some settings you can mess with. Is it easy to reproduce?
>>>>>>>
>>>>>>> Mark
>>>>>>> On Thu, Jun 16, 2016 at 1:15 PM Varun Thacker <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> When running a bulk index process, we occasionally see a
>>>>>>>> NoHttpResponseException when the leader is forwarding docs to the
>>>>>>>> replica. I think this is a known issue and can be reproduced pretty
>>>>>>>> easily.
>>>>>>>>
>>>>>>>> What makes me want to dig more is that, because of one such
>>>>>>>> NoHttpResponseException, the leader will put the replica into recovery.
>>>>>>>> The replica can never catch up because the indexing throughput is quite
>>>>>>>> high. This can add hours of recovery time for the replica, depending on
>>>>>>>> how many documents one is indexing.
>>>>>>>>
>>>>>>>> As far as I can tell, we have two options here:
>>>>>>>> 1. Implement a thread which removes stale connections. This has
>>>>>>>> been discussed on https://issues.apache.org/jira/browse/SOLR-4509
>>>>>>>> in the past.
>>>>>>>> 2. The above solution is not the right way forward. The main
>>>>>>>> problem here is that replicas can't catch up because Solr doesn't
>>>>>>>> implement backpressure yet, and implementing that would be the correct
>>>>>>>> solution here.
>>>>>>>> Does anyone have an opinion on how we should go forward with
>>>>>>>> this issue?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Varun Thacker
>>>>>>>>
>>>>>>> --
>>>>>>> - Mark
>>>>>>> about.me/markrmiller
>>>>>>>
>>>>>> --
>>>>>> - Mark
>>>>>> about.me/markrmiller
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> Regards,
>>>>> Varun Thacker
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> Regards,
>>>> Varun Thacker
>>>>
>>> --
>>> - Mark
>>> about.me/markrmiller
>>>
>>
>>
>>
>> --
>>
>>
>> Regards,
>> Varun Thacker
>>
> --
> - Mark
> about.me/markrmiller
>
-- 
- Mark
about.me/markrmiller
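
For reference, the "background thread which removes stale connections" idea raised in this thread can be sketched roughly as below. This is only an illustration under stated assumptions, not what Solr or SOLR-4509 actually does. `IdleConnectionEvictor` and `EvictablePool` are hypothetical names; the interface simply mirrors the `closeExpiredConnections()` and `closeIdleConnections(long, TimeUnit)` calls that HttpClient 4.x connection managers expose, so the sketch compiles with the JDK alone. The period and idle threshold are placeholders, not recommended values.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Periodically asks a connection pool to drop expired and long-idle connections. */
public class IdleConnectionEvictor implements AutoCloseable {

    /** Hypothetical stand-in for the eviction calls on an HttpClient 4.x connection manager. */
    public interface EvictablePool {
        void closeExpiredConnections();
        void closeIdleConnections(long idleTime, TimeUnit unit);
    }

    private final ScheduledExecutorService scheduler;

    public IdleConnectionEvictor(EvictablePool pool, long periodMs, long maxIdleMs) {
        // Single daemon thread so the evictor never keeps the JVM alive on its own.
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-evictor");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                pool.closeExpiredConnections();
                pool.closeIdleConnections(maxIdleMs, TimeUnit.MILLISECONDS);
            } catch (RuntimeException e) {
                // Swallow so that one failed sweep does not cancel all future sweeps.
            }
        }, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    /** Stops the background sweeps. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Note that a sweep like this only narrows the window in which a connection closed by the remote side can be picked up for a request; it cannot eliminate it, which is consistent with the point made above that this cannot be done well enough without built-in support.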
