You can't do it well enough *without* built-in support.

On Fri, Jun 17, 2016 at 9:53 AM Mark Miller <[email protected]> wrote:
> No, you can't do it good enough with built in support. You can follow that
> ticket and see that is how I started. If it's a big enough issue, we should
> backport that to 6x and deal with the back compat breaks.
>
> On Fri, Jun 17, 2016 at 8:11 AM Varun Thacker <[email protected]>
> wrote:
>
>> Hi Mark,
>>
>> So for the 6.x line do you think we should add a background thread which
>> expires idle connections and expired connections ? Or do you have any
>> other recommendations ?
>>
>> On Fri, Jun 17, 2016 at 5:25 PM, Mark Miller <[email protected]>
>> wrote:
>>
>>> Ah, forgot to mention, it's only on 7x.
>>>
>>> Mark
>>>
>>> On Fri, Jun 17, 2016 at 5:06 AM Varun Thacker <
>>> [email protected]> wrote:
>>>
>>>> bq. It's now part of HttpClient.
>>>>
>>>> Were you referring to Line230 of HttpClientUtil on master ? -
>>>> cm.setValidateAfterInactivity(Integer.getInteger(VALIDATE_AFTER_INACTIVITY,
>>>> VALIDATE_AFTER_INACTIVITY_DEFAULT));
>>>>
>>>> On Fri, Jun 17, 2016 at 12:13 PM, Varun Thacker <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Mark,
>>>>>
>>>>> We were running Solr 5.4.1 on a 4 node machine and a 2 shard 2 replica
>>>>> collection.
>>>>> The test data is roughly 30M large documents. The indexing process is
>>>>> via map-reduce and there are 80 parallel reducers sending a batch of 500
>>>>> documents to solr at a go.
>>>>>
>>>>> In this setup almost all runs hit the NoHttpResponseException b/w
>>>>> leader and replica once.
>>>>>
>>>>> "It's now part of HttpClient." - Sorry I didn't quite follow whats
>>>>> part of HttpClient?
>>>>>
>>>>> On Fri, Jun 17, 2016 at 6:51 AM, Mark Miller <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I'm sorry, you say it's easy to reproduce, but can you explain
>>>>>> roughly what you are doing to reproduce it?
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>> On Thu, Jun 16, 2016 at 9:20 PM Mark Miller <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> That's already how things work. It's now part of HttpClient. There
>>>>>>> are some settings you can mess with. Is it easy to reproduce?
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>> On Thu, Jun 16, 2016 at 1:15 PM Varun Thacker <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> When running a bulk index process occasionally we see a
>>>>>>>> NoHttpResponseException error when the leader is forwarding docs to the
>>>>>>>> replica. I think this is a known issue and can be reproduced pretty
>>>>>>>> easily.
>>>>>>>>
>>>>>>>> What makes me want to dig more is that because of one such
>>>>>>>> NoHttpResponseException the leader will put the replica into recovery.
>>>>>>>> The replica can never catch up because the indexing throughput is quite
>>>>>>>> high . This can add hours of recovery time for the replica depending on
>>>>>>>> how many documents one is indexing .
>>>>>>>>
>>>>>>>> So from what I can think we have two options here -
>>>>>>>> 1. Implement a thread which removes stale connections. This has
>>>>>>>> been discussed on https://issues.apache.org/jira/browse/SOLR-4509
>>>>>>>> in the past
>>>>>>>> 2. The above solution is not the right way forward. The main
>>>>>>>> problem here is that replicas can't catch up because Solr doesn't
>>>>>>>> implement backpressure yet and implementing that would be the correct
>>>>>>>> solution here
>>>>>>>>
>>>>>>>> Does anyone have an opinion on how we should we go forward with
>>>>>>>> this issue?
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Varun Thacker
>>>>>>>>
>>>>>>> --
>>>>>>> - Mark
>>>>>>> about.me/markrmiller
>>>>>>>
>>>>>> --
>>>>>> - Mark
>>>>>> about.me/markrmiller
>>>>>>
>>>>> --
>>>>>
>>>>> Regards,
>>>>> Varun Thacker
>>>>>
>>>> --
>>>>
>>>> Regards,
>>>> Varun Thacker
>>>>
>>> --
>>> - Mark
>>> about.me/markrmiller
>>>
>> --
>>
>> Regards,
>> Varun Thacker
>>
> --
> - Mark
> about.me/markrmiller
>
--
- Mark
about.me/markrmiller
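
For reference, the background thread suggested in this thread for the 6.x line (one that expires idle and expired connections) could look roughly like the sketch below. It is only an illustration, not what Solr or SOLR-4509 actually does. `EvictablePool`, `IdleConnectionEvictor`, and the two timing parameters are hypothetical names; the two pool methods are modeled on `closeExpiredConnections()` and `closeIdleConnections(long, TimeUnit)` from HttpClient 4.x's connection manager, so a real pooling connection manager could be adapted to the interface.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a pooling connection manager (assumption, not a Solr type). */
interface EvictablePool {
    void closeExpiredConnections();
    void closeIdleConnections(long idleTime, TimeUnit unit);
}

/** Periodically closes expired connections and connections idle longer than maxIdleMillis. */
final class IdleConnectionEvictor implements AutoCloseable {
    private final EvictablePool pool;
    private final long maxIdleMillis;
    private final long sweepIntervalMillis;
    private final ScheduledExecutorService scheduler;

    IdleConnectionEvictor(EvictablePool pool, long maxIdleMillis, long sweepIntervalMillis) {
        this.pool = pool;
        this.maxIdleMillis = maxIdleMillis;
        this.sweepIntervalMillis = sweepIntervalMillis;
        // Daemon thread so the evictor never keeps the JVM alive on its own.
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-evictor");
            t.setDaemon(true);
            return t;
        });
    }

    /** One synchronous sweep: drop expired connections first, then idle ones. */
    void sweepOnce() {
        pool.closeExpiredConnections();
        pool.closeIdleConnections(maxIdleMillis, TimeUnit.MILLISECONDS);
    }

    /** Starts the periodic sweeps. */
    void start() {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                sweepOnce();
            } catch (RuntimeException e) {
                // Swallow the failure: an uncaught exception would cancel all future sweeps.
            }
        }, sweepIntervalMillis, sweepIntervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the background thread. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The sweep only narrows the window in which a pooled connection has been closed by the other side; a request can still pick up a stale connection between sweeps, which fits the point above that this approach is not good enough without built-in support.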
