The start parameter needs to be read from the request. That is how the client 
gets to the second page of results, by setting start=10 or start=20. The 
problem is when a bot sneaks through the checks and Solr gets start=3990000. A 
few of those will use all of the heap and take down the server process.
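
For illustration only, here is a minimal sketch in Python of the kind of limit
discussed in this thread: reject the request (a caller could map that to a
400) or clamp it when start+rows exceeds a cap. The names check_paging,
DeepPagingError, Page, and MAX_WINDOW, and the value 1000, are hypothetical.
None of this is a Solr API or setting; it is only the logic a proxy or a
custom handler might apply.

```python
from dataclasses import dataclass

MAX_WINDOW = 1000  # hypothetical cap on start + rows; not a Solr setting


class DeepPagingError(ValueError):
    """Request pages past the cap; a caller could map this to HTTP 400."""


@dataclass(frozen=True)
class Page:
    start: int
    rows: int


def check_paging(start: int, rows: int,
                 max_window: int = MAX_WINDOW,
                 clamp: bool = False) -> Page:
    """Return a safe (start, rows) pair, or raise if the request is too deep."""
    if start < 0 or rows < 0:
        raise DeepPagingError("start and rows must be non-negative")
    if start + rows <= max_window:
        return Page(start, rows)  # within the cap: pass through unchanged
    if not clamp:
        raise DeepPagingError(
            f"start+rows={start + rows} exceeds limit {max_window}")
    # Clamp: pull start back to the cap if needed, then shrink rows to fit.
    clamped_start = min(start, max_window)
    return Page(clamped_start, max_window - clamped_start)
```

With the default cap, start=10 and rows=10 passes through unchanged, while
start=3990000 raises DeepPagingError unless clamp=True is passed.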

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jun 25, 2021, at 6:40 PM, Dwane Hall <dwaneh...@hotmail.com> wrote:
> 
> Hey Walter,
> 
> Can you set the value for start (0) and rows (your default sensible response 
> row size) as an invariant in the request handler you're using so it can't be 
> overridden from a client request? That's how I've defended against it from 
> Solr's perspective in the past. This can be hard coded in your request 
> handler in your solrconfig.xml or using the parameters API. I've 
> found it a simple but effective approach, and there's an example here from the 
> docs 
> (https://solr.apache.org/guide/8_8/requesthandlers-and-searchcomponents-in-solrconfig.html#request-handlers).
> 
> Thanks,
> 
> Dwane
> From: Walter Underwood <wun...@wunderwood.org>
> Sent: Saturday, 26 June 2021 6:39 AM
> To: users@solr.apache.org <users@solr.apache.org>
> Subject: Re: Defense against deep paging?
>  
> Thanks, that is exactly the info I wanted! I’ve commented there, even though 
> it is closed as Won’t Do.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> > On Jun 25, 2021, at 12:46 PM, Mike Drob <md...@mdrob.com> wrote:
> > 
> > This was discussed somewhat in
> > https://issues.apache.org/jira/browse/SOLR-15252 with no
> > implementation provided.
> > 
> > On Fri, Jun 25, 2021 at 11:52 AM Walter Underwood <wun...@wunderwood.org> 
> > wrote:
> >> 
> >> I already said that we have a limit in the client code. I’m asking about a 
> >> limit in Solr.
> >> 
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >> 
> >>> On Jun 25, 2021, at 11:50 AM, Håvard Wahl Kongsgård 
> >>> <haavard.kongsga...@gmail.com> wrote:
> >>> 
> >>> Just create a proxy client between the user and solr. Set if page >= 500 
> >>> ….
> >>> else
> >>> 
> >>> Simple stuff
> >>> 
> >>> fre. 25. jun. 2021 kl. 19:20 skrev Walter Underwood 
> >>> <wun...@wunderwood.org>:
> >>> 
> >>>> Has anyone implemented protection against deep paging inside Solr? I’m
> >>>> thinking about something like a max_rows parameter, where if start+rows
> >>>> was greater than that, it would limit the max result to that number. Or
> >>>> maybe just return a 400, that would be OK too.
> >>>> 
> >>>> I’ve had three or four outages caused by deep paging over the past dozen
> >>>> years with Solr. We implement a limit in the client code, then someone
> >>>> forgets to add it to the redesigned client code. A limit in the request
> >>>> handler would be so much easier.
> >>>> 
> >>>> And yes, I know about cursor marks. We don’t want to enable deep paging,
> >>>> we want to stop it.
> >>>> 
> >>>> wunder
> >>>> Walter Underwood
> >>>> wun...@wunderwood.org
> >>>> http://observer.wunderwood.org/  (my blog)
> >>>> 
> >>>> --
> >>> Håvard Wahl Kongsgård
> >>> Data Scientist
> >> 
> 
