This raises a question: for anyone who has been burned by the deep pagination issue in the past, what is a reasonable value of the "start" param beyond which performance degrades noticeably?

Rahul

On Fri, Jun 25, 2021 at 11:28 PM Walter Underwood <wun...@wunderwood.org> wrote:

> Cursors require keeping session state outside of Solr. With a million
> queries per hour and the middle tier spread across lots of containers,
> that isn’t practical. Stateless searches are the default in Solr for a
> good reason.
>
> Using start and rows works great. The only issue is that Solr is
> defenseless against deep paging.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Jun 25, 2021, at 8:09 PM, Dwane Hall <dwaneh...@hotmail.com> wrote:
> >
> > Ok we lock down the rows and start params and then use cursors (which
> > you don't want to use) for paging in increments of the page size. It
> > works nicely for us but it sounds like it's not workable solution for
> > you.
> >
> > Thanks,
> >
> > Dwane
> >
> > From: Walter Underwood <wun...@wunderwood.org>
> > Sent: Saturday, 26 June 2021 12:53 PM
> > To: users@solr.apache.org
> > Subject: Re: Defense against deep paging?
> >
> > The start parameter needs to be read from the request. That is how the
> > client gets to the second page of results, by setting start=10 or
> > start=20. The problem is when a bot sneaks through the checks and Solr
> > gets start=3990000. A few of those will use all of heap and take down
> > the server process.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/ (my blog)
> >
> > > On Jun 25, 2021, at 6:40 PM, Dwane Hall <dwaneh...@hotmail.com> wrote:
> > >
> > > Hey Walter,
> > >
> > > Can you set the value for start (0) and rows (your default sensible
> > > response row size) as an invariant in the request handler you're
> > > using so it can't be overridden from a client request? That's how
> > > I've defended against it from Solr's perspective in the past. This
> > > can be hard coded in your request handler in the XML of your
> > > solr-config or using the parameters API. I've found it simple but
> > > effective approach and there's an example here from the docs
> > > (https://solr.apache.org/guide/8_8/requesthandlers-and-searchcomponents-in-solrconfig.html#request-handlers).
> > >
> > > Thanks,
> > >
> > > Dwane
> > >
> > > From: Walter Underwood <wun...@wunderwood.org>
> > > Sent: Saturday, 26 June 2021 6:39 AM
> > > To: users@solr.apache.org
> > > Subject: Re: Defense against deep paging?
> > >
> > > Thanks, that is exactly the info I wanted! I’ve commented there, even
> > > though it is closed as Won’t Do.
> > >
> > > wunder
> > > Walter Underwood
> > > wun...@wunderwood.org
> > > http://observer.wunderwood.org/ (my blog)
> > >
> > > > On Jun 25, 2021, at 12:46 PM, Mike Drob <md...@mdrob.com> wrote:
> > > >
> > > > This was discussed somewhat in
> > > > https://issues.apache.org/jira/browse/SOLR-15252 with no
> > > > implementation provided.
> > > >
> > > > On Fri, Jun 25, 2021 at 11:52 AM Walter Underwood <wun...@wunderwood.org> wrote:
> > > >>
> > > >> I already said that we have a limit in the client code. I’m asking
> > > >> about a limit in Solr.
> > > >>
> > > >> wunder
> > > >> Walter Underwood
> > > >> wun...@wunderwood.org
> > > >> http://observer.wunderwood.org/ (my blog)
> > > >>
> > > >>> On Jun 25, 2021, at 11:50 AM, Håvard Wahl Kongsgård <haavard.kongsga...@gmail.com> wrote:
> > > >>>
> > > >>> Just create a proxy client between the user and solr. Set if page >= 500 ….
> > > >>> else
> > > >>>
> > > >>> Simple stuff
> > > >>>
> > > >>> On Fri, Jun 25, 2021 at 19:20, Walter Underwood <wun...@wunderwood.org> wrote:
> > > >>>
> > > >>>> Has anyone implemented protection against deep paging inside
> > > >>>> Solr? I’m thinking about something like a max_rows parameter,
> > > >>>> where if start+rows was greater than that, it would limit the
> > > >>>> max result to that number. Or maybe just return a 400, that
> > > >>>> would be OK too.
> > > >>>>
> > > >>>> I’ve had three or four outages caused by deep paging over the
> > > >>>> past dozen years with Solr. We implement a limit in the client
> > > >>>> code, then someone forgets to add it to the redesigned client
> > > >>>> code. A limit in the request handler would be so much easier.
> > > >>>>
> > > >>>> And yes, I know about cursor marks. We don’t want to enable deep
> > > >>>> paging, we want to stop it.
> > > >>>>
> > > >>>> wunder
> > > >>>> Walter Underwood
> > > >>>> wun...@wunderwood.org
> > > >>>> http://observer.wunderwood.org/ (my blog)
> > > >>>>
> > > >>>> --
> > > >>> Håvard Wahl Kongsgård
> > > >>> Data Scientist
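
For reference, the check described at the bottom of the thread (reject or clamp a request when start+rows exceeds a limit) could be sketched roughly as below for a middle tier or proxy. This is only an illustration, not a Solr feature: `guard_paging`, `Page`, `DeepPagingError` and the `MAX_WINDOW` value of 1000 are all made-up names and numbers, and the thread itself does not settle on a threshold.

```python
from dataclasses import dataclass

# Hypothetical cap on start + rows. The thread does not name a value;
# pick one based on your own measurements.
MAX_WINDOW = 1000


class DeepPagingError(ValueError):
    """Raised for requests that page too deep (a caller could map this to HTTP 400)."""


@dataclass(frozen=True)
class Page:
    start: int
    rows: int


def guard_paging(start: int, rows: int,
                 max_window: int = MAX_WINDOW,
                 clamp: bool = False) -> Page:
    """Return safe start/rows values, or raise if the request pages too deep.

    With clamp=False the request is rejected outright.
    With clamp=True the window is trimmed so start + rows never exceeds max_window.
    """
    if start < 0 or rows < 0:
        raise DeepPagingError("start and rows must be non-negative")
    if start + rows <= max_window:
        return Page(start, rows)
    if not clamp:
        raise DeepPagingError(
            f"start+rows={start + rows} exceeds limit {max_window}"
        )
    # Trim the window instead of rejecting the request.
    safe_start = min(start, max_window)
    safe_rows = min(rows, max_window - safe_start)
    return Page(safe_start, safe_rows)
```

Calling `guard_paging(20, 10)` passes the values through unchanged, while `guard_paging(3990000, 10)` raises `DeepPagingError` before anything is sent to Solr. As noted in the thread, a check like this only helps if every client path goes through it, which is why a limit inside the request handler was being asked for.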