On Sat, Jun 26, 2021 at 11:09 PM Rahul Goswami <rahul196...@gmail.com>
wrote:

> This raises a question... For anyone who has been burnt by the deep
> pagination issue in the past, what is a reasonable value of the "start"
> param beyond which there is a noticeable performance degradation?
>
> Rahul
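On the threshold question: there may not be one number that fits every index, because the work Solr does scales with start + rows rather than with rows alone. In SolrCloud, each shard collects its own top start + rows hits, and the coordinator merges all of those lists to find the requested page. Here is a minimal Python sketch of that arithmetic. It assumes that simplified model and a hypothetical 8-shard collection.

```python
def entries_to_merge(start: int, rows: int, shards: int) -> dict:
    """Rough count of (doc id, sort value) entries a start/rows query handles.

    Model (an assumption, simplified): every shard keeps its top
    start + rows hits, and the coordinator merges all shard lists
    to pick out the `rows` documents it actually returns.
    """
    per_shard = start + rows
    return {
        "per_shard_queue": per_shard,
        "coordinator_merge": per_shard * shards,
        "returned": rows,
    }


# 8 shards is a hypothetical cluster size, not taken from the thread.
for start in (0, 1_000, 100_000, 3_990_000):
    print(start, entries_to_merge(start, rows=10, shards=8))
```

With rows=10 and start=3990000, each shard tracks about four million entries and the coordinator merges roughly 32 million, all to return 10 documents. Where the slowdown becomes noticeable depends on heap size, shard count and query rate, so it is probably worth measuring on your own cluster.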
>
> On Fri, Jun 25, 2021 at 11:28 PM Walter Underwood <wun...@wunderwood.org>
> wrote:
>
> > Cursors require keeping session state outside of Solr. With a million
> > queries per hour and the middle tier spread across lots of containers,
> > that isn’t practical. Stateless searches are the default in Solr for a
> > good reason.
> >
> > Using start and rows works great. The only issue is that Solr is
> > defenseless against deep paging.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> > > On Jun 25, 2021, at 8:09 PM, Dwane Hall <dwaneh...@hotmail.com> wrote:
> > >
> > > Ok, we lock down the rows and start params and then use cursors (which
> > > you don't want to use) for paging in increments of the page size. It
> > > works nicely for us, but it sounds like it's not a workable solution
> > > for you.
> > >
> > > Thanks,
> > >
> > > Dwane
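For reference, cursor paging in page-size increments looks roughly like the loop below. This is a minimal sketch, not Solr client code: `fetch` is a hypothetical callable that sends the params to a /select handler and returns the parsed JSON, and the uniqueKey field is assumed to be `id`.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical: `fetch` sends params to a Solr /select handler and returns
# the parsed JSON response. Any HTTP client could sit behind it.
Fetch = Callable[[Dict[str, object]], dict]


def iter_pages(fetch: Fetch, q: str, rows: int = 10,
               sort: str = "score desc, id asc") -> Iterator[List[dict]]:
    """Yield pages of docs using cursorMark instead of start.

    The sort must end with the uniqueKey field as a tiebreaker ("id" is
    assumed here), and start is left at its default of 0.
    """
    cursor = "*"  # "*" asks Solr to begin a new cursor
    while True:
        resp = fetch({"q": q, "rows": rows, "sort": sort, "cursorMark": cursor})
        docs = resp["response"]["docs"]
        if docs:
            yield docs
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged mark means no more results
            return
        cursor = next_cursor
```

Each request only has to collect rows hits past the cursor, so deep pages stay cheap. The catch is that the caller must carry nextCursorMark from one request to the next, which is the session state Walter objects to.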
> > > From: Walter Underwood <wun...@wunderwood.org>
> > > Sent: Saturday, 26 June 2021 12:53 PM
> > > To: users@solr.apache.org
> > > Subject: Re: Defense against deep paging?
> > >
> > > The start parameter needs to be read from the request. That is how the
> > > client gets to the second page of results, by setting start=10 or
> > > start=20. The problem is when a bot sneaks through the checks and Solr
> > > gets start=3990000. A few of those will use all of the heap and take
> > > down the server process.
> > >
> > > wunder
> > >
> > > > On Jun 25, 2021, at 6:40 PM, Dwane Hall <dwaneh...@hotmail.com> wrote:
> > > >
> > > > Hey Walter,
> > > >
> > > > Can you set the value for start (0) and rows (a sensible default page
> > > > size) as an invariant in the request handler you're using, so it can't
> > > > be overridden from a client request? That's how I've defended against
> > > > it from Solr's perspective in the past. This can be hard-coded in your
> > > > request handler in solrconfig.xml or set using the Request Parameters
> > > > API. I've found it a simple but effective approach, and there's an
> > > > example here from the docs
> > > > (https://solr.apache.org/guide/8_8/requesthandlers-and-searchcomponents-in-solrconfig.html#request-handlers).
> > > >
> > > > Thanks,
> > > >
> > > > Dwane
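Roughly, a request handler layers its parameter sets: the client's values override the handler's defaults, and invariants override the client. Here is a simplified Python model of that precedence. It ignores appends and multi-valued parameters, and the handler settings shown are hypothetical.

```python
def effective_params(defaults: dict, request: dict, invariants: dict) -> dict:
    """Model how a request handler combines parameter sets.

    Client values override "defaults"; "invariants" override everything.
    ("appends" and multi-valued params are left out of this model.)
    """
    merged = dict(defaults)
    merged.update(request)     # client request beats defaults
    merged.update(invariants)  # invariants beat the client
    return merged


# Hypothetical handler config: start=0 and rows=10 locked as invariants.
defaults = {"q.op": "AND"}
invariants = {"start": "0", "rows": "10"}
page3 = effective_params(defaults, {"q": "lamp", "start": "20"}, invariants)
print(page3["start"])  # → 0
```

Because the invariant wins, a request for start=20 still gets the first page. That is why Walter replies that start has to be read from the request.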
> > > > From: Walter Underwood <wun...@wunderwood.org>
> > > > Sent: Saturday, 26 June 2021 6:39 AM
> > > > To: users@solr.apache.org
> > > > Subject: Re: Defense against deep paging?
> > > >
> > > > Thanks, that is exactly the info I wanted! I’ve commented there, even
> > > > though it is closed as Won’t Do.
> > > >
> > > > wunder
> > > >
> > > > > On Jun 25, 2021, at 12:46 PM, Mike Drob <md...@mdrob.com> wrote:
> > > > >
> > > > > This was discussed somewhat in
> > > > > https://issues.apache.org/jira/browse/SOLR-15252 with no
> > > > > implementation provided.
> > > > >
> > > > > On Fri, Jun 25, 2021 at 11:52 AM Walter Underwood <wun...@wunderwood.org> wrote:
> > > > >>
> > > > >> I already said that we have a limit in the client code. I’m asking
> > > > >> about a limit in Solr.
> > > > >>
> > > > >> wunder
> > > > >>
> > > > >>> On Jun 25, 2021, at 11:50 AM, Håvard Wahl Kongsgård <haavard.kongsga...@gmail.com> wrote:
> > > > >>>
> > > > >>> Just create a proxy client between the user and Solr. Set if page >= 500 …
> > > > >>> else
> > > > >>>
> > > > >>> Simple stuff
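Håvard's "if page >= 500 … else" can be filled out along these lines. This is a minimal sketch for the proxy or client tier, not a Solr feature. MAX_DEPTH and check_paging are hypothetical names, the 500 is applied to start + rows rather than to the page number, and the function offers both behaviors Walter mentions: reject with a 400, or clamp start + rows to the cap.

```python
from typing import Dict, Tuple

# Hypothetical cap on start + rows; not an existing Solr parameter.
MAX_DEPTH = 500


def check_paging(page: int, rows: int, max_depth: int = MAX_DEPTH,
                 clamp: bool = False) -> Tuple[int, Dict[str, int]]:
    """Turn a 1-based page number into Solr start/rows, enforcing a depth cap.

    Returns (http_status, params). Past the cap the request is rejected
    with 400, or, with clamp=True, moved to the last window that fits.
    """
    if page < 1 or rows < 1:
        return 400, {}
    start = (page - 1) * rows
    if start + rows <= max_depth:
        return 200, {"start": start, "rows": rows}
    if not clamp:
        return 400, {}
    rows = min(rows, max_depth)
    return 200, {"start": max_depth - rows, "rows": rows}
```

For example, check_paging(399001, 10), which is the bot request that becomes start=3990000, returns (400, {}). As Walter notes, a check like this only helps while every client keeps calling it.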
> > > > >>>
> > > > >>> On Fri, Jun 25, 2021 at 19:20, Walter Underwood <wun...@wunderwood.org> wrote:
> > > > >>>
> > > > >>>> Has anyone implemented protection against deep paging inside Solr?
> > > > >>>> I’m thinking about something like a max_rows parameter, where if
> > > > >>>> start+rows were greater than that, it would limit the max result to
> > > > >>>> that number. Or maybe just return a 400; that would be OK too.
> > > > >>>>
> > > > >>>> I’ve had three or four outages caused by deep paging over the past
> > > > >>>> dozen years with Solr. We implement a limit in the client code, then
> > > > >>>> someone forgets to add it to the redesigned client code. A limit in
> > > > >>>> the request handler would be so much easier.
> > > > >>>>
> > > > >>>> And yes, I know about cursor marks. We don’t want to enable deep
> > > > >>>> paging, we want to stop it.
> > > > >>>>
> > > > >>>> wunder
> > > > >>>>
> > > > >>> --
> > > > >>> Håvard Wahl Kongsgård
> > > > >>> Data Scientist
> > > > >>
> > > >
> >
> >
>
-- 
*Geren White | Senior Director, Engineering*
*(e)* ge...@1stdibs.com
