It is usually a scraping bot. As I said before, we don’t want to actually do deep paging; we want to prevent it. We already know about streaming, export, and cursor marks, and we use those when needed.
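Something along these lines is what I have in mind: a custom SearchComponent that rejects any request paging past a cap, before the query ever runs. This is only a sketch; the class name, the component name, and the 10,000 cap are placeholders, not anything Solr ships.

import org.apache.solr.common.SolrException;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class MaxRowsComponent extends SearchComponent {
  // Hard-coded for the sketch; a real component would read this from init args.
  private static final int MAX_START_PLUS_ROWS = 10000;

  @Override
  public void prepare(ResponseBuilder rb) {
    SolrParams params = rb.req.getParams();
    int start = params.getInt(CommonParams.START, 0);
    int rows = params.getInt(CommonParams.ROWS, 10);
    if (start + rows > MAX_START_PLUS_ROWS) {
      // BAD_REQUEST maps to an HTTP 400 response.
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
          "start+rows may not exceed " + MAX_START_PLUS_ROWS);
    }
  }

  @Override
  public void process(ResponseBuilder rb) {
    // All the work happens in prepare(); nothing to do here.
  }

  @Override
  public String getDescription() {
    return "Rejects requests that page too deep";
  }
}

It would get wired in through solrconfig.xml as a first-components entry on the search handler, e.g. <searchComponent name="maxRows" class="com.example.MaxRowsComponent"/> plus <arr name="first-components"><str>maxRows</str></arr> on /select (com.example is a placeholder package).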
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jun 25, 2021, at 11:43 AM, Rahul Goswami <rahul196...@gmail.com> wrote:
>
> Is this deep pagination happening as a result of user action (e.g.,
> wanting to see what is on the 200th page of the results)? Or due to
> queries triggered in a loop trying to fetch results for some batch job?
> If it’s the latter, you could consider suggesting a client-code change
> to use streaming calls (/export or the Streaming API), which scale much
> better than deep pagination queries.
>
> Rahul
>
> On Fri, Jun 25, 2021 at 1:20 PM Walter Underwood <wun...@wunderwood.org>
> wrote:
>
>> Has anyone implemented protection against deep paging inside Solr? I’m
>> thinking about something like a max_rows parameter, where if start+rows
>> were greater than that, the result would be capped at that number. Or
>> maybe just return a 400, that would be OK too.
>>
>> I’ve had three or four outages caused by deep paging over the past
>> dozen years with Solr. We implement a limit in the client code, then
>> someone forgets to add it to the redesigned client code. A limit in the
>> request handler would be so much easier.
>>
>> And yes, I know about cursor marks. We don’t want to enable deep
>> paging, we want to stop it.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
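P.S. For the batch-job case Rahul describes, pulling an entire result set through /export from SolrJ looks roughly like this. A sketch only: the zkHost localhost:9983, the collection name "products", and the field "id" are assumptions, and /export requires an explicit sort over docValues fields.

import org.apache.solr.client.solrj.io.SolrClientCache;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
import org.apache.solr.client.solrj.io.stream.StreamContext;
import org.apache.solr.common.params.ModifiableSolrParams;

public class ExportAll {
  public static void main(String[] args) throws Exception {
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", "*:*");
    params.set("qt", "/export");   // route the request to the export handler
    params.set("fl", "id");        // /export only returns docValues fields
    params.set("sort", "id asc");  // /export requires an explicit sort

    SolrClientCache cache = new SolrClientCache();
    try (CloudSolrStream stream = new CloudSolrStream("localhost:9983", "products", params)) {
      StreamContext context = new StreamContext();
      context.setSolrClientCache(cache);
      stream.setStreamContext(context);
      stream.open();
      Tuple tuple;
      // Streams every matching document; no start/rows, so nothing to page through.
      while (!(tuple = stream.read()).EOF) {
        System.out.println(tuple.getString("id"));
      }
    } finally {
      cache.close();
    }
  }
}

Since there is no start/rows at all, there is nothing for a runaway batch client to deepen.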