That IP resolves to rate-limited-proxy-72-14-199-18.google.com, so it is not the Google search crawler, which is why it ignores your robots.txt. No one seems to know for certain what the rate-limited-proxy IPs are used for. They may represent Chrome users with Google's data-saving feature enabled, which would explain the varying user agents you see. Either way, they are probably best left unblocked, since a single proxy IP can represent many end users. Maybe there is an X-Forwarded-For header you could look at.
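If the proxy does forward the original client address, logging it makes that visible. An untested sketch; the format name `proxycheck` and the log path are placeholders, not anything nginx ships with:

```nginx
# Custom log format that records the X-Forwarded-For header (if any)
# next to the connecting address, so proxied traffic can be told apart
# from direct traffic. "proxycheck" is a made-up name for this sketch.
log_format proxycheck '$remote_addr xff="$http_x_forwarded_for" '
                      '"$request" $status ua="$http_user_agent"';

server {
    listen 80;
    # Placeholder path; point this wherever your logs live.
    access_log /var/log/nginx/proxycheck.access.log proxycheck;
}
```

If the rate-limited-proxy hits show a real client IP in the xff column, blocking the proxy address itself would cut off all of those users at once.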
The Google search crawler will resolve to an IP like crawl-66-249-64-213.googlebot.com.

On Mon, Jun 11, 2018 at 5:05 PM Francis Daly <fran...@daoine.org> wrote:

> On Thu, Jun 07, 2018 at 07:57:43PM -0400, shiz wrote:
>
> > Hi there,
> >
> > Recently, Google has started spidering my website and in addition to normal
> > pages, appended "&amp;" to all urls, even the pages excluded by robots.txt
> >
> > e.g. page.php?page=aaa -> page.php?page=aaa&amp;
> >
> > Any idea how to redirect/rewrite this?
>
> Untested, but:
>
>   if ($args ~ "&amp;$") { return 400; }
>
> should handle all requests that end in the trailing characters you report.
>
> You may prefer a different response code.
>
> Good luck with it,
>
> f
> --
> Francis Daly        fran...@daoine.org
> _______________________________________________
> nginx mailing list
> nginx@nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx
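The original poster asked about redirecting rather than rejecting. Building on Francis's untested `if` suggestion, here is an equally untested sketch that 301-redirects to the cleaned query string instead of returning 400. It assumes the appended suffix is the literal text `&amp;` (the archive may have mangled the exact characters); adjust the regex to whatever your access log actually shows:

```nginx
server {
    listen 80;
    server_name example.com;  # placeholder

    # If the query string ends in the junk suffix, capture everything
    # before it and redirect to the cleaned URL. Named captures are
    # available inside "if" regexes.
    if ($args ~ "^(?<clean_args>.*)&amp;$") {
        return 301 $uri?$clean_args;
    }
}
```

One caveat: `$uri` is the decoded, normalized request URI, so percent-encoded characters may come back differently after the redirect. Test against your real URLs before deploying.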