That IP resolves to rate-limited-proxy-72-14-199-18.google.com, so it is not the Google search crawler, which is why it ignores your robots.txt. No one seems to know for certain what the rate-limited-proxy IPs are used for. They may represent Chrome users with Google's data-saving feature enabled, which would explain the varying user agents you see. Either way, they are probably best left unblocked, since a single proxy IP can represent many end users. Maybe there is an X-Forwarded-For header you could look at.
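If the proxy does forward the original client address, logging it makes that visible. An untested sketch; the format name `proxycheck` and the log path are placeholders, not anything nginx ships with:

```nginx
# Custom log format that records the X-Forwarded-For header (if any)
# next to the connecting address, so proxied traffic can be told apart
# from direct traffic. "proxycheck" is a made-up name for this sketch.
log_format proxycheck '$remote_addr xff="$http_x_forwarded_for" '
                      '"$request" $status ua="$http_user_agent"';

server {
    listen 80;
    # Placeholder path; point this wherever your logs live.
    access_log /var/log/nginx/proxycheck.access.log proxycheck;
}
```

If the rate-limited-proxy hits show a real client IP in the xff column, blocking the proxy address itself would cut off all of those users at once.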
The Google search crawler will resolve to an IP like crawl-66-249-64-213.googlebot.com.

On Mon, Jun 11, 2018 at 5:05 PM Francis Daly <fran...@daoine.org> wrote:

> On Thu, Jun 07, 2018 at 07:57:43PM -0400, shiz wrote:
>
> > Hi there,
> >
> > Recently, Google has started spidering my website and in addition to normal
> > pages, appended "&amp;" to all urls, even the pages excluded by robots.txt
> >
> > e.g. page.php?page=aaa -> page.php?page=aaa&amp;
> >
> > Any idea how to redirect/rewrite this?
>
> Untested, but:
>
>   if ($args ~ "&amp;$") { return 400; }
>
> should handle all requests that end in the trailing characters you report.
>
> You may prefer a different response code.
>
> Good luck with it,
>
> f
> --
> Francis Daly        fran...@daoine.org
> _______________________________________________
> nginx mailing list
> nginx@nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx
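The original poster asked about redirecting rather than rejecting. Building on Francis's untested `if` suggestion, here is an equally untested sketch that 301-redirects to the cleaned query string instead of returning 400. It assumes the appended suffix is the literal text `&amp;` (the archive may have mangled the exact characters); adjust the regex to whatever your access log actually shows:

```nginx
server {
    listen 80;
    server_name example.com;  # placeholder

    # If the query string ends in the junk suffix, capture everything
    # before it and redirect to the cleaned URL. Named captures are
    # available inside "if" regexes.
    if ($args ~ "^(?<clean_args>.*)&amp;$") {
        return 301 $uri?$clean_args;
    }
}
```

One caveat: `$uri` is the decoded, normalized request URI, so percent-encoded characters may come back differently after the redirect. Test against your real URLs before deploying.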