On Tue, Jul 9, 2024 at 9:11 AM Dave Wreski
<dwre...@guardiandigital.com.invalid> wrote:

> Hi, I have the following rewrite rule in place on one of our staging sites
> to redirect bots and malicious scripts to our corporate page:
>
>   RewriteCond %{HTTP_USER_AGENT} ^$                                                              [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^.*(<|>|'|%0A|%0D|%27|%3C|%3E|%00).*                            [NC,OR]
>   RewriteCond %{HTTP_USER_AGENT} ^.*(HTTrack|clshttp|archiver|loader|email|nikto|miner|python).* [NC,OR]
>   RewriteCond %{HTTP_USER_AGENT} ^.*(winhttp|libwww\-perl|curl|wget|harvest|scan|grab|extract).* [NC,OR]
>   RewriteCond %{HTTP_USER_AGENT} ^.*(Googlebot|SemrushBot|PetalBot|Bytespider|bingbot).*         [NC]
>   RewriteRule (.*)    https://guardiandigital.com$1 [L,R=301]
>
> However, it doesn't appear to always work properly:
>
> 66.249.68.6 - - [08/Jul/2024:11:43:41 -0400] "GET /robots.txt HTTP/1.1"
> 200 343 r:"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +
> http://www.google.com/bot.html)" 0/5493 1145/6615/343 H:HTTP/1.1
> U:/robots.txt s:200
>
> Instead of making changes to my rules then having to wait until the
> condition is met (Googlebot scans the site again), I'd like to simulate the
> above request against my ruleset to see if it matches. Is this possible?
>
> Thanks,
> Dave
>
>
>
For the user agent, install a browser extension that lets you override ("fake") the
User-Agent header, then make an HTTP request to the staging site. Alternatively, you
can send the same request from the command line with curl and a custom User-Agent.
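As a sketch (using `staging.example.com` as a placeholder for your staging
hostname), you could replay the logged request with curl while spoofing
Googlebot's User-Agent, and also check the UA string against the rule's regex
locally:

```shell
# Replay the logged request with Googlebot's User-Agent and show only the
# response headers: a 301 status with a Location: https://guardiandigital.com/...
# header means the ruleset matched. staging.example.com is a placeholder.
#
#   curl -sI -A 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' \
#        https://staging.example.com/robots.txt

# You can also test the User-Agent against the bot pattern locally;
# grep -i approximates the [NC] flag's case-insensitive matching:
UA='Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
if echo "$UA" | grep -qiE 'Googlebot|SemrushBot|PetalBot|Bytespider|bingbot'; then
    echo 'UA matches the bot pattern'
fi
```

If the UA matches locally but the live request still isn't redirected, the
problem is likely elsewhere in the config (rule ordering, context, or an
earlier [L] rule) rather than in the pattern itself.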
