On Tue, Jul 9, 2024 at 9:11 AM Dave Wreski <dwre...@guardiandigital.com.invalid> wrote:
> Hi, I have the following rewrite rule in place on one of our staging sites
> to redirect bots and malicious scripts to our corporate page:
>
> RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
> RewriteCond %{HTTP_USER_AGENT} ^.*(<|>|'|%0A|%0D|%27|%3C|%3E|%00).* [NC,OR]
> RewriteCond %{HTTP_USER_AGENT} ^.*(HTTrack|clshttp|archiver|loader|email|nikto|miner|python).* [NC,OR]
> RewriteCond %{HTTP_USER_AGENT} ^.*(winhttp|libwww\-perl|curl|wget|harvest|scan|grab|extract).* [NC,OR]
> RewriteCond %{HTTP_USER_AGENT} ^.*(Googlebot|SemrushBot|PetalBot|Bytespider|bingbot).* [NC]
> RewriteRule (.*) https://guardiandigital.com$1 [L,R=301]
>
> However, it doesn't appear to always work properly:
>
> 66.249.68.6 - - [08/Jul/2024:11:43:41 -0400] "GET /robots.txt HTTP/1.1" 200 343 r:"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0/5493 1145/6615/343 H:HTTP/1.1 U:/robots.txt s:200
>
> Instead of making changes to my rules and then having to wait until the
> condition is met (Googlebot scans the site again), I'd like to simulate the
> above request against my ruleset to see whether it matches. Is this possible?
>
> Thanks,
> Dave

For the user agent, just install an extension in your browser to "fake" the value and make an HTTP request. Alternatively, you can use curl.
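For example, a minimal curl invocation (staging.example.com is a placeholder for your staging host; the User-Agent string is copied from the log line above):

  curl -sI -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://staging.example.com/robots.txt

If the Googlebot condition matches, you should see a 301 response with a Location header pointing at guardiandigital.com; a plain 200 means the request fell through the ruleset. Add -k if the staging site uses a self-signed certificate, and an empty -A "" should exercise the ^$ condition as well.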