On Wed, 21 Feb 2024 14:57:29 +0100, Sadeep Madurange wrote:

> Is there a way to block non-browser clients from accessing a website
> (e.g., scraping attempts by bots or even software like Selenium that
> might programmatically control a browser), preferably before the
> requests reach the webserver?
>
> I'm wondering if there's a way to do that with, for example, pf to
> block such requests completely rather than responding with a 403.
There is a whole industry, called bot management, that addresses this by
analyzing requests, presenting CAPTCHAs for edge cases, and so on.

A trivial bot can be caught with a regex against the User-Agent header,
or with a rate limit. More sophisticated ones need a larger toolbox,
which may include cross-checking the User-Agent against the TLS
extensions in the ClientHello packet, checks against lists of
blacklisted IPs, and so on.

As far as I know, the best publicly available list of "bad" IPs is
https://www.blocklist.de/ -- it isn't complete, but it lets you ban
a fair amount automatically. Thus, you may use spamd-setup in blocking
mode to fill pf tables via cron.

-- 
wbr, Kirill
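For the rate-limit and blacklist parts, something along these lines in
pf.conf should work. This is only a sketch: the table name <webabuse>,
the file path, and the rate numbers are my own placeholders, and you
would still need a matching /etc/mail/spamd.conf entry pointing at the
blocklist.de feed for spamd-setup to pull:

    # /etc/pf.conf -- sketch, adapt to your ruleset
    table <webabuse> persist file "/etc/mail/webabuse.txt"

    # Silently drop blacklisted addresses before they reach httpd
    # ("block drop" sends nothing back, unlike a 403)
    block drop in quick on egress from <webabuse>

    # Rate-limit new connections to the web ports; sources that
    # exceed 100 connections per 10 seconds get added to the table
    # and all their existing states are torn down
    pass in on egress proto tcp to port { 80 443 } \
        keep state (max-src-conn-rate 100/10, \
        overload <webabuse> flush global)

Then refresh the blacklist periodically from cron, e.g.:

    # root crontab: re-pull the configured blacklists hourly
    0 * * * * /usr/libexec/spamd-setup

Reload pf after editing the ruleset with pfctl -f /etc/pf.conf, and
inspect the table with pfctl -t webabuse -T show to see who got caught.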