On Wed, 21 Feb 2024 14:57:29 +0100,
Sadeep Madurange wrote:
> 
> Is there a way to block non-browser clients from accessing a website
> (e.g., scraping attempts by bots or even software like Selenium that
> might programmatically control a browser), preferably before the
> requests reach the webserver?
> 
> I'm wondering if there's a way to do that with, for example, pf,
> blocking such requests completely rather than responding with a 403.
> 

There is a whole industry, called bot management, that addresses this
by analyzing requests, presenting CAPTCHAs for the edge cases, and so
on.

A trivial bot can be caught with a regex against the User-Agent header,
or with rate limiting. More sophisticated ones take a lot more tooling,
which may include things like cross-checking the User-Agent against the
TLS extensions in the ClientHello, checks against lists of blacklisted
IPs, and so on.

As far as I know, the best publicly available list of "bad" IPs is
https://www.blocklist.de/ , which isn't exhaustive but at least lets
you ban something automatically. For instance, you could use
spamd-setup in blocking mode to fill pf rules from cron.
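If spamd-setup doesn't fit your setup, a plain cron job that refreshes
a pf table does much the same job. Rough sketch; the table name is made
up and the URL is the aggregated blocklist.de export, check their site
for the file you actually want:

  # pf.conf
  table <badhosts> persist
  block in quick from <badhosts>

  # run from root's crontab, e.g. once an hour:
  # fetch the list (one IP per line) and replace the table contents
  ftp -o /tmp/badhosts.txt https://lists.blocklist.de/lists/all.txt && \
      pfctl -t badhosts -T replace -f /tmp/badhosts.txt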

-- 
wbr, Kirill
