Here's a port for Anubis. It requires the go update (1.24.1), which
was only just committed.

I took a few slightly non-standard decisions around the rc script.

1. Anubis is normally configured via environment variables (it is
mainly intended to run in Docker, where this is fairly common,
albeit annoying). This can be done via login.conf "setenv", but one
of the required parameters is a URL. It is _possible_ to escape
URL-standard characters for setenv, but the syntax is barely
documented and pretty horrible, so I am sourcing a shell script
/etc/anubis.env instead (a sketch follows below the list).

2. I decided to run as user 'www' by default; I'm not entirely happy
with the proliferation of fixed uids in ports, and 'www' seems
reasonable for the purpose. If someone wants a different uid, they
can change it via the usual mechanism (i.e. anubis_user in
rc.conf.local).
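
For reference, /etc/anubis.env would be along these lines (BIND and
TARGET are environment variable names from the Anubis documentation,
if I have them right; the values are only illustrative):

    # /etc/anubis.env - sourced by the rc script before starting anubis
    export BIND=":8923"                    # address Anubis listens on
    export TARGET="http://127.0.0.1:8080"  # backend to proxy to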

Any comments, strong objections to those decisions, or OKs?

---
Anubis acts as middleware between a reverse proxy and backend web server.
It assesses whether a connection is likely to be from a scraper bot and,
if so, issues a SHA-256 proof-of-work challenge before allowing the
connection to proceed.

As of 1.14.x, Anubis presents a challenge when all of the following hold:

    User-Agent contains "Mozilla"
    Request path is not in /.well-known, /robots.txt, or /favicon.ico
    Request path is not obviously an RSS feed (ends with .rss, .xml, or .atom)

This should ensure that git clients, RSS readers, and other low-harm
clients can get through without issue, while high-risk clients such as
browsers, and AI scraper bots impersonating browsers, will be challenged.

When a challenge is passed, a signed JSON Web Token (JWT) is provided
as a cookie, allowing future requests to pass without triggering the
challenge.

Using Anubis will likely result in your website not being indexed by
some search engines. This is considered a feature, not a bug.
---
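
For anyone curious what passing a challenge involves: it boils down to
finding a nonce whose SHA-256 hash has enough leading zeros. A
simplified sketch using sha256(1) (the real client does this in
JavaScript, and the exact input format and default difficulty are
Anubis's own, not reproduced here):

    # find a nonce such that SHA-256(challenge || nonce) starts with
    # the required number of zero hex digits
    challenge="example-challenge-from-server"
    prefix="00"    # kept short so the shell loop finishes quickly;
                   # Anubis uses a higher difficulty by default
    nonce=0
    while :; do
        hash=$(printf '%s%s' "$challenge" "$nonce" | sha256)
        case $hash in
        "$prefix"*) break ;;
        esac
        nonce=$((nonce + 1))
    done
    echo "solved: nonce=$nonce hash=$hash"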

Attachment: anubis.tgz
