On Wed, Oct 07, 2020 at 10:35:39PM +0000, Pau Peris wrote:
> Could you explain to me which would be the benefits of implementing
> such behaviour on a filter or milter instead of doing it on
> header_checks?

As I wrote upthread, and you quoted in your message:

> > RFC5322.From syntax is rather non-trivial, and trying to parse it with
> > regular expressions is not a terribly good idea. While most addresses
> > are simple, and you might not ever see the exceptions, I do not
> > recommend ad-hoc half-right parsers for the mailbox syntax.

It is non-trivial to craft robust regular expressions for RFC*22 mailbox
syntax, not quite as bad as:

    https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

but naïve attempts are likely to fall short of the full grammar.

It might be simpler to arrange for multi-recipient messages to the
purported author of the message to be dropped, by passing mail
submission from the Web form through an SMTP service that rejects
all multi-recipient mail (and making sure that the envelope is
not split before that happens).

On the other hand, for a web contact form, if you want to only permit
a single

    localpart@domain

format, rather than any of the more general

    phrase <mailbox>
    "quoted-text" <mailbox>
    mailbox (comment)
    ...

variants, then a regular expression becomes somewhat simpler, at least
until you also need to handle EAI (non-ASCII localpart and/or domain),
e.g.

    виктор1spam@духовный.org

Leaving EAI aside, the possible forms are then:

    - dot-atom@domain
    - quoted-string@domain

These are matched, in order, by:

# PCRE: ASCII dot-atom @ domain, with nothing allowed after the domain
/^ (?: [^][()<>:;@\\,."\x00-\x20\x7f-\xff]+ \.)*
       [^][()<>:;@\\,."\x00-\x20\x7f-\xff]+ @
   (?: [a-z\d]+ (-+[a-z\d]+)* \.)+ [a-z\d]+ (-+[a-z\d]+)* $/x DUNNO

# PCRE: quoted-string sans NUL @ domain, with nothing allowed after the domain
/^ " ( [^\\"\x00]+ | \\[^\x00] )+ " @
   (?: [a-z\d]+ (-+[a-z\d]+)* \.)+ [a-z\d]+ (-+[a-z\d]+)* $/x DUNNO

# Not a valid address
/^/ whatever action is appropriate

If this is header checks, start each pattern with /^From:\s*/ instead
of the bare /^ anchor. That includes the final catch-all, which would
otherwise match every header.

Postfix does not currently support matching Unicode with PCRE, so
validating EAI addresses with pcre_table(5) may not yet be possible.
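
To try the two patterns before installing them, here is a rough Python
transliteration (a sketch only, not something Postfix runs). It uses
byte patterns so that non-ASCII input is rejected the way a non-UTF-8
PCRE match would reject it, uses fullmatch so that nothing may follow
the domain, and lets the local part contain any number of dots. The
names ATOM, LABEL, DOMAIN and is_single_address are made up for the
example.

```python
import re

# Bytes that may not appear in an atom: specials, space, controls, non-ASCII.
ATOM = rb'[^\]\[()<>:;@\\,."\x00-\x20\x7f-\xff]+'
# One DNS label: letters and digits, hyphens only in the interior.
LABEL = rb'[a-z\d]+(?:-+[a-z\d]+)*'
# At least two dot-separated labels.
DOMAIN = rb'(?:' + LABEL + rb'\.)+' + LABEL

# dot-atom@domain
DOT_ATOM_ADDR = re.compile(
    rb'(?:' + ATOM + rb'\.)*' + ATOM + rb'@' + DOMAIN, re.IGNORECASE
)
# quoted-string@domain (no NUL inside the quotes)
QUOTED_ADDR = re.compile(
    rb'"(?:[^\\"\x00]|\\[^\x00])+"@' + DOMAIN, re.IGNORECASE
)


def is_single_address(value: str) -> bool:
    """True if value is exactly one ASCII localpart@domain address."""
    data = value.encode("utf-8")
    return bool(DOT_ATOM_ADDR.fullmatch(data) or QUOTED_ADDR.fullmatch(data))
```

With this, is_single_address("user@example.com") is true, while
"a@example.com, b@example.com" and the EAI example above are both
rejected.
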
> Also, do you know in which cases would be useful to allow or make use
> of multiple From addresses? Just in case I'm missing something.
>
> Thanks in advance,
>
> On Tue, Oct 6, 2020 at 10:50 PM Viktor Dukhovni
> <[email protected]> wrote:
> >
> > On Wed, Oct 07, 2020 at 12:27:09AM +0000, Pau Peris wrote:
> >
> > > I'm hosting my dad's webpage which has a contact form (which should be
> > > improved to avoid spam and/or bots) and from time to time someone
> > > types multiple email addresses in the from field of the form so
> > > contact emails with multiple from addresses like "from:
> > > [email protected], [email protected]" are generated. I thought that those
> > > kinds of messages should get rejected and thought that maybe there was
> > > a builtin restriction for this use case.
> >
> > Therefore, the right solution would be in a content filter or milter,
> > coupled with a solid email address (list) parsing library.
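
As a rough illustration of the library approach quoted above, Python's
standard email.utils.getaddresses can split a From header value into
mailboxes, so a filter only has to count them. This is a sketch under
assumptions: count_from_addresses is a made-up helper, and hooking it
into a content filter or milter is left out.

```python
from email.utils import getaddresses


def count_from_addresses(header_value: str) -> int:
    """Return how many mailboxes the parser finds in a From header value."""
    pairs = getaddresses([header_value])
    # Each pair is (display name, addr-spec); skip entries with no address.
    return sum(1 for _name, addr in pairs if addr)
```

A filter would then reject any message for which the count is not
exactly 1. Note that a comma inside a quoted display name, as in
"Doe, John" <john@example.com>, still counts as a single mailbox, which
is the kind of case ad-hoc regular expressions tend to get wrong.
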
--
Viktor.