On 05/06/2024 10:25, Viktor Dukhovni via mailop wrote:
> On Wed, Jun 05, 2024 at 11:08:31AM +0200, Tobias Fiebig via mailop wrote:
>> Yeah, I misread 8616 there, then; My brain somewhat autoclicked to
>> "well, if there can be UTF8 you must be able to mime encode."
> No, RFC2047 encoding of headers applies only to header parts that are an
> ABNF *phrase* in an *unstructured* message header. The primary examples
> of this are the Display-Name in "From:", "To:" or "Cc:" and the message
> Subject.
In fact, the original distinction between structured and unstructured
headers, as defined in RFC 2047, just makes parsing extremely
complicated, and I personally consider it an example of a standard being
accepted with a clear violation of the KISS principle for no good reason.
Unfortunately, SMTPUTF8 makes it even worse: instead of following
something that works (e.g. punycode), it creates a completely different
state machine for parsing messages that are otherwise indistinguishable
from generic ASCII-compatible emails.
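To make the punycode comparison concrete, here is a minimal Python
sketch using the standard library's "idna" codec. The helper names
to_ascii_domain and to_unicode_domain are made up for illustration. The
point is only that the encoded form stays plain ASCII, so an existing
ASCII-only parser does not need a second state machine.

```python
def to_ascii_domain(domain: str) -> str:
    """Encode an internationalized domain into its ASCII (xn--) form.

    Hypothetical helper: relies on Python's built-in "idna" codec,
    which applies punycode label by label.
    """
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(ascii_domain: str) -> str:
    """Reverse step: turn the ASCII (xn--) form back into Unicode."""
    return ascii_domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    encoded = to_ascii_domain("bücher.example")
    print(encoded)
    print(to_unicode_domain(encoded))
```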
I know that I might be opening a can of worms but I just want to express
my own opinion here.
As the Rspamd author, I will not change the existing logic, as it treats
headers as black boxes and performs the following steps: unfold ->
RFC 2047 decode -> process specific data.
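The three steps can be sketched roughly as follows. This is only an
illustration in Python built on the standard email.header module, not
Rspamd's actual code (Rspamd is written in C); the function names
unfold, decode_rfc2047 and process_header are hypothetical, and the
final "process" step is reduced to trimming whitespace.

```python
import re
from email.header import decode_header, make_header


def unfold(raw_value: str) -> str:
    """Step 1: remove line breaks that are followed by whitespace.

    The whitespace itself is kept, as header unfolding requires.
    """
    return re.sub(r"\r?\n(?=[ \t])", "", raw_value)


def decode_rfc2047(value: str) -> str:
    """Step 2: decode any =?charset?enc?text?= encoded-words.

    The header is treated as a black box: no attempt is made to check
    whether an encoded-word is allowed at that position.
    """
    return str(make_header(decode_header(value)))


def process_header(raw_value: str) -> str:
    """Steps 1-3 chained; the last step here is just a strip()."""
    return decode_rfc2047(unfold(raw_value)).strip()


if __name__ == "__main__":
    folded = "=?utf-8?q?caf=C3=A9?=\r\n =?utf-8?q?_au_lait?="
    print(process_header(folded))
```

Note that this sketch decodes encoded-words wherever they appear, which
is exactly the black-box behaviour described above rather than a
position-aware, grammar-driven parse.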
Side comment: I once tried to build a spec-compatible state machine for
the full RFC822-compatible ABNF in Ragel. The resulting machine was
around 65 thousand lines of C code, and clang was unable to compile it
without crashing. GCC managed to compile that code, but the machine was
extremely slow (probably because of ICache pollution/locality issues).
That's probably a good example of why we have so many *slightly*
incompatible parsers in the world.