Something I noticed on a set of emails that were reported to me: I have custom rules that look for certain names in From:name. These messages should have been caught by them, but on inspection the name was UTF-8 encoded and included a character that doesn't render yet still breaks the regex I used. Specifically, the bad actor inserted a RIGHT-TO-LEFT MARK (U+200F, or \xe2\x80\x8f) as what is effectively a zero-width character. The body of the message was also flooded with LEFT-TO-RIGHT MARK (U+200E, or \xe2\x80\x8e) and ZERO WIDTH NO-BREAK SPACE (U+FEFF, or \xef\xbb\xbf) characters, scattered randomly through the body and inside words to defeat other rules. When debugging the message, these characters don't appear to be normalized, so from SA's perspective it seems every rule has to account for them explicitly.
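To illustrate outside of SA what I'm seeing, here's a minimal Python sketch (not SA code, just the same regex behavior). The name "John Smith" is a hypothetical stand-in for the names my rules look for. A single RIGHT-TO-LEFT MARK inside the name is enough to make a plain literal regex miss, while stripping the three characters first lets it match again. It also confirms the UTF-8 byte sequences quoted above.

```python
import re

# The three invisible characters seen in the reported messages.
INVISIBLES = "\u200e\u200f\ufeff"  # LRM, RLM, ZERO WIDTH NO-BREAK SPACE


def strip_invisibles(text: str) -> str:
    """Remove the invisible marks so a plain regex can match the visible text."""
    return text.translate({ord(c): None for c in INVISIBLES})


# Hypothetical display name with a RIGHT-TO-LEFT MARK inserted mid-word.
from_name = "Jo\u200fhn Smith"
rule = re.compile(r"\bJohn Smith\b")

print(rule.search(from_name) is not None)                    # naive match on raw name
print(rule.search(strip_invisibles(from_name)) is not None)  # match after stripping

# Show each character's code point and its UTF-8 encoding.
for ch in INVISIBLES:
    print(f"U+{ord(ch):04X}", ch.encode("utf-8"))
```

The first check prints False and the second prints True, which is the gap I'd like to close on the SA side, either by normalizing before rules run or by making the rules tolerant.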
To add, I'm currently on SA 3.6.x. It looks like 4.0 improves UTF-8 handling, but I'm not sure whether it would address this behavior (happy to be wrong, though I can't upgrade immediately). I'm looking at whether ReplaceTags might help, and found an older discussion on this list about the trouble with UTF-8. I checked for existing tags that would cover zero-width characters like these but didn't find any. I'm happy to work on creating a tag, but wanted to gauge the community's thoughts as I start down that path.
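As a rough sketch of what such a tag would need to expand to, here's a Python version of the idea: allow an optional run of invisible characters between every character of the target string. The helper name `tolerant_pattern` is hypothetical, and widening the class to U+200B through U+200D (zero width space/non-joiner/joiner) is my own assumption; only U+200E, U+200F and U+FEFF actually showed up in these messages. How this maps onto ReplaceTags syntax is something I'd still need to check against the plugin docs.

```python
import re

# Optional run of invisible characters allowed between every character.
# Covers U+200B..U+200F plus U+FEFF; U+200B..U+200D are an assumption,
# not something observed in the reported messages.
INTER = "[\u200b-\u200f\ufeff]*"


def tolerant_pattern(text: str) -> str:
    """Hypothetical helper: regex for `text` that ignores interleaved invisible marks."""
    return INTER.join(re.escape(ch) for ch in text)


name_re = re.compile(tolerant_pattern("John Smith"), re.IGNORECASE)

print(bool(name_re.search("Jo\u200fhn\u200e Sm\ufeffith")))  # obfuscated name
print(bool(name_re.search("JOHN SMITH")))                     # plain name, any case
print(bool(name_re.search("Jane Smith")))                     # different name
```

The trade-off is that every rule pattern grows considerably, which is why normalizing the text once before rules run seems preferable if SA can be made to do it.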