Typo, I meant to say I was on SA 3.4.6.

On Wed, Aug 30, 2023, 3:22 PM Ricky Boone <ricky.bo...@gmail.com> wrote:
> Something I noticed on a set of emails that were reported to me.
>
> I have custom rules to look out for certain names in From:name. The
> messages should have been caught by them, however upon inspection the
> name was UTF-8 encoded, and included a character that doesn't seem to
> render, but interferes with the regex I used. Specifically, the bad
> actor included a RIGHT-TO-LEFT mark (U+200F, or \xe2\x80\x8f)
> effectively as a null-space character. The body of the message was
> also flooded with LEFT-TO-RIGHT (U+200E, or \xe2\x80\x8e) and ZERO
> WIDTH NO-BREAK SPACE (U+FEFF, or \xef\xbb\xbf) characters randomly
> placed within the body and within words to interfere with other rules.
> When debugging the message, it doesn't appear that the characters are
> normalized, so from SA's perspective it seems like all of these
> characters have to be accounted for with any rules.
>
> To add, I'm currently on SA 3.6.x. It looks like 4.0 improves UTF-8
> handling, but I'm not sure if it would address the behavior I see
> (though happy to be wrong... albeit not able to update immediately).
>
> I'm trying to see if ReplaceTags might be useful, and found an older
> discussion in this list on the matter related to the trouble with
> UTF-8. I checked to see if there were any existing tags that would
> account for null-space/zero-width space-like characters, but didn't
> see any. I have no issues working on creating a tag, but wanted to
> gauge the community to see what their thoughts were while I started
> down that path.
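A minimal sketch in Python (not SpamAssassin's own Perl rule engine) of why these hidden characters defeat a plain name regex, and of one possible workaround: allowing an optional run of the three invisible characters between every letter of the name. It matches against raw UTF-8 bytes, using the byte sequences listed in the quoted message. The name "PayPal", the sample header text and the helper `tolerant_pattern` are hypothetical illustrations, not taken from the reported messages or from SA.

```python
import re

# The three invisible code points described in the quoted report.
INVISIBLES = {
    "RIGHT-TO-LEFT MARK": "\u200f",
    "LEFT-TO-RIGHT MARK": "\u200e",
    "ZERO WIDTH NO-BREAK SPACE": "\ufeff",
}

# Show each code point's UTF-8 byte form (should match the bytes quoted above).
for name, ch in INVISIBLES.items():
    print(f"U+{ord(ch):04X} {name}: {ch.encode('utf-8')!r}")

# Optional run of any of the three characters, expressed as raw UTF-8 bytes:
# \xe2\x80\x8e (U+200E), \xe2\x80\x8f (U+200F), \xef\xbb\xbf (U+FEFF).
FILLER = rb"(?:\xe2\x80[\x8e\x8f]|\xef\xbb\xbf)*"


def tolerant_pattern(word: str) -> re.Pattern:
    """Hypothetical helper: build a bytes regex that matches `word`
    case-insensitively even when invisible characters sit between its letters."""
    parts = [re.escape(c.encode("utf-8")) for c in word]
    return re.compile(FILLER.join(parts), re.IGNORECASE)


naive = re.compile(rb"paypal", re.IGNORECASE)
tolerant = tolerant_pattern("PayPal")

# Hypothetical From:name with a RIGHT-TO-LEFT MARK hidden inside the word.
evasive = "Pay\u200fPal Support".encode("utf-8")

print(bool(naive.search(evasive)))     # the plain regex misses it
print(bool(tolerant.search(evasive)))  # the tolerant regex still matches
```

The other route is to strip these code points before matching, which is the normalization the quoted message says doesn't appear to happen. The `FILLER` class is roughly what a custom ReplaceTags tag would need to express, though I haven't tested it in an actual SA rule.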