Correction: that was a typo. I'm on SA 3.4.6, not 3.6.x.

On Wed, Aug 30, 2023, 3:22 PM Ricky Boone <ricky.bo...@gmail.com> wrote:

> Something I noticed on a set of emails that were reported to me.
>
> I have custom rules to look out for certain names in From:name.  The
> messages should have been caught by them, however upon inspection the
> name was UTF-8 encoded, and included a character that doesn't seem to
> render, but interferes with the regex I used.  Specifically, the bad
> actor included a RIGHT-TO-LEFT MARK (U+200F, or \xe2\x80\x8f)
> effectively as a null-space character.  The body of the message was
> also flooded with LEFT-TO-RIGHT MARK (U+200E, or \xe2\x80\x8e) and ZERO
> WIDTH NO-BREAK SPACE (U+FEFF, or \xef\xbb\xbf) characters randomly
> placed within the body and within words to interfere with other rules.
> When debugging the message, it doesn't appear that the characters are
> normalized, so from SA's perspective it seems like all of these
> characters have to be accounted for with any rules.
>
> To add, I'm currently on SA 3.6.x.  It looks like 4.0 improves UTF-8
> handling, but I'm not sure if it would address the behavior I see
> (though happy to be wrong... albeit not able to update immediately).
>
> I'm trying to see if ReplaceTags might be useful, and found an older
> discussion in this list on the matter related to the trouble with
> UTF-8.  I checked to see if there were any existing tags that would
> account for null-space/zero-width space-like characters, but didn't
> see any.  I have no issues working on creating a tag, but wanted to
> gauge the community to see what their thoughts were while I started
> down that path.
>
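
To illustrate the behavior described in the quoted message, here is a rough
sketch in Python (not SpamAssassin code). The display name "Example Name" and
both helper functions are made up for the example. It shows a plain regex
missing a name with U+200F inserted, and two possible workarounds: stripping
the invisible characters first, or building a pattern that allows them between
every character, which is roughly what a ReplaceTags-style tag would have to do.

```python
import re

# UTF-8 byte sequences for the three code points named in the message:
# U+200E LEFT-TO-RIGHT MARK, U+200F RIGHT-TO-LEFT MARK, U+FEFF ZERO WIDTH NO-BREAK SPACE.
INVISIBLE = rb"(?:\xe2\x80[\x8e\x8f]|\xef\xbb\xbf)"

# Hypothetical rule pattern; "Example Name" is a placeholder, not a real rule.
NAME_RULE = re.compile(rb"Example Name", re.IGNORECASE)


def strip_invisible(raw: bytes) -> bytes:
    """Remove the invisible characters before matching (normalization approach)."""
    return re.sub(INVISIBLE, b"", raw)


def tolerant_pattern(name: str) -> "re.Pattern[bytes]":
    """Allow any run of invisible characters between each character of the name."""
    gap = INVISIBLE + rb"*"
    parts = [re.escape(ch.encode("utf-8")) for ch in name]
    return re.compile(gap.join(parts), re.IGNORECASE)


# A display name with U+200F dropped into the middle of a word.
obfuscated = "Exam\u200fple Name".encode("utf-8")

matches_raw = NAME_RULE.search(obfuscated) is not None
matches_stripped = NAME_RULE.search(strip_invisible(obfuscated)) is not None
matches_tolerant = tolerant_pattern("Example Name").search(obfuscated) is not None

print(matches_raw, matches_stripped, matches_tolerant)
```

The plain pattern fails on the obfuscated bytes, while both workarounds match.
The list of invisible code points here covers only the three seen in these
messages; a real tag would probably need more.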
