On Jun 28, 2019, at 11:33 AM, Antony Stone 
<antony.st...@spamassassin.open.source.it> wrote:
> 
> Indeed - people even promote its use:
> 
> https://litmus.com/blog/the-little-known 

Uuuggggghhhhhhhhhh. I'd argue they deserve to be classified as spam just for 
doing that. =P  I know, I know... opt-in and all that.

Well, let's hope that they stay with the ZWNJ trick and keep their hands off 
ZWS.

Regarding tuning: I would say that any of the ZW chars, if surrounded by 
standard roman chars, should count as spammy obfuscation.  This would be true 
certainly for ZWS, but I'd argue for ZW[N]J as well.  If you match on something 
like [A-Za-z0-9]<ZW>[A-Za-z0-9] then that should probably work, and would avoid 
an FP on the whole &nbsp;&zwnj;&nbsp; nonsense since the chars on either side 
are non-alphanum.  On the other hand, this wouldn't capture the 
full-obfuscation method where every letter is represented with its Unicode/HTML 
entity equivalent.
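
To make the match concrete, here is a rough sketch in Python rather than actual SpamAssassin rule syntax. The name looks_obfuscated is made up for illustration, allowing a run of several ZW chars (the "+") is my own assumption, and so is decoding HTML entities before matching. It does nothing about letters that are swapped for look-alike Unicode characters.

```python
import html
import re

# Zero-width characters named in this thread (standard Unicode code points):
#   U+200B ZERO WIDTH SPACE (ZWS)
#   U+200C ZERO WIDTH NON-JOINER (ZWNJ)
#   U+200D ZERO WIDTH JOINER (ZWJ)
ZW = "\u200b\u200c\u200d"

# [A-Za-z0-9]<ZW>[A-Za-z0-9]; the "+" (one or more ZW chars) is an assumption.
ZW_BETWEEN_ALNUM = re.compile("[A-Za-z0-9][" + ZW + "]+[A-Za-z0-9]")


def looks_obfuscated(raw: str) -> bool:
    """Return True if a ZW char sits directly between two ASCII alphanumerics."""
    # Assumption: decode entities first so &zwnj; or &#8203; become characters.
    text = html.unescape(raw)
    return ZW_BETWEEN_ALNUM.search(text) is not None
```

With this, "Vi&zwnj;agra" is flagged, while "&nbsp;&zwnj;&nbsp;" is not, because after decoding the characters on either side of the ZWNJ are non-breaking spaces rather than alphanumerics.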

Are there any plans to modify normalize_charset so that it strips out these ZW 
entities?
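
By "strips out" I mean something along these lines. This is only a Python illustration with a hypothetical strip_zw helper, not how normalize_charset works today or would necessarily be implemented.

```python
# Map each zero-width code point (ZWS, ZWNJ, ZWJ) to None so translate() drops it.
ZW_TABLE = {0x200B: None, 0x200C: None, 0x200D: None}


def strip_zw(text: str) -> str:
    """Return text with ZWS, ZWNJ and ZWJ removed; everything else is untouched."""
    return text.translate(ZW_TABLE)
```

One caveat: ZWNJ and ZWJ are legitimate in some scripts (Persian, for example), so stripping them unconditionally would change real text. It is probably safest to do this only on the copy used for rule matching.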

Cheers.

--- Amir
