On 30 Nov 2018, at 17:49, Amir Caspi wrote:

> On Nov 30, 2018, at 7:00 AM, Bill Cole <sausers-20150...@billmail.scconsult.com> wrote:
>
>>> Since HTML is already getting rendered to text, then perhaps the conversion code should strip (literally, just delete) any zero-width characters during this conversion? That should make normal body rules, and Bayes, function properly, no?
>>
>> Not if they are *looking for* those characters.
>
> But AFAIK we're only looking for those characters with rawbody rules,

Not so.

> because it's really hard to search for them in regular body rules... no?

No.

See the relevant rule cluster (all with 'ZW' in their names) in KAM.cf and __UNICODE_OBFU_ZW in the standard ruleset.

Also see my more generic (but still useful!) __SCC_SHORT_WORDS and derivatives in KAM.cf: it is a body rule that takes advantage of the fact that zero-width typographical control characters create logical word breaks as far as Perl is concerned.
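
To make the word-break point concrete, here is a minimal sketch in Python (not Perl, and not the actual __SCC_SHORT_WORDS rule, whose pattern is not reproduced here). The function names, the list of zero-width code points, and the letters-only tokenizer are all assumptions chosen for illustration. The break is made explicit by tokenizing on ASCII letters rather than by relying on Perl's own regex semantics.

```python
import re

# Illustrative set of zero-width code points (an assumption, not taken from
# KAM.cf or the standard ruleset): ZWSP, ZWNJ, ZWJ, WORD JOINER, BOM/ZWNBSP.
ZERO_WIDTH = "\u200b\u200c\u200d\u2060\ufeff"
ZERO_WIDTH_RE = re.compile("[" + ZERO_WIDTH + "]")


def strip_zero_width(text):
    """Delete zero-width characters, as the stripping proposal would."""
    return ZERO_WIDTH_RE.sub("", text)


def longest_short_word_run(text, max_len=1):
    """Return the longest run of consecutive letter-only tokens
    that are each at most max_len letters long."""
    longest = current = 0
    for token in re.findall(r"[A-Za-z]+", text):
        # Extend the run on a short token, reset it on a normal-length one.
        current = current + 1 if len(token) <= max_len else 0
        longest = max(longest, current)
    return longest


# "payment" with a zero-width space between every pair of letters.
obfuscated = "Urgent " + "\u200b".join("payment") + " needed"

print(longest_short_word_run(obfuscated))                    # 7
print(longest_short_word_run(strip_zero_width(obfuscated)))  # 0
```

The second result is the reason for the objection above: once the zero-width characters are deleted during conversion, the run of one-letter "words" disappears and a body rule of this kind has nothing left to match.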




--
Bill Cole
