Re: Malformed UTF-8 character with SA 3.2.5

Sergey Kovalev Fri, 09 Jan 2009 02:47:37 -0800

Mark Martinec wrote:

Our log file contains many:
 amavis[19738]: (19738-05) _WARN: Malformed UTF-8 character (unexpected
continuation byte 0x8e, with no preceding start byte) in pattern match
(m//) at
/var/lib/spamassassin/3.002005/70_sare_specific_cf_sare_sa-update_dostech_net/200605280300.cf,
rule SARE_SPEC_REPL_OBFU2, line 1, <GEN16> line 3620.


I searched our logs for something similar and came up with a possibly related
case, but in a different code section. Here is mine (using SA 3.3):

rules: failed to run TVD_STOCK1 test, skipping: (Malformed UTF-8 character (fatal) at /usr/local/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/Plugin/BodyEval.pm line 250, <GEN11> line 499.


This one is within sub _check_stock_info, evaluating the regexp:
  $rnd_chunk =~ /^\s*([^:\s][^:\n]{2,29})\s*:\s*\S/mg
on a perfectly valid UTF-8 string. It turns out it is a bug in
perl5.8.8, 5.8.9 and in 5.10.0 - the bug goes away if the string
is not tainted.

I'm not very familiar with UTF-8 handling in Perl, but it seems to me that there are several different flavours of UTF :)

I'm trying to score Cyrillic pr0n messages, and it is not simple.

To get SA to start hitting my regexp rules, I modified Check.pm by adding a `use utf8;' line and added normalize_charset 1 to local.cf.

Now it detects bad words, but spamd generates warnings at startup/reload:

Jan 9 01:54:40 mx spamd[19814]: Malformed UTF-8 character (unexpected non-continuation byte 0xe8, immediately after start byte 0xe9) in eval "string" at /var/db/spamassassin/3.002005/updates_spamassassin_org/20_advance_fee.cf, rule __FRAUD_GAN, line 1.
Jan 9 01:54:40 mx spamd[19814]: Malformed UTF-8 character (unexpected non-continuation byte 0x5d, immediately after start byte 0xe8) in eval "string" at /var/db/spamassassin/3.002005/updates_spamassassin_org/20_advance_fee.cf, rule __FRAUD_GAN, line 1.
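
The byte values in these two warnings can be read at the bit level, which may help explain them. Below is a minimal Python sketch (not Perl, and not SA code) that classifies a byte by its UTF-8 role. The `sample` bytes are a hypothetical stand-in: the sequence 0xe9 0xe8 0x5d would be "éè]" in Latin-1, which looks like the tail of a character class, but the real text of __FRAUD_GAN is not quoted here, so that is only a guess. The names `classify_byte` and `is_valid_utf8` are made up for illustration.

```python
def classify_byte(b: int) -> str:
    """Classify one byte by its UTF-8 bit pattern only.

    This does not check the finer validity rules (for example,
    0xC0/0xC1 and 0xF5 and above never occur in valid UTF-8).
    """
    if b < 0x80:
        return "ascii"          # 0xxxxxxx
    if b < 0xC0:
        return "continuation"   # 10xxxxxx
    if b < 0xE0:
        return "start-2"        # 110xxxxx, expects 1 continuation byte
    if b < 0xF0:
        return "start-3"        # 1110xxxx, expects 2 continuation bytes
    if b < 0xF8:
        return "start-4"        # 11110xxx, expects 3 continuation bytes
    return "invalid"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample, NOT the actual rule text: "[éè]" encoded as Latin-1.
sample = b"[\xe9\xe8]"

if __name__ == "__main__":
    for b in sample:
        print(hex(b), classify_byte(b))
    print("valid UTF-8:", is_valid_utf8(sample))
    print("as Latin-1:", sample.decode("latin-1"))
```

Read this way, 0xe9 and 0xe8 are both start bytes of a three-byte sequence, and 0x5d is plain ASCII "]", so neither can follow a start byte. That would be consistent with a rule file containing Latin-1 characters being parsed as UTF-8 once `use utf8;' is in effect, though I have not verified this against the rule itself.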

BTW, I don't need 20_advance_fee.cf, but I want to keep using the updates.spamassassin.org channel. Is it possible to somehow ignore some of the *.cf files?
