On Mon, 1 Dec 2014, OGAWA Hirofumi wrote:

> Hi,
>
> I'm not sure if this is the right mailing list for reporting this issue. If not, sorry.

It is the correct place.

> Recently I noticed that some Japanese emails are detected as spam, and they
> are actually false positives. The cause seems to be TVD_SPACE_ENCODED
> and TVD_SPACE_RATIO_MINFP.
>
> Both rules use __TVD_SPACE_RATIO, and it matches Japanese text
> easily, because Japanese doesn't use spaces as word separators.
>
> So the question is: the TVD_SPACE_RATIO rule combines __TVD_SPACE_RATIO
> with a !__ISO_2022_JP_DELIM check, but TVD_SPACE_ENCODED and
> TVD_SPACE_RATIO_MINFP use __TVD_SPACE_RATIO without !__ISO_2022_JP_DELIM.
>
> Is that intended?
>
> BTW, if I add a "&& !__ISO_2022_JP_DELIM" check to the __TVD_SPACE_ENCODED
> and TVD_SPACE_RATIO_MINFP rules in .spamassassin/user_prefs, it seems to
> prevent the false positives for me.

That is probably a valid exclusion; I will add it in.

Unfortunately the masscheck corpus doesn't have enough Japanese-language ham to catch this. Thank you for the report!
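
For anyone following along, here is a rough illustration of why a whitespace-ratio test trips on Japanese text. It is only a sketch: the `space_ratio` helper and the `LOW_SPACE_CUTOFF` value are made up for this example and are not how __TVD_SPACE_RATIO is actually implemented or tuned.

```python
def space_ratio(text: str) -> float:
    """Return the fraction of characters in text that are whitespace."""
    if not text:
        return 0.0
    spaces = sum(1 for ch in text if ch.isspace())
    return spaces / len(text)


# Two sample sentences with roughly the same meaning.
english = "Recently I noticed some Japanese emails are detected as spam."
japanese = "最近、日本語のメールがスパムとして誤検出されることに気づきました。"

# Hypothetical cutoff, chosen for illustration only; not a SpamAssassin value.
LOW_SPACE_CUTOFF = 0.05

for label, sample in (("english", english), ("japanese", japanese)):
    ratio = space_ratio(sample)
    print(f"{label}: ratio={ratio:.3f} low_space={ratio < LOW_SPACE_CUTOFF}")
```

The English sample separates every word with a space, while the Japanese one contains no whitespace at all, so any check keyed on a low space ratio will flag ordinary Japanese ham unless it is excluded some other way.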

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
 15 days until Bill of Rights day
