On Mon, 1 Dec 2014, OGAWA Hirofumi wrote:
Hi,
I'm not sure whether this is the right mailing list for reporting this issue. If not, sorry.
It is the correct place.
Recently I noticed that some Japanese emails are flagged as spam, and
these are actually false positives. The cause seems to be TVD_SPACE_ENCODED
and TVD_SPACE_RATIO_MINFP.
Both rules use __TVD_SPACE_RATIO, which matches Japanese text easily
because Japanese doesn't use spaces as word separators.
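To illustrate why a space-based heuristic trips on Japanese text, here is a minimal Python sketch. It is not SpamAssassin's actual __TVD_SPACE_RATIO implementation; the function name `space_ratio` and the sample sentences are made up for illustration, and it only assumes that the rule reacts to an unusually low proportion of spaces.

```python
def space_ratio(text: str) -> float:
    """Return the fraction of characters in `text` that are whitespace."""
    if not text:
        return 0.0
    spaces = sum(1 for ch in text if ch.isspace())
    return spaces / len(text)


# Hypothetical sample sentences, chosen only for illustration.
english = "Recently I noticed some emails are detected as spam."
japanese = "最近、一部のメールがスパムとして検出されることに気付きました。"

print(f"English:  {space_ratio(english):.2f}")
print(f"Japanese: {space_ratio(japanese):.2f}")
```

An ordinary Japanese sentence contains no whitespace at all, so any check that expects a "normal" share of spaces sees it as anomalous.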
So the question is: TVD_SPACE_RATIO combines __TVD_SPACE_RATIO with
!__ISO_2022_JP_DELIM in its rule, but TVD_SPACE_ENCODED and
TVD_SPACE_RATIO_MINFP use __TVD_SPACE_RATIO without !__ISO_2022_JP_DELIM.
Is this intended?
BTW, if I add a "&& !__ISO_2022_JP_DELIM" check to the __TVD_SPACE_ENCODED
and TVD_SPACE_RATIO_MINFP rules in .spamassassin/user_prefs, it seems to
prevent the false positives for me.
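The effect of that exclusion can be sketched in Python. This is an illustration under assumptions, not the real rule definitions: the thread does not show the actual __ISO_2022_JP_DELIM pattern, so the regex below simply looks for the standard ISO-2022-JP escape sequences (ESC $ B to enter JIS X 0208, ESC ( B to return to ASCII), and `meta_hit` is a hypothetical stand-in for a meta rule of the form "<space-ratio test> && !__ISO_2022_JP_DELIM".

```python
import re

# ISO-2022-JP switches into JIS X 0208 with ESC $ B and back to ASCII
# with ESC ( B. Assumed pattern; the real rule's regex may differ.
ISO_2022_JP_RUN = re.compile(rb"\x1b\$B.*?\x1b\(B", re.DOTALL)


def has_iso_2022_jp_delim(raw_body: bytes) -> bool:
    """True if the raw body contains an ISO-2022-JP delimited run."""
    return ISO_2022_JP_RUN.search(raw_body) is not None


def meta_hit(space_ratio_hit: bool, raw_body: bytes) -> bool:
    """Hypothetical meta rule: space-ratio test AND NOT ISO-2022-JP."""
    return space_ratio_hit and not has_iso_2022_jp_delim(raw_body)


# Hypothetical message bodies for illustration.
japanese_body = "日本語のメールです".encode("iso2022_jp")
ascii_body = b"nospaceshereatall"

print(meta_hit(True, japanese_body))  # exclusion suppresses the hit
print(meta_hit(True, ascii_body))     # no escape sequences, hit stands
```

Note that this sketch only recognises mail whose body is still encoded as ISO-2022-JP; Japanese text sent as UTF-8 contains no such escape sequences.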
That is probably a valid exclusion; I will add it.
Unfortunately the masscheck corpus doesn't have enough
Japanese-language ham to catch this. Thank you for the report!
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79