On Sat, 2 Feb 2013, Eliezer Croitoru wrote:
I just need to know about a pattern match in the content since it's a form.
There are existing rules to detect fill-in-the-form emails. Are any of the
FILL_FORM family of rules hitting those messages?
If the form text is in Hebrew they likely won't; if you want to send me
samples of those messages off-list, ideally as RFC-822 attachments, I'll
be happy to see if I can add Hebrew variants to the form rules.
This address spam is pretty specific.
If the spams are from the same address then you can blacklist that address
as Martin suggested.
This is why I wanted to use a specific check for this kind of mail.
The start and end have a specific percentage of Hebrew.
Most of the mail should be in Hebrew, and if more than 50 percent of
the body is in English it's 100% spam.
That's the difficult part.
It's easy to look for specific strings in the body, or specific things
like the ratio of text to whitespace or text to images, but trying to
*interpret* the text to do something like detect which language it is in
is a *hard* problem. Even more so if you want to detect that the message
body is in more than one language, and determine the ratios.
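
To illustrate why this is only approximately solvable, here is a minimal
Python sketch that counts *scripts* rather than languages: it measures what
fraction of the letters in a body are Hebrew versus Latin. It is not a
SpamAssassin rule or plugin, and the function name script_ratios is
hypothetical. It cannot tell English from any other Latin-script language,
and it will miscount Hebrew that has been transliterated into Latin letters.

```python
import unicodedata


def script_ratios(text):
    """Return (hebrew_fraction, latin_fraction) of the letters in text.

    Assumption: a character's script is inferred from the prefix of its
    Unicode name. This detects scripts, not languages.
    """
    hebrew = 0
    latin = 0
    for ch in text:
        # Skip digits, punctuation, whitespace and combining marks.
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name.startswith("HEBREW"):
            hebrew += 1
        elif name.startswith("LATIN"):
            latin += 1
    total = hebrew + latin
    if total == 0:
        # No Hebrew or Latin letters at all.
        return 0.0, 0.0
    return hebrew / total, latin / total


# Usage: flag a body whose letters are more than half Latin script.
# The 50 percent threshold is the one described earlier in the thread.
hebrew_part, latin_part = script_ratios("שלום hello world")
mostly_latin = latin_part > 0.5
```

Even if this works on the samples, it only approximates the "more than 50
percent English" test, since anything written in Latin letters counts as
"English" here.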
The closest we can come today is to look at the character set of the
message and try to guess from that whether the *entire* message is in a
"foreign" language. This runs into problems where the character set of the
message supports multiple languages, like UTF-8 or some of the character
sets used by Windows.
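
As a rough sketch of that character-set check and its limits, the Python
below reads the declared charset of each text part and reports whether it
hints at Hebrew or is ambiguous. The function name charset_hint and the two
charset lists are assumptions made for illustration, not SpamAssassin's own
logic. Note that a Hebrew charset also covers ASCII, so it says nothing
about how much of the body is actually English.

```python
from email import message_from_string

# Assumption: charsets that can carry many languages, so they give no hint.
MULTI_LANGUAGE_CHARSETS = {"utf-8", "utf-16", "utf-7"}
# Assumption: charsets commonly used for Hebrew text.
HEBREW_CHARSETS = {"iso-8859-8", "iso-8859-8-i", "windows-1255"}


def charset_hint(raw_message):
    """Classify a message by the charsets declared on its text parts."""
    msg = message_from_string(raw_message)
    charsets = set()
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            # get_content_charset() returns the name in lower case, or None.
            charset = part.get_content_charset()
            if charset:
                charsets.add(charset)
    if not charsets:
        return "unknown"
    if charsets & MULTI_LANGUAGE_CHARSETS:
        return "ambiguous"
    if charsets <= HEBREW_CHARSETS:
        return "hebrew-specific"
    return "other"


# Usage with a tiny hand-made message.
sample = (
    "From: sender@example.com\n"
    'Content-Type: text/plain; charset="windows-1255"\n'
    "\n"
    "body\n"
)
hint = charset_hint(sample)
```

A UTF-8 message comes back as "ambiguous", which is exactly the case where
the character set alone stops being useful.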
Do you have Bayes enabled? If so, are you training these messages as spam?
If you are doing this, then they should eventually hit BAYES_99 and if
there are any other spammy characteristics that would probably be enough
to detect them.
If you would upload a few of these spams to someplace like pastebin and
point us at them then we will be able to do better than just guess and
make general suggestions.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
"A well educated Electorate, being necessary to the liberty of a
free State, the Right of the People to Keep and Read Books,
shall not be infringed."
...means only registered voters can read books, and only those books
obtained with State permission from State-controlled bookstores?
-----------------------------------------------------------------------
10 days until Abraham Lincoln's and Charles Darwin's 204th Birthdays