On Sat, 2 Feb 2013, Eliezer Croitoru wrote:

I just need to know about a pattern match in the content since it's a form.

There are existing rules to detect fill-in-the-form emails. Are any of the FILL_FORM family of rules hitting those messages?

If the form text is in Hebrew it likely won't; if you want to send me samples of those messages off-list, ideally as RFC-822 attachments, I'll be happy to see if I can add Hebrew variants to the form rules.

This address spam is pretty specific.

If the spams are from the same address then you can blacklist that address as Martin suggested.

This is why I wanted to use a specific check for this kind of mail.
The start and end have a specific percentage of Hebrew.
Most of the mail should be in Hebrew, and if more than 50 percent of the body is in English, it's 100% spam.

That's the difficult part.

It's easy to look for specific strings in the body, or specific things like the ratio of text to whitespace or text to images, but trying to *interpret* the text to do something like detect which language it is in is a *hard* problem. Even more so if you want to detect that the message body is in more than one language, and determine the ratios.
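For illustration only, here is a rough sketch of the ratio described above, with that caveat in mind. It does not detect *languages*. It only counts letters by Unicode script: the Hebrew block (U+0590..U+05FF) versus ASCII/Latin letters. English, French or German text would therefore all count as "Latin". The function names and the 0.5 threshold (taken from the 50 percent figure above) are hypothetical. This is plain Python, not a SpamAssassin rule; hooking something like it into SpamAssassin would mean writing a Perl plugin.

```python
import unicodedata


def script_counts(text):
    """Count Hebrew and Latin letters in text; everything else is ignored."""
    hebrew = latin = 0
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace, combining marks
        if "\u0590" <= ch <= "\u05FF":  # Unicode Hebrew block
            hebrew += 1
        elif ch.isascii() or "LATIN" in unicodedata.name(ch, ""):
            latin += 1  # A-Z/a-z plus accented Latin letters such as "é"
    return hebrew, latin


def latin_fraction(text):
    """Fraction of counted letters that are Latin (0.0 if there are none)."""
    hebrew, latin = script_counts(text)
    total = hebrew + latin
    return latin / total if total else 0.0


def looks_suspicious(text, threshold=0.5):
    """Hypothetical heuristic: flag bodies where Latin letters exceed threshold."""
    return latin_fraction(text) > threshold


if __name__ == "__main__":
    print(latin_fraction("שלום world"))
    print(looks_suspicious("שלום עולם hi"))
```

Hebrew vowel points (niqqud) are combining marks, not letters, for `str.isalpha()`, so they are skipped along with digits and punctuation. A real filter would also have to decide which part of the body to measure (for example, only the start and end, as suggested above) and how to handle HTML markup.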

The closest we can come today is to look at the character set of the message and try to guess from that whether the *entire* message is in a "foreign" language. This runs into problems where the character set of the message supports multiple languages, like UTF-8 or some of the character sets used by Windows.
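Here is a minimal sketch of that charset guess using Python's standard `email` package. It assumes that windows-1255 and the ISO-8859-8 variants are the charsets worth treating as "Hebrew". The charset sets and the `classify_charsets` name are illustrative and not anything SpamAssassin provides. Note that UTF-8 can only be reported as ambiguous, which is exactly the limitation described above.

```python
from email import message_from_string
from email.message import Message

# Assumed, non-exhaustive list of charsets normally used for Hebrew text.
HEBREW_CHARSETS = {"windows-1255", "iso-8859-8", "iso-8859-8-i", "iso-8859-8-e"}
# Charsets that can carry many languages, so they prove nothing on their own.
AMBIGUOUS_CHARSETS = {"utf-8", "utf-16", "utf-7"}


def classify_charsets(msg: Message) -> str:
    """Guess 'hebrew', 'ambiguous', 'other' or 'unknown' from declared charsets."""
    # get_charsets() returns one entry per MIME part; parts without a charset
    # parameter (e.g. multipart containers) give None and are dropped here.
    charsets = {c.lower() for c in msg.get_charsets() if c}
    if not charsets:
        return "unknown"
    if charsets <= HEBREW_CHARSETS:
        return "hebrew"  # every declared charset is a Hebrew one
    if charsets & AMBIGUOUS_CHARSETS:
        return "ambiguous"  # e.g. UTF-8: could be Hebrew, English, or both
    return "other"


if __name__ == "__main__":
    raw = "Content-Type: text/plain; charset=UTF-8\n\nhello\n"
    print(classify_charsets(message_from_string(raw)))
```

The declared charset is only what the sender claims. A spammer can label English text as windows-1255, so at best this is one weak signal to combine with others.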

Do you have Bayes enabled? If so, are you training these messages as spam? If you are doing this, then they should eventually hit BAYES_99 and if there are any other spammy characteristics that would probably be enough to detect them.

If you upload a few of these spams to someplace like pastebin and point us at them, we will be able to do better than just guessing and making general suggestions.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   "A well educated Electorate, being necessary to the liberty of a
    free State, the Right of the People to Keep and Read Books,
    shall not be infringed."
  ...means only registered voters can read books, and only those books
  obtained with State permission from State-controlled bookstores?
-----------------------------------------------------------------------
 10 days until Abraham Lincoln's and Charles Darwin's 204th Birthdays
