On Sat, 2 Feb 2013, Eliezer Croitoru wrote:

I wrote something in Ruby which actually works fine as a starter.

#code start
spam_content = "the long part from the mail".force_encoding("Windows-1255")

template_hebrew_chars = 270

# Returns true when the single-byte char lies in one of the
# Windows-1255 ranges treated here as Hebrew.
def hebrew_char(char)
  code = char.unpack("H*")[0].hex
  [223..251, 192..203, 205..219].any? { |range| range.member?(code) }
end

counter = spam_content.each_char.count { |char| hebrew_char(char) }

if counter == template_hebrew_chars
 puts "this is a spam"
else
 puts "might not be a spam"
end
##code end

Now *that* might be possible in plain SA rules without a plugin: count the number of characters in the message body and the number that fall in a given range (e.g. the Hebrew glyphs), then calculate the percentage. I *think* you can do math in meta rules...

However, a plugin would be _much_ more efficient than something like:

  body   __HBRW_CHARS    /[\xc0-\xcb\xcd-\xdb\xdf-\xfb]/
  tflags __HBRW_CHARS    multiple
  body   __TOTAL_CHARS   /\S/
  tflags __TOTAL_CHARS   multiple
  meta   __HBRW_PCT      ((__HBRW_CHARS * 100) / __TOTAL_CHARS)
  meta   HBRW_SPAM       (__HBRW_PCT < 50) && __HBRW_ENCODING

I don't know whether the division in __HBRW_PCT or the less-than comparison in HBRW_SPAM would work; that's totally off the top of my head and untested. I also leave the __HBRW_ENCODING rule as an exercise for the student. :)
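
For comparison, here is roughly the same percentage calculation as a standalone Ruby sketch. It is not a SpamAssassin plugin. The byte ranges are the ones from the rules above; the name hebrew_percentage, the list of whitespace bytes standing in for /\S/, and returning 0 for an empty body (to avoid dividing by zero) are my own assumptions for illustration.

```ruby
# Byte ranges from the thread: Windows-1255 values treated as Hebrew.
HEBREW_BYTE_RANGES = [0xC0..0xCB, 0xCD..0xDB, 0xDF..0xFB].freeze

# Assumption: these bytes approximate what /\S/ would NOT match.
WHITESPACE_BYTES = [0x20, 0x09, 0x0A, 0x0B, 0x0C, 0x0D].freeze

# Integer percentage (0..100) of non-whitespace bytes that fall in the
# Hebrew ranges. Returns 0 when the body has no non-whitespace bytes,
# so the division below never sees a zero denominator.
def hebrew_percentage(body)
  bytes = body.bytes.reject { |b| WHITESPACE_BYTES.include?(b) }
  return 0 if bytes.empty?

  hebrew = bytes.count { |b| HEBREW_BYTE_RANGES.any? { |r| r.cover?(b) } }
  (hebrew * 100) / bytes.size
end

# Two Latin letters, one space, two bytes in the Hebrew letter range.
puts hebrew_percentage("ab \xE0\xE1".b)  # => 50
```

Working on bytes rather than characters sidesteps any encoding-validity questions, which seems reasonable for a single-byte charset like Windows-1255. The threshold to compare the result against is a separate decision.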


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
