On Sun, 27 Aug 2006, Justin Mason wrote:

> "John D. Hardin" writes:
> >On Sat, 26 Aug 2006, Loren Wilton wrote:
> >> > That's what I was thinking, and would allow leverage by a lot of
> >> > plugins (e.g. the Word plugin I am prepping to start)...
> >> >
> >> > Create some PerMsgStatus string variable or some such that the body
> >> > rules would be run over...
> >> 
> >> Actually the easy way would probably be to create a new X-Spam
> >> header item that rules could run on.
> >
> >...an X-Spam-mumble header containing the text extracted from an
> >attached Word document? That somehow strikes me as a bad idea...
> 
> Actually, I think it's quite a good one ;)  headers provide a
> good way for plugins to offer name=value metadata pairs for rules
> to match on.

Well, yes, so long as the header does not get inserted into the
rewritten message.

However, there is a much richer set of body text rules than header
rules. I think they should be leveraged against the image text (and
attached document text) as well. After all, they are just variant
delivery methods for the same message: BUY MY SHIT^WSTUFF!

> The idea of sticking text from OCR'd images into the body is
> interesting -- however, I'm not sure it'd be useful in this case.
> One key aspect that makes the rules accurate, is that it's not
> that the text appears *anywhere* in the mail; it's that the text
> appears in an OCR'd image.

Okay, how about this: a "variant-encapsulation" object in $PMS where
the text from images/documents is stuffed, and has the body rules run
over it, and has a multiplier or threshold or some such that
affects/controls how the score from the body rules against that block
of text is applied to the message as a whole.
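
A rough sketch of the idea, in Python rather than Perl purely for
brevity. Everything here is hypothetical: VariantBlock, BODY_RULES and
score_variant are made-up names for illustration, not SpamAssassin
APIs, and the two rules are toy stand-ins for the real body rules.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-ins for existing body rules: (name, pattern, score).
BODY_RULES = [
    ("EXAMPLE_BUY_NOW", re.compile(r"\bbuy\s+now\b", re.I), 2.0),
    ("EXAMPLE_STOCK_TIP", re.compile(r"\bstock\s+(?:tip|alert)\b", re.I), 1.5),
]


@dataclass
class VariantBlock:
    """Text extracted from an image or attached document (hypothetical)."""
    source: str               # e.g. "ocr-image" or "word-doc"
    text: str
    multiplier: float = 1.0   # scales the block's contribution to the message
    threshold: float = 0.0    # raw scores below this contribute nothing


def score_variant(block, rules=BODY_RULES):
    """Run the body rules over this block only.

    Returns (hits, contribution), where contribution is the raw rule
    score scaled by the block's multiplier, or 0.0 if the raw score
    falls below the block's threshold.
    """
    hits = [(name, score) for name, pat, score in rules
            if pat.search(block.text)]
    raw = sum(score for _, score in hits)
    if raw < block.threshold:
        return hits, 0.0
    return hits, raw * block.multiplier
```

Because the block is scored separately from the real body, a rule hit
can still be attributed to "text found in an OCR'd image" rather than
"text found anywhere in the mail", and the multiplier decides how much
that counts toward the message as a whole.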

What bothers me is the separate list of simplified matching rules that
FuzzyOCR is using. I think that it would be better in the long run to
leverage the rich set of existing body rules rather than having a
separate set of simple rules.

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  People seem to have this obsession with objects and tools as being
  dangerous in and of themselves, as though a weapon will act of its
  own accord to cause harm. A weapon is just a force multiplier. It's
  *humans* that are (or are not) dangerous.
-----------------------------------------------------------------------
 23 days until Talk Like a Pirate day
