> You'd need some clever rules...
>
> As an example, the word "stock" is perfectly valid in emails, but if you
> found it in an attached image you'd be pretty sure it was spam.
It would be perfectly valid in, say, an image of a graph too. SA is meant to
work on the overall message content. It is not t
The real problem is the potentially fuzzy output from the OCR engine: sure, all
the copies of the very same spam would be recognized identically, but what about
slightly different copies? Would the "use the SA force" approach be feasible?
The use of String::Approx in FuzzyOcr surely has a meaning, b
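
To make the concern about fuzzy OCR output concrete, here is a minimal sketch
of approximate word matching. It is not FuzzyOcr's code: it is written in
Python, uses the standard library's difflib as a stand-in for String::Approx,
and the word list, the function name fuzzy_hits and the 0.75 threshold are all
assumptions made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical word list, standing in for something like FuzzyOcr.words.
SPAM_WORDS = ["stock", "alert", "investor"]


def fuzzy_hits(ocr_text, words=SPAM_WORDS, threshold=0.75):
    """Return the listed words that approximately match some OCR token.

    The threshold is an assumed similarity ratio in [0, 1]; a real setup
    would need to tune it against false positives.
    """
    tokens = ocr_text.lower().split()
    hits = set()
    for word in words:
        for token in tokens:
            # ratio() is 1.0 for identical strings and drops as they diverge.
            if SequenceMatcher(None, word, token).ratio() >= threshold:
                hits.add(word)
                break
    return sorted(hits)


if __name__ == "__main__":
    # Simulated OCR noise: digits and look-alike letters replace the originals.
    print(fuzzy_hits("St0ck A1ert for lnvestors"))
    print(fuzzy_hits("meeting at noon"))
```

An exact-match rule would miss every word in the first sample, which is why
slightly different copies of the same image are a problem for plain rules but
less so for an approximate matcher.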
> On Mon, Oct 02, 2006 at 03:18:58PM +0100, Randal, Phil wrote:
> > > undetected). Wouldn't it be better to inject the detected
> > > text back to SA? There should be enough variants of spam
> > > words to let SA fuzzily catch the ones from images.
> >
> > I think so. Some of the words would b
>
> ...snip...
>
> How about the FuzzyOCR plugin? That has been discussed quite a bit
> here recently.
>
> http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
>
> --
> Bowie
And, by the way, it seems to work!
Actually, the only limitation I see is the home-made FuzzyOcr.words (and, maybe, the
fa
> I'm a newbie to the list and have been scanning recent posts to see if
> what I'm about to ask about has been covered but I haven't seen anything
> yet.
>
> Lately I have been getting more and more of the stock alert spam but now
> all the good info is in an image and typically following the ima