R: Stock spam in images

2006-10-02 Giampaolo Tomassoni
> You'd need some clever rules...
>
> As an example, the word "stock" is perfectly valid in emails, but if you
> found it in an attached image you'd be pretty sure it was spam.

It would be perfectly valid in, say, a graph image too. SA is meant to work on the overall message content. It is not t

R: Stock spam in images

2006-10-02 Giampaolo Tomassoni
The real problem is the potentially fuzzy output from the OCR engine: sure, all copies of the very same spam would be detected the same way, but what about slightly different copies? Would the "use the SA force" approach be feasible? The use of String::Approx in FuzzyOcr surely has a meaning, b
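The fuzziness concern above can be made concrete with a small sketch: approximate string matching lets a slightly garbled OCR token such as "st0ck" still count as a hit for "stock". This is a Python illustration using plain Levenshtein edit distance; it is not FuzzyOcr's implementation (which, per the post, relies on the Perl module String::Approx), and the word list and the one-error-per-five-characters tolerance are hypothetical choices.

```python
# Illustrative sketch only: why approximate matching helps with noisy OCR
# output. NOT FuzzyOcr's code (FuzzyOcr is a Perl plugin using String::Approx);
# SPAM_WORDS and the tolerance below are hypothetical.

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute (or match)
        prev = cur
    return prev[-1]


def fuzzy_hits(ocr_text: str, spam_words: list[str], chars_per_error: int = 5) -> list[str]:
    """Return spam words matched by some OCR token, allowing one edit per
    chars_per_error characters of the spam word (hypothetical tolerance)."""
    tokens = ocr_text.lower().split()
    hits = []
    for word in spam_words:
        allowed = len(word) // chars_per_error
        if any(edit_distance(tok, word) <= allowed for tok in tokens):
            hits.append(word)
    return hits


# Hypothetical word list, not the real contents of FuzzyOcr.words
SPAM_WORDS = ["stock", "investor", "shares"]

print(fuzzy_hits("Hot st0ck alert: lnvestor news", SPAM_WORDS))
# prints ['stock', 'investor']
```

Exact matching would miss both "st0ck" and "lnvestor"; the edit-distance check catches them while still rejecting unrelated tokens. How such hits would then be fed back into SpamAssassin scoring is left open here, as it is in the thread.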

R: Stock spam in images

2006-10-02 Giampaolo Tomassoni
> On Mon, Oct 02, 2006 at 03:18:58PM +0100, Randal, Phil wrote:
> > > undetected). Wouldn't it be better to inject the detected
> > > text back to SA? There should be enough variants of spam
> > > words to let SA fuzzily catch the ones from images.
> >
> > I think so. Some of the words would b

R: Stock spam in images

2006-10-02 Giampaolo Tomassoni
> > ...omitted...
>
> How about the FuzzyOCR plugin? That has been discussed quite a bit
> here recently.
>
> http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
>
> --
> Bowie

And, by the way, it seems to work! Actually, the only limit I see is the home-made FuzzyOcr.words (and, maybe, the fa

R: Stock spam in images

2006-10-02 Giampaolo Tomassoni
> I'm a newbie to the list and have been scanning recent posts to see if
> what I'm about to ask about has been covered but I haven't seen anything
> yet.
>
> Lately I have been getting more and more of the stock alert spam but now
> all the good info is in an image and typically following the ima