--- On Wed 03/02, Matt Kettler < [EMAIL PROTECTED] > wrote:

That part is definitely NOT safe in the context of spamassassin... Nonsense looks a lot like bugs in spam mailers, and very little like legitimate email to SA.
If nothing else, consider the tripwire rules, which look for letter combinations that don't exist in normal English...

-----------

Thanks! If so, then it takes a bit more work to implement. For example, one trivial idea is to keep text that comes from image attachments away from the rules that search for nonsense.

I meant 'safe' in the following sense: if the tool reports a meaningful word (e.g. one present in the English wordlist, up to a small misspelling), then that word is surely present in the image, up to a small misspelling. So if some spam rule sees "viagra" or "click here to get removed" in the OCR output, it is 'safe' to score a hit for it.

Another, more work-intensive method could be as follows (corrections are welcome):

1. OCR the image.
2. Throw out all words that are not in the English (German, Russian, etc.) dictionary up to a misspelling, e.g. tolerate at most one error per word. Correct the misspelled words. (This requires a fast dictionary search, e.g. wordlists represented as balanced binary trees.)
3. Run the other text-based rules on the result.

Actually, I posted because I get too much image spam (which passes through SA without trouble) and wanted to find out whether it can be caught with the present tools. I sometimes get photos and image smileys, so I'm very reluctant to block all mail containing images without inspecting the images. My strong belief is that tools such as gocr can really help. The other question is how to integrate it into SA, and who would do it. I'm afraid I cannot dig into the SA code myself, so this is a suggestion to the advanced users and developers.

Regards,
sasha.
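
To make step 2 of the method above more concrete, here is a minimal Python sketch of the dictionary filter. It is only an illustration under several assumptions: the function names (`edits1`, `correct`, `clean_ocr_text`) and the `min_fuzzy_len` parameter are hypothetical, the wordlist is held in a plain Python set (hash lookup) instead of the balanced tree suggested above, "one error" is taken to mean one deleted, substituted, or inserted character, and short words must match exactly so that random two- or three-letter noise is not "corrected" into real words. It does not call gocr or SpamAssassin; it only shows the filtering of text that an OCR tool has already produced.

```python
import re
import string

ALPHABET = string.ascii_lowercase


def edits1(word):
    """Return every string one deletion, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + replaces + inserts)


def correct(word, wordlist, min_fuzzy_len=4):
    """Return a dictionary word matching `word` with at most one error.

    Returns None if no such word exists. Words shorter than
    `min_fuzzy_len` (an assumed threshold) must match exactly.
    """
    if word in wordlist:
        return word
    if len(word) < min_fuzzy_len:
        return None
    candidates = sorted(edits1(word) & wordlist)
    # Ties are broken alphabetically; a real tool might prefer word frequency.
    return candidates[0] if candidates else None


def clean_ocr_text(ocr_text, wordlist):
    """Keep only tokens that are dictionary words up to one error, corrected."""
    tokens = re.findall(r"[a-z0-9]+", ocr_text.lower())
    kept = []
    for token in tokens:
        fixed = correct(token, wordlist)
        if fixed is not None:
            kept.append(fixed)
    return " ".join(kept)


# Tiny demo wordlist; a real one would be loaded from a dictionary file.
demo_words = {"click", "here", "to", "get", "removed", "viagra"}
print(clean_ocr_text("cl1ck here to get rem0ved xqzv v1agra", demo_words))
# prints: click here to get removed viagra
```

The cleaned string contains no OCR nonsense, so it could be passed to the ordinary text rules (step 3) without setting off the rules that look for gibberish.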