John Thompson wrote:
> I've gotten a number of image spams that don't trigger FuzzyOcr at all
> for some reason, e.g. http://www.os2.dhs.org/~john/DPO.gif
[snip]
> Using spamassassin-3.2.3, FuzzyOcr-3.4, gocr-0.44, ocrad-0.16 on
> FreeBSD-6.2. If I use the FuzzyOcr sample image spams, it seems to work.
> What gives?

Old FuzzyOcr, and probably old ocrad.

Using FuzzyOcr 3.5.1 (plus patched files to revision 131) and ocrad 0.17
(with 0.16 the test didn't give any result, thanks for making me upgrade):

$ spamassassin -x -D FuzzyOcr -t < /c/tmp/Spam\ example.eml
...
[2684] dbg: FuzzyOcr: Exec  : /usr/local/bin/ocrad -s5 -i /tmp/.spamassassin2340eonVM9tmp/DPO.gif.pnm
[2684] dbg: FuzzyOcr: Stdout: >/tmp/.spamassassin2340eonVM9tmp/scanset.ocrad-invert.out
[2340] dbg: FuzzyOcr: Saved pid: 2684
[2684] dbg: FuzzyOcr: Stderr: >/tmp/.spamassassin2340eonVM9tmp/scanset.ocrad-invert.err
[2340] dbg: FuzzyOcr: Elapsed [2684]: 1.544600 sec. (/usr/local/bin/ocrad: exit 0)
[2340] dbg: FuzzyOcr: ocrdata=>>Discount Pharmacy Online
[2340] dbg: FuzzyOcr: Special offers: Save up_o 80°/
[2340] dbg: FuzzyOcr: o
[2340] dbg: FuzzyOcr: V#GRA ONLY $2.00
[2340] dbg: FuzzyOcr: CIALIS ONL.Y $2.00
[2340] dbg: FuzzyOcr: SOMA ONLY $2.44
[2340] dbg: FuzzyOcr: ULTRAM ONLY $2.28
[2340] dbg: FuzzyOcr:
[2340] dbg: FuzzyOcr: .. ... ... ... ... ... .-. ... ... ... ... ... ... ... ... -.. ... ... ... .
[2340] dbg: FuzzyOcr:
[2340] dbg: FuzzyOcr: For mo.rY information, Please do not click
[2340] dbg: FuzzyOcr: Just type: wrm.SiDnpleRXZ.org
[2340] dbg: FuzzyOcr: inthe address barofyou browser,then press the Enterkey
[2340] dbg: FuzzyOcr:
[2340] dbg: FuzzyOcr: <<=end
[2340] info: FuzzyOcr: Scanset "ocrad-invert" found word "addressbar" with fuzz of 0.1000
[2340] info: FuzzyOcr: line: "inthe address barofyou browserthen press the enterkey"
[2340] info: FuzzyOcr: Scanset "ocrad-invert" found word "cialis" with fuzz of 0.0000
[2340] info: FuzzyOcr: line: "cialis only oo"
[2340] info: FuzzyOcr: Scanset "ocrad-invert" found word "click" with fuzz of 0.0000
[2340] info: FuzzyOcr: line: "for mory information please do not click"
[2340] info: FuzzyOcr: Scanset "ocrad-invert" found word "offer" with fuzz of 0.0000
[2340] info: FuzzyOcr: line: "special offers save upo ao"
...
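
As a side note on the "fuzz" figures in the info lines above: they look consistent with an approximate substring match, i.e. the smallest edit distance between the dictionary word and any stretch of the OCR'd line, divided by the word length. That reading is an assumption on my part, not something taken from the FuzzyOcr source, and the function name `fuzz` below is made up for illustration. A minimal Python sketch of that idea:

```python
def fuzz(word: str, line: str) -> float:
    """Hypothetical fuzz score: the smallest edit distance between `word`
    and any substring of `line`, divided by len(word).
    This is an assumed model, not FuzzyOcr's actual code."""
    if not word:
        raise ValueError("word must not be empty")
    # prev[j]: cheapest way to match the first i characters of `word`
    # against some substring of `line` that ends at position j.
    # Row 0 is all zeros, so a match may start anywhere in the line.
    prev = [0] * (len(line) + 1)
    for i, wc in enumerate(word, start=1):
        curr = [i] + [0] * len(line)
        for j, lc in enumerate(line, start=1):
            curr[j] = min(
                prev[j] + 1,               # word character absent from the line
                curr[j - 1] + 1,           # extra character in the line
                prev[j - 1] + (wc != lc),  # match or substitution
            )
        prev = curr
    # The match may also end anywhere, so take the best end position.
    return min(prev) / len(word)


if __name__ == "__main__":
    line = "inthe address barofyou browserthen press the enterkey"
    print(f"{fuzz('addressbar', line):.4f}")
    print(f"{fuzz('cialis', 'cialis only oo'):.4f}")
```

Under this model the stray space in "address bar" costs one edit against a ten-letter word, which would explain the 0.1000 reported for "addressbar", while the exact hits score 0.0000.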
Content analysis details:   (11.5 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-1.4 ALL_TRUSTED            Passed through trusted hosts only via SMTP
 0.6 HTML_IMAGE_RATIO_02    BODY: HTML has a low ratio of text to image area
 0.0 HTML_MESSAGE           BODY: HTML included in message
 1.5 HTML_IMAGE_ONLY_04     BODY: HTML: images with 0-400 bytes of words
 1.4 SARE_GIF_ATTACH        FULL: Email has a inline gif
 9.5 FUZZY_OCR              BODY: Mail contains an image with common spam text
                            inside
                            [Words found:]
                            ["addressbar" in 1 lines]
                            ["cialis" in 1 lines]
                            ["click" in 1 lines]
                            ["offer" in 1 lines]
                            ["browser" in 1 lines]
                            ["soma" in 1 lines]
                            ["type" in 1 lines]
                            ["pharmacy" in 1 lines]
                            [(12 word occurrences found)]
...
-- 
René Berber