We also get many of these, and FuzzyOcr is doing a good job here:

Content analysis in detail:   (13.3 points, 5.0 required)

Pts  Rule name              Description
---- ---------------------- --------------------------------------------------
 1.1 EXTRA_MPART_TYPE       Unnecessary parameters in "Content-Type" header
                            ("...type=")
 1.6 FRT_LITTLE             BODY: ReplaceTags: Little
 0.0 HTML_MESSAGE           BODY: Message contains HTML
 0.0 BAYES_50               BODY: Spam probability according to Bayes test: 40-60%
                            [score: 0.5001]
 0.8 SARE_GIF_ATTACH        FULL: Email has a inline gif
 0.5 RAZOR2_CHECK           Listed in the "Razor2" system (http://razor.sf.net/)
 1.6 RCVD_IN_BL_SPAMCOP_NET RBL: Relayed via a host listed by
                            www.spamcop.net
               [Blocked - see <http://www.spamcop.net/bl.shtml?213.33.168.29>]
 0.1 RCVD_IN_IMP_SPAMLIST   RBL: Listed in spamrbl.imp.ch
                            [213.33.168.29 listed in spamrbl.imp.ch]
 0.7 MY_CID_AND_STYLE       SARE cid and style
 7.0 FUZZY_OCR              BODY: Mail contains an image with common spam text
                            inside
                            Words found:
                            "addressbar" in 1 lines
                            "stock" in 1 lines
                            "cialis" in 1 lines
                            "viagra" in 1 lines
                            "xanax" in 1 lines
                            (5 word occurrences found)

with details in FuzzyOCR.log:

2007-06-13 11:15:27 [16507] Saved: /tmp/.spamassassin16507iEX20Ytmp/raw.eml
2007-06-13 11:15:27 [16507] Wrote: /tmp/.spamassassin16507iEX20Ytmp/wNd2KIniaa.gif
2007-06-13 11:15:27 [16507] Found: 1 images
2007-06-13 11:15:27 [16507] Errors to: /tmp/.spamassassin16507iEX20Ytmp/raw.err
2007-06-13 11:15:27 [16507] Analyzing file with content-type="image/gif"
2007-06-13 11:15:27 [16507] pfile => /tmp/.spamassassin16507iEX20Ytmp/wNd2KIniaa.gif.pnm
2007-06-13 11:15:27 [16507] efile => /tmp/.spamassassin16507iEX20Ytmp/wNd2KIniaa.gif.err
2007-06-13 11:15:27 [16507] Found GIF header name="wNd2KIniaa.gif"
2007-06-13 11:15:27 [16507] Image is interlaced or animated...
2007-06-13 11:15:27 [16507] File contains <4> images, deanimating...
2007-06-13 11:15:27 [16507] Calculating the image hash: /tmp/.spamassassin16507iEX20Ytmp/wNd2KIniaa.gif.pnm
2007-06-13 11:15:27 [16507] Got: <337515:250:450:234::252:254:252:253:102091::5:4:5:4:1530::252:3:5:78:964::252:25:24:93:939::219:218:220:219:328::246:233:233:237:284>
2007-06-13 11:15:35 [16507] Expiring <201:218:242:216:88485::0:0:255:29:1990::255:0:0:76:984::0:153:255:119:774::153:0:102:57:587::51:51:153:63:509> older than 35 days
2007-06-13 11:15:36 [16507] Expiring <221:255:255:245:49642::255:255:255:255:25621::0:0:255:29:1304::255:0:0:76:710::0:153:255:119:622::153:0:102:57:589> older than 35 days
2007-06-13 11:15:41 [16507] Trying: $gocr -i $pfile
2007-06-13 11:15:41 [16507] Trying: $gocr -l 180 -d 2 -i $pfile
2007-06-13 11:15:41 [16507] Trying: $ocrad -c ascii -s5 -T 0.5 $pfile
2007-06-13 11:15:41 [16507] Trying: $ocrad -c ascii -s5 $pfile
2007-06-13 11:15:42 [16507] Trying: $ocrad -c ascii -s5 -T 0.5 -i $pfile
2007-06-13 11:15:42 [16507] Found word "addressbar" in line
                       "intheadarssbarofyourbrowsgrhenprejstheenterkey"
                       with fuzz of 0.2000 scanned with scanset $ocrad -c ascii -s5 $pfile
2007-06-13 11:15:42 [16507] Found word "stock" in line
                       "lomstpcegugnxebfastoeive"
                       with fuzz of 0.2000 scanned with scanset $gocr -l 180 -d 2 -i $pfile
2007-06-13 11:15:42 [16507] Found word "cialis" in line
                       "iiciaiisoniyo"
                       with fuzz of 0.1667 scanned with scanset $gocr -i $pfile
2007-06-13 11:15:42 [16507] Found word "cialis" in line
                       "cialisonlyoo"
                       with fuzz of 0.0000 scanned with scanset $gocr -l 180 -d 2 -i $pfile
2007-06-13 11:15:42 [16507] Found word "viagra" in line
                       "viagraniooii"
                       with fuzz of 0.0000 scanned with scanset $gocr -i $pfile
2007-06-13 11:15:42 [16507] Found word "viagra" in line
                       "viagraonioo"
                       with fuzz of 0.0000 scanned with scanset $gocr -l 180 -d 2 -i $pfile
2007-06-13 11:15:42 [16507] Found word "viagra" in line
                       "viaigraonlysoai"
                       with fuzz of 0.1667 scanned with scanset $ocrad -c ascii -s5 -T 0.5 $pfile
2007-06-13 11:15:42 [16507] Found word "viagra" in line
                       "viaigraonlygoai"
                       with fuzz of 0.1667 scanned with scanset $ocrad -c ascii -s5 $pfile
2007-06-13 11:15:42 [16507] Found word "xanax" in line
                       "xanxinlygoo"
                       with fuzz of 0.2000 scanned with scanset $gocr -i $pfile
2007-06-13 11:15:42 [16507] Found word "xanax" in line
                       "xanaxonlygoo"
                       with fuzz of 0.0000 scanned with scanset $gocr -l 180 -d 2 -i $pfile
2007-06-13 11:15:42 [16507] Found word "xanax" in line
                       "xanaxonlysoo"
                       with fuzz of 0.0000 scanned with scanset $ocrad -c ascii -s5 -T 0.5 $pfile
2007-06-13 11:15:42 [16507] Found word "xanax" in line
                       "xanaxonlygoo"
                       with fuzz of 0.0000 scanned with scanset $ocrad -c ascii -s5 $pfile
2007-06-13 11:15:42 [16507] Message is spam, score = 7.000
2007-06-13 11:15:42 [16507] Adding Hash to "/home/spam/.spamassassin/FuzzyOcr.db"
2007-06-13 11:15:42 [16507] Digest: 337515:250:450:234::252:254:252:253:102091::5:4:5:4:1530::252:3:5:78:964::252:25:24:93:939::219:218:220:219:328::246:233:233:237:284
2007-06-13 11:15:42 [16507] Words found:
                      "addressbar" in 1 lines
                      "stock" in 1 lines
                      "cialis" in 1 lines
                      "viagra" in 1 lines
                      "xanax" in 1 lines
                      (5 word occurrences found)
2007-06-13 11:15:42 [16507] Remove DIR: /tmp/.spamassassin16507iEX20Ytmp
2007-06-13 11:15:42 [16507] FuzzyOcr ending successfully...
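
For anyone wondering how to read the "fuzz" values in the log: several of them look like an edit distance divided by the length of the word being searched (for example 0.1667 = 1/6 for "cialis", 0.2000 = 1/5 for "xanax"). The Python sketch below is only my reading of the numbers, not FuzzyOcr's actual code (FuzzyOcr is a Perl plugin), and it may not reproduce every value in the log. The function name `fuzz` is made up for this illustration.

```python
def fuzz(word: str, line: str) -> float:
    """Smallest edit distance between `word` and any substring of `line`,
    divided by len(word). Assumed formula, not taken from FuzzyOcr."""
    # prev[j]: best distance between the word prefix processed so far and
    # some substring of `line` ending at position j. The empty prefix
    # matches anywhere at cost 0, so the match may start at any position.
    prev = [0] * (len(line) + 1)
    for i, wc in enumerate(word, start=1):
        cur = [i]  # i characters of `word` against an empty substring
        for j, lc in enumerate(line, start=1):
            cur.append(min(
                prev[j] + 1,               # character of `word` missing in line
                cur[j - 1] + 1,            # extra character in line
                prev[j - 1] + (wc != lc),  # match or substitution
            ))
        prev = cur
    # The match may also end at any position in the line.
    return min(prev) / len(word)


if __name__ == "__main__":
    # OCR output taken from the log above.
    print(f"{fuzz('cialis', 'iiciaiisoniyo'):.4f}")  # prints 0.1667
    print(f"{fuzz('xanax', 'xanaxonlygoo'):.4f}")    # prints 0.0000
```

Under this reading, a lower fuzz means the OCR output is closer to the spam word, and 0.0000 is an exact hit somewhere in the line.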




Ove Starckjohann



> -----Original Message-----
> From: Oenus Tech Services [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, 13 June 2007 11:08
> To: users@spamassassin.apache.org
> Subject: Looks like image spam is coming back (fuzzyocr useless in this situation)
> 
> 
> Some weeks ago I posted a message about fuzzyocr not scoring a GIF file
> that contains spam and has a broken frame. Keith De Souza confirmed on
> the list that he could reproduce the problem. Well, it looks like
> spammers have found a way to deal with fuzzyocr. These days we're
> getting more and more of those image spam messages. If anyone is
> interested in testing the file, here it is:
>
> http://www.anfitrion.net/MvPmAyp9yb.gif
>
> Analysis of the GIF file shows that frame #3 is broken.
>
> I'm thinking of disabling fuzzyocr for the time being until the problem
> is solved. However, fuzzyocr is still doing a good job on other files.
> Does anybody have a suggestion or clue on how to solve this? Is there a
> way for fuzzyocr to treat these broken GIF images as indecipherable
> and mark them accordingly?
>
> TIA
>
> Ignacio
