Stefan and guys!!! You are awesome!!!

All I did was run "aptitude install fuzzyocr". Nothing else. I re-ran the
test, and this particular spam now hit the FUZZY_OCR rule and got a
score of 16!!!

Here's the new score:

#############

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                            [score: 0.5085]
 3.0 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
                            [88.236.102.45 listed in zen.spamhaus.org]
 0.9 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
 0.8 SHORT_HELO_AND_INLINE_IMAGE Short HELO string, with inline image
 0.1 RDNS_NONE              Delivered to trusted network by a host with no rDNS
  12 FUZZY_OCR              BODY: Mail contains an image with common spam text
                            inside
                            [Words found:]
                            ["cia***" in 3 lines]
                            ["via***" in 3 lines]
                            [(9 word occurrences found)]
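
As a rough check, the points in the report above can be added up with a
short script. This is only a sketch: it assumes the report lists every
rule that fired, and the REPORT string and the total_points helper are
made up for illustration, not part of SpamAssassin.

```python
import re

# Rule names and points copied from the report above (descriptions dropped).
REPORT = """\
 0.0 HTML_MESSAGE
 0.0 BAYES_50
 3.0 RCVD_IN_XBL
 0.9 RCVD_IN_PBL
 0.8 SHORT_HELO_AND_INLINE_IMAGE
 0.1 RDNS_NONE
  12 FUZZY_OCR
"""

# A scoring line starts with a number followed by an upper-case rule name.
SCORE_LINE = re.compile(r"\s*(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)\b")


def total_points(report: str) -> float:
    """Sum the points of every line that looks like '<pts> <RULE_NAME>'."""
    total = 0.0
    for line in report.splitlines():
        match = SCORE_LINE.match(line)
        if match:
            total += float(match.group(1))
    # Round to one decimal to hide floating-point noise.
    return round(total, 1)


if __name__ == "__main__":
    print(total_points(REPORT))  # prints 16.8
```

FUZZY_OCR alone contributes 12 of those points, so without it the
message would stay below the usual 5.0 threshold.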

On Fri, Apr 24, 2009 at 10:52:30PM +0200, Stefan Luetje wrote:
> On 24 Apr 2009 at 22:12 CEST, Igor Chudov wrote:
> > I get plenty of these also, and cannot get them to score well. 
> > 
> > These advertise knockoffs of bestselling Pfizer products. The text is
> > meaningless garbage text. The sales message is contained in a PNG
> > image, but it could be other image types like jpeg. 
> > 
> >        http://igor.chudov.com/tmp/spam008.txt
> > 
> > Any ideas what I can do?
> 
> You can install FuzzyOcr
> <http://wiki.apache.org/spamassassin/FuzzyOcrPlugin>
> 
> ,----
> | X-Spam-Status: Yes, score=19.8 required=5.0 tests=BADRELAY,BAYES_99,FUZZY_OCR,
> |     HK_IMGSPAM,HTML_MESSAGE,SAGREY autolearn=no version=3.2.5
> | X-Spam-Relay-Country: US TR
> | X-Spam-Report: =?ISO-8859-1?Q?
> |     *  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
> |     *      [score: 1.0000]
> |     *  0.3 HTML_MESSAGE BODY: HTML included in message
> |     *  2.5 BADRELAY bad Relay
> |     *  2.0 HK_IMGSPAM Inline image in message, Bayes think it's spam
> |     *   10 FUZZY_OCR BODY:
> |     *  1.0 SAGREY Adds 1.0 to spam from first-time senders
> `----
> 
> ,----[ fuzzyocr.log ]
> | 2009-04-24 22:30:08 [9756] Scanset "ocrad" found word "cialis" with fuzz of 0.0000
> |                       line: "ur prce viagra  cialis special offer"
> | 2009-04-24 22:30:08 [9756] Scanset "ocrad" found word "cialis" with fuzz of 0.0000
> |                       line: "lgg cialis special offer"
> | 2009-04-24 22:30:08 [9756] Scanset "ocrad" found word "viagra" with fuzz of 0.0000
> |                       line: "ur prce viagra  cialis special offer"
> | 2009-04-24 22:30:08 [9756] Scanset "ocrad" found word "viagra" with fuzz of 0.1667
> |                       line: "l ls lo x vagra loo mg  lo x cals omg"
> | 2009-04-24 22:30:08 [9756] Scanset "ocrad" found word "viagra" with fuzz of 0.0000
> |                       line: " viagra hot offer"
> | 2009-04-24 22:30:08 [9756] Scanset "ocrad" generates enough hits (5), skipping further scansets...
> | 2009-04-24 22:30:08 [9756] Message is spam, score = 10.500
> | 2009-04-24 22:30:08 [9756] Adding Hash to "/home/stefan/.fuzzyocr/FuzzyOcr.hashdb"
> | 2009-04-24 22:30:08 [9756] Words found:
> |                       "cialis" in 2 lines
> |                       "viagra" in 3 lines
> |                       (7.5 word occurrences found)
> `----
> 
> 
> Greets
> Stefan
>   
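
The "fuzz" values in the fuzzyocr.log quoted above look like an edit
distance divided by the length of the word being searched for: "vagra"
is one deletion away from the six-letter "viagra", and 1/6 = 0.1667.
The sketch below reproduces that number under this assumption. It is
not the plugin's code: the levenshtein and fuzz helpers are made up for
illustration, and matching against whitespace-separated tokens is a
simplification of whatever approximate matching FuzzyOcr really does.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def fuzz(word: str, line: str) -> float:
    """Smallest edit distance from `word` to any token in `line`,
    relative to len(word). Returns 1.0 for an empty line."""
    tokens = line.split()
    if not tokens:
        return 1.0
    best = min(levenshtein(word, tok) for tok in tokens)
    return best / len(word)


if __name__ == "__main__":
    # OCR output lines taken from the log above.
    garbled = "l ls lo x vagra loo mg  lo x cals omg"
    print(f'{fuzz("viagra", garbled):.4f}')                     # 0.1667
    print(f'{fuzz("cialis", "lgg cialis special offer"):.4f}')  # 0.0000
    print(f'{fuzz("cialis", garbled):.4f}')                     # 0.3333
```

Under the same assumption, "cals" is two edits away from "cialis"
(fuzz 0.3333), which would fit the log reporting only "viagra" for that
line: a word counts as a hit only when its fuzz stays under the
plugin's configured threshold.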

