At 12:54 PM Tuesday, 1/23/2007, René Berber wrote -=>
> Ed Kasky wrote:
>
> > At 10:23 AM Tuesday, 1/23/2007, René Berber wrote -=>
> >> Ed Kasky wrote:
> >>
> >> > With FuzzyOcr 3.5.1 and SA 3.1.7, I noticed this in the log while
> >> > debugging my setup:
> >> >
> >> > 2007-01-23 01:39:23 [16842] Processing Message with ID
> >> > "<[EMAIL PROTECTED]>" ("Lacy Silva"
> >> > <[EMAIL PROTECTED]> -> "ed" <[EMAIL PROTECTED]>)
> >> > 2007-01-23 01:39:23 [16842] GIF: [248x442] submersible.gif (5458)
> >> > 2007-01-23 01:39:23 [16842] Found: 1 images
> >> > 2007-01-23 01:39:23 [16842] Found GIF header name="submersible.gif"
> >> > 2007-01-23 01:39:23 [16842] Image is single non-interlaced...
> >> > 2007-01-23 01:39:24 [16842] Calculating image hash for:
> >> > /tmp/.spamassassin168423O9h2Ttmp/submersible.gif.pnm
> >> > 2007-01-23 01:39:24 [16842] Timed out
> >>
> >> Look at the timestamp, there was no 10 sec timeout, it was immediate.
> >
> > I know - that caught my attention right away.
>
> What version of module Time::HiRes do you have?

Time::HiRes is up to date (1.9704)

However, I suppose running a debug would have helped ;-)

[456] info: FuzzyOcr: Calculating image hash for: /tmp/.spamassassin456xeuqXRtmp/CIMG0980.gif.pnm
[456] dbg: FuzzyOcr: Saved pid: 490
[490] dbg: FuzzyOcr: Exec : /usr/local/netpbm/bin/ppmhist -noheader /tmp/.spamassassin456xeuqXRtmp/CIMG0980.gif.pnm
[490] dbg: FuzzyOcr: Stdout: >/tmp/.spamassassin456xeuqXRtmp/ppmhist.info
[490] dbg: FuzzyOcr: Stderr: >/dev/null
[456] dbg: FuzzyOcr: Elapsed [490]: 0.162664 sec. (/usr/local/netpbm/bin/ppmhist: exit 127)
[456] error: FuzzyOcr: Timed out
[456] info: FuzzyOcr: Error calculating the image hash, skipping hash check...
[456] info: FuzzyOcr: Empty Hash, skipping...
[456] dbg: FuzzyOcr: Remove DIR: /tmp/.spamassassin456xeuqXRtmp
[456] dbg: FuzzyOcr: FuzzyOcr ending successfully...
[456] dbg: FuzzyOcr: Processed in 1.138189 sec.
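
The "Timed out" line is misleading here: the Elapsed line shows the child exiting with status 127 after 0.16 sec, nowhere near the 10 sec limit. Below is a rough Python sketch of how a wrapper could tell a genuine timeout from an early failure. It is not FuzzyOcr's actual code (that is Perl); the function name classify_run and the labels it returns are made up for illustration. On Linux, 127 is what the shell returns for "command not found", and as far as I know the dynamic loader also exits with 127 when a shared library can't be loaded.

```python
import subprocess


def classify_run(cmd, timeout_sec=10.0):
    """Run cmd (a list of arguments) and say how it ended.

    Hypothetical helper for illustration only; the return labels
    are not taken from FuzzyOcr.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_sec,
        )
    except subprocess.TimeoutExpired:
        # The child really did run past the limit and was killed.
        return "timeout"
    except FileNotFoundError:
        # The executable itself could not be found or started.
        return "not-found"

    if proc.returncode == 0:
        return "ok"
    if proc.returncode == 127:
        # Child started but bailed out at once, e.g. a missing
        # command inside a shell wrapper or an unresolved library.
        return "exit-127"
    return f"exit-{proc.returncode}"
```

Something like classify_run(["/usr/local/netpbm/bin/ppmhist", "-noheader", "image.pnm"]), with image.pnm standing in for a real file, would then report the early exit separately instead of lumping it in with timeouts.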

ppmhist couldn't find libnetpbm.so.10 (that's the exit 127 above), so I added the library's directory to the search path and it's working now. Results from parsing one of the sample emails:

1.5 FUZZY_OCR_WRONG_CTYPE  BODY: Mail contains an image with wrong
                            content-type set
                            Image has format "GIF" but content-type is
                            "image/jpeg"
1.5 FUZZY_OCR_WRONG_EXTENSION BODY: Mail contains an image with wrong
                            file extension
                            Image has format "GIF" but file extension is
                            "jpeg"
2.5 FUZZY_OCR_CORRUPT_IMG  BODY: Mail contains a corrupted image
                            Corrupt image: GIF-LIB error: Image is
                            defective, decoding aborted.
15 FUZZY_OCR_KNOWN_HASH   BODY: Mail contains an image with known hash
                            Words found:
                            "company" in 1 lines
                            "recommendation" in 1 lines
                            "target" in 1 lines
                            "price" in 2 lines
                            "service" in 1 lines
                            "stock" in 2 lines
                            (12 word occurrences found)

And I got a hit on an email a few minutes ago as well.

Ed Kasky
~~~~~~~~~
Randomly Generated Quote (56 of 526):
"Every people has a right to choose the sovereignty under which they
shall live."   --Woodrow Wilson
