I am using the ExtractText plugin from
http://whatever.frukt.org/spamassassin.text.shtml#ExtractText.pm
It extracts text from attached .rtf, .doc, and some other formats, then
feeds the results to Bayes and the normal body tests.
 
My problem is that it works great with SA 3.2.5, but on the same
server it does not give any results with SA 3.3.1.
When I downgrade SA back to 3.2.5, ExtractText works again.
 
The dbg output looks like this in 3.3.1:
Jun 3 07:54:17.447 [11937] dbg: extracttext: Part: application/msword spam.doc
Jun 3 07:54:17.447 [11937] dbg: extracttext: Match: name "spam.doc" =~ ".*\.doc"
Jun 3 07:54:17.534 [11937] dbg: extracttext: External call: antiword "/usr/bin/antiword","-t","-w","0","-m","UTF-8.txt","-"
Jun 3 07:54:17.537 [11937] info: extracttext: External extraction command: "/usr/bin/antiword","-t","-w","0","-m","UTF-8.txt","-"
Jun 3 07:54:17.537 [11937] info: extracttext: External extraction object: 17 application/msword "spam.doc"
Jun 3 07:54:17.538 [11937] info: extracttext: External extraction error: antiword 0 ?
Jun 3 07:54:17.538 [11937] dbg: extracttext: Match: name "spam.doc" =~ ".*\.doc"
Jun 3 07:54:17.538 [11937] dbg: extracttext: External call: unrtf "/usr/local/bin/unrtf","-t","ExtractText.tags","--nopict"
Jun 3 07:54:17.539 [11937] info: extracttext: External extraction command: "/usr/local/bin/unrtf","-t","ExtractText.tags","--nopict"
Jun 3 07:54:17.540 [11937] info: extracttext: External extraction object: 17 application/msword "spam.doc"
Jun 3 07:54:17.540 [11937] info: extracttext: External extraction error: unrtf 0 ?
Jun 3 07:54:17.616 [11937] dbg: extracttext: Magic: application/x-ole-storage
Jun 3 07:54:17.617 [11937] dbg: extracttext: Not extracted
Jun 3 07:54:17.617 [11937] dbg: extracttext: X-ExtractText-Words: 0
Jun 3 07:54:17.617 [11937] dbg: extracttext: X-ExtractText-Chars: 0

The dbg output looks like this in 3.2.5:
[7828] dbg: extracttext: Part: application/msword spam.doc
[7828] dbg: extracttext: Match: name "spam.doc" =~ ".*\.doc"
[7828] dbg: extracttext: External call: antiword "/usr/bin/antiword","-t","-w","0","-m","UTF-8.txt","-"
[7828] info: extracttext: Extracted 40 chars using antiword
[7828] info: extracttext: Text: Viagra
[7828] info: extracttext: Text: Free sex
[7828] info: extracttext: Text: Free porn
[7828] info: extracttext: Text: Cash Out Now
[7828] dbg: extracttext: X-ExtractText-Words: 8
[7828] dbg: extracttext: X-ExtractText-Chars: 40
[7828] dbg: extracttext: X-ExtractText-Tools: antiword
[7828] dbg: extracttext: X-ExtractText-Types: application/msword
[7828] dbg: extracttext: X-ExtractText-Extensions: doc
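
In case it helps narrow this down, here is a minimal sketch of how the same antiword call could be reproduced outside SpamAssassin, to check whether the tool itself still works on this server. The argument list is copied from the "External call" debug line. Everything else is an assumption for illustration: Python 3.7 or later, the sample attachment saved to disk as spam.doc, and the helper names run_extractor and check_attachment, which are hypothetical. It does not reproduce how the plugin itself launches the tool.

```python
import subprocess

# Argument list copied from the "External call" debug line in the logs.
ANTIWORD_ARGV = ["/usr/bin/antiword", "-t", "-w", "0", "-m", "UTF-8.txt", "-"]


def run_extractor(argv, data, timeout=30):
    """Feed `data` (bytes) to the command on stdin.

    Returns (exit code, stdout bytes, stderr bytes).
    Hypothetical helper, not part of ExtractText or SpamAssassin.
    """
    proc = subprocess.run(argv, input=data, capture_output=True, timeout=timeout)
    return proc.returncode, proc.stdout, proc.stderr


def check_attachment(path, argv=ANTIWORD_ARGV):
    """Run the extractor on a saved attachment and print a short summary."""
    with open(path, "rb") as fh:
        data = fh.read()
    code, out, err = run_extractor(argv, data)
    print("exit code:", code)
    print("extracted chars:", len(out.decode("utf-8", errors="replace")))
    if err:
        print("stderr:", err.decode("utf-8", errors="replace").strip())
    return code, out, err
```

Calling check_attachment("spam.doc") as the user SpamAssassin runs under would show whether antiword returns text when invoked directly. If it does, the difference is more likely in how the plugin runs under 3.3.1 than in the tool itself.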
 
Any thoughts on how to get it to work with 3.3.1?
_____________________________
Scott Ostrander
Staff System Administrator

  
