You don't need this.
I see, I stand corrected!
Maybe ask how to configure ExtractText?
On 17.03.23 13:59, Michael Grant via users wrote:
Sure, I'd be happy to see some examples. The man page looks pretty
straightforward.
I use exactly what's in the docs and it seems to work.
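For reference, a minimal sketch of that kind of setup, written from memory of
the ExtractText plugin documentation rather than copied from my config. The
tool paths and the file-type patterns are assumptions, so check them against
the man page and your own system:

```
loadplugin Mail::SpamAssassin::Plugin::ExtractText

ifplugin Mail::SpamAssassin::Plugin::ExtractText
  # As I understand it, {} stands for the temporary file that holds the
  # attachment; the paths below are assumed, adjust them to your install.
  extracttext_external  odt2txt    /usr/bin/odt2txt --encoding=UTF-8 {}
  extracttext_use       odt2txt    .odt .ott application/.*?opendocument.*text

  extracttext_external  tesseract  /usr/bin/tesseract {} -
  extracttext_use       tesseract  .jpg .png .bmp .tif .tiff image/(?:jpeg|png|x-ms-bmp|tiff)
endif
```

Each extracttext_external line names a tool and its command line, and the
matching extracttext_use line says which extensions and MIME types go to it.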
I have added for debugging:
add_header all ExtractText-Chars _EXTRACTTEXTCHARS_
add_header all ExtractText-Words _EXTRACTTEXTWORDS_
add_header all ExtractText-Tools _EXTRACTTEXTTOOLS_
add_header all ExtractText-Types _EXTRACTTEXTTYPES_
add_header all ExtractText-Extensions _EXTRACTTEXTEXTENSIONS_
add_header all ExtractText-Flags _EXTRACTTEXTFLAGS_
(I use spamass-milter, so these headers don't appear in the incoming mail,
only when I feed it to SA.)
I see it depends on some external tools like tesseract and odt2txt so
I had better install those first.
I have not had good luck with tesseract out of the box. I wonder if
there are some options to tune it to make it work better. Is there
anything better?
I looked at gocr/ocrad/tesseract more than 15 years ago; at that time gocr
seemed to be the best alternative.
Since then, Google started sponsoring tesseract and it seems to be the best.
You just need to install the scripts and language files for it.
To see how well this is working, I am hoping to be able to see the
output of these tools with -D so I can write some rules.
Similarly, is there a way to see the 'body' text that is fed into the
rules? I don't see that in the output of -D. By 'body', I mean the
text with the html cleaned out of it plus the subject line. I have a
message and I want to write a new body rule, I want to see what
spamassassin is using as the 'body' so I can write the regex. I don't
see the body text in -D.
No idea here.
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I intend to live forever - so far so good.