On 20.02.23 16:25, Wolfgang Breyha wrote:
> Is there a way with SA4 to process inline images (SRC=data:image/...) as
> images (like attachments)? E.g. to feed them to ExtractText.
>
> I see that SA tries to call URI canonicalization, but that's it so far?

I use the config directives described in the documentation for Mail::SpamAssassin::Plugin::ExtractText:

extracttext_external    tesseract       {OMP_THREAD_LIMIT=1} /usr/bin/tesseract -c page_separator= {} -
extracttext_use         tesseract       .jpg .png .bmp .tif .tiff image/(?:jpeg|png|x-ms-bmp|tiff)

and it looks like it works.

...one just needs to have tesseract installed.
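
The plugin itself also has to be loaded before these directives are accepted. A minimal sketch, assuming a stock SA 4.0 install where the loadplugin line ships commented out in v400.pre (the file name and location are assumptions and may differ on your system):

```
# Assumed location: v400.pre (or any other .pre file your setup reads)
loadplugin Mail::SpamAssassin::Plugin::ExtractText

# In local.cf, each directive on a single line:
# "{}" is replaced by the path of the temporary image file,
# and the trailing "-" makes tesseract write the text to stdout.
extracttext_external    tesseract       {OMP_THREAD_LIMIT=1} /usr/bin/tesseract -c page_separator= {} -
extracttext_use         tesseract       .jpg .png .bmp .tif .tiff image/(?:jpeg|png|x-ms-bmp|tiff)
```

Running "spamassassin --lint" afterwards should show whether the directives were parsed.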


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
LSD will make your ECS screen display 16.7 million colors
