Hi,

On Sat, Aug 17, 2024 at 12:14 PM <giova...@paclan.it> wrote:
> On 8/16/24 2:03 PM, Alex wrote:
> > The body was empty with a PDF attachment. It's too big for pastebin.
> >
> > https://drive.google.com/file/d/1FzBgTKoBgRp7TWkqjWqSqqESYmCGH0G2/view?usp=sharing
> >
> > Any success stories with setting up zbar for QR code spam would also be
> > appreciated :-)
>
> With this rule the QR-code is extracted correctly.
>
> extracttext_external zbar /usr/local/bin/zbarimg -q -D {}
> extracttext_use zbar .jpg .png .pdf .webp image/(?:jpeg|png) application/pdf
> add_header all ExtractText-Uris _EXTRACTTEXTURIS_

Is it possible that zbar is competing with pdftotext over which tool
processes the PDF? It looks like zbar is either unable to identify the
image or unable to extract the link, perhaps because pdftotext is
processing the attachment instead. These are the headers I get:

X-Spam-ExtractText-Uris:
X-Spam-ExtractText-Chars: 323
X-Spam-ExtractText-Words: 35
X-Spam-ExtractText-Tools: pdftotext
X-Spam-ExtractText-Types: application/pdf
X-Spam-ExtractText-Extensions: pdf
X-Spam-ExtractText-Flags:

Here's my ExtractText.cf. I've verified that all paths exist. Hopefully
gmail doesn't truncate the lines. The message does hit EXTRACTTEXT.

extracttext_external pdftotext /usr/bin/pdftotext -nopgbrk -layout -enc UTF-8 {} -
extracttext_use pdftotext .pdf application/pdf

# http://docx2txt.sourceforge.net
extracttext_external docx2txt /usr/local/bin/docx2txt.pl {} -
extracttext_use docx2txt .docx application/docx

extracttext_external antiword /usr/bin/antiword -t -w 0 -m UTF-8.txt {}
extracttext_use antiword .doc application/(?:vnd\.?)?ms-?word.*

extracttext_external unrtf /usr/bin/unrtf --nopict {}
extracttext_use unrtf .doc .rtf application/rtf text/rtf

extracttext_external odt2txt /usr/bin/odt2txt --encoding=UTF-8 {}
extracttext_use odt2txt .odt .ott application/.*?opendocument.*text
extracttext_use odt2txt .sdw .stw application/(?:x-)?soffice application/(?:x-)?starwriter

extracttext_external tesseract {OMP_THREAD_LIMIT=1} /usr/bin/tesseract -c page_separator= {} -
extracttext_use tesseract .jpg .png .bmp .tif .tiff image/(?:jpeg|png|x-ms-bmp|tiff)

# QR-code decoder
extracttext_external zbar /usr/bin/zbarimg -q -D {}
extracttext_use zbar .jpg .png .pdf .webp image/(?:jpeg|png) application/pdf

add_header all ExtractText-Uris _EXTRACTTEXTURIS_
add_header all ExtractText-Flags _EXTRACTTEXTFLAGS_

header PDF_NO_TEXT X-ExtractText-Flags =~ /\bpdftotext_NoText\b/
describe PDF_NO_TEXT PDF without text
score PDF_NO_TEXT 0.001

header DOC_NO_TEXT X-ExtractText-Flags =~ /\b(?:antiword|openxml|unrtf|odt2txt)_NoText\b/
describe DOC_NO_TEXT Document without text
score DOC_NO_TEXT 0.001

header EXTRACTTEXT exists:X-ExtractText-Flags
describe EXTRACTTEXT Email processed by extracttext plugin
score EXTRACTTEXT 0.001
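
One way to take SpamAssassin out of the picture is to run both tools by
hand on the saved attachment and compare what each one prints. Below is a
minimal Python sketch of that check, assuming the PDF has been saved
locally. The command templates are copied from the config above. The
helper names (build_command, run_tool, compare) are made up for
illustration and are not part of the plugin, and the "{}" substitution
only imitates what the plugin does with the temporary file name.

```python
import shlex
import subprocess

# Command templates copied from the ExtractText.cf in this message.
# "{}" stands for the path of the saved attachment.
TEMPLATES = {
    "pdftotext": "/usr/bin/pdftotext -nopgbrk -layout -enc UTF-8 {} -",
    "zbar": "/usr/bin/zbarimg -q -D {}",
}


def build_command(template, path):
    """Split a template into an argv list, replacing each "{}" token with path."""
    return [path if token == "{}" else token for token in shlex.split(template)]


def run_tool(template, path, timeout=30):
    """Run one extractor on path and return (exit status, stdout, stderr)."""
    proc = subprocess.run(
        build_command(template, path),
        capture_output=True,
        text=True,
        errors="replace",  # do not fail on output that is not valid UTF-8
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


def compare(path):
    """Print what each tool produces for the same file."""
    for name, template in TEMPLATES.items():
        status, out, err = run_tool(template, path)
        print(f"{name}: exit={status}, {len(out)} chars of output")
        print(out.strip() or err.strip() or "(no output)")
```

Calling compare() with the path of the saved PDF would show whether
zbarimg decodes the QR code on its own. If it does, the empty
ExtractText-Uris header is more likely down to which tool the plugin
picks for the part than to zbar itself.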