Re: ExtractText and zbarimg

giovanni Thu, 03 Jul 2025 08:27:42 -0700

On 7/2/25 3:45 PM, Alex wrote:

Hi, I'm seeing an increase in QR-code spam that isn't being caught. I'm not 
even sure the attachments are being checked with zbarimg. Here's what I have 
in ExtractText.cf:


extracttext_external    zbar            /usr/bin/zbarimg -D {}
extracttext_use         zbar            .jpg .png .pdf .webp image/(?:jpeg|png) application/pdf
add_header              all             ExtractText-Uris _EXTRACTTEXTURIS_

Adding .pdf to "extracttext_use" should be enough in this case.

Here's an example of the encoded PDF in an email that appears not to have been scanned. 
Should I add "application/octet-stream" to the extracttext_use line in addition 
to the others?

--===============5303414978067341145==

Content-Type: application/octet-stream
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
  filename="Telecommuting_Policy_2025-07-01.pdf"

JVBERi0xLjQKMSAwIG9iago8PAovVGl0bGUgKP7/AFIAZQBtAG8AdABlACAAVwBvAHIAawAgAFAA
bwBsAGkAYwB5ACAAfAAgAEkAbgB0AGUAcgBuAGEAbAAgAEgAUgAgAFAAbwByAHQAYQBsKQovQ3Jl
YXRvciAo/v8AdwBrAGgAdABtAGwAdABvAHAAZABmACAAMAAuADEAMgAuADYpCi9Qcm9kdWNlciAo

When I run zbarimg on the saved PDF directly, it does reveal the QR-code link 
within the PDF.
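
As a quick sanity check, the first base64 line quoted above can be decoded to 
confirm that the part really is a PDF even though it is labelled 
application/octet-stream. This is only an illustrative Python sketch; the 
helper name looks_like_pdf is made up here and is not part of SpamAssassin or 
zbar.

```python
import base64

# First base64 line of the attachment, copied from the message above.
FIRST_LINE = (
    "JVBERi0xLjQKMSAwIG9iago8PAovVGl0bGUgKP7/AFIAZQBtAG8AdABlACAAVwBvAHIAawAgAFAA"
)


def looks_like_pdf(b64_text: str) -> bool:
    """Hypothetical helper: True if the decoded data starts with the PDF magic."""
    decoded = base64.b64decode(b64_text)
    return decoded.startswith(b"%PDF-")


if __name__ == "__main__":
    # Show the PDF header bytes and the result of the magic-byte check.
    print(base64.b64decode(FIRST_LINE)[:8])
    print(looks_like_pdf(FIRST_LINE))
```

The decoded data begins with the "%PDF-1.4" header, so the attachment is a PDF 
regardless of the declared MIME type.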

Also, it's very slow because it has to spawn the binary with every request. Is 
there a way to load it into memory or use a library version to avoid having to 
do this every time? Sometimes salespeople send a legitimate PDF to 50+ people 
at a time, and zbarimg still has to be spawned for each copy, so it could 
eventually amount to a denial of service.


Maybe it would be possible to add a cache layer to the ExtractText plugin. 
Could you please open an enhancement request at https://bz.apache.org/SpamAssassin/ ?
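
To illustrate the idea only, here is a rough Python sketch of such a cache 
layer: the result is keyed by the SHA-256 digest of the attachment bytes, so 
identical attachments trigger the extractor once. Nothing here is the plugin's 
actual code; ExtractionCache and fake_extractor are hypothetical names, and 
fake_extractor merely stands in for spawning zbarimg.

```python
import hashlib
from typing import Callable, Dict


class ExtractionCache:
    """Hypothetical in-memory cache keyed by the SHA-256 of the attachment."""

    def __init__(self, extractor: Callable[[bytes], str]) -> None:
        self._extractor = extractor
        self._results: Dict[str, str] = {}
        self.misses = 0  # number of times the extractor actually ran

    def extract(self, attachment: bytes) -> str:
        key = hashlib.sha256(attachment).hexdigest()
        if key not in self._results:
            # Cache miss: run the (expensive) extractor and remember the result.
            self.misses += 1
            self._results[key] = self._extractor(attachment)
        return self._results[key]


def fake_extractor(data: bytes) -> str:
    """Stand-in for the external tool; returns a dummy string (assumption)."""
    return "extracted:%d-bytes" % len(data)


if __name__ == "__main__":
    cache = ExtractionCache(fake_extractor)
    pdf = b"%PDF-1.4 same attachment sent to many recipients"
    for _ in range(50):
        cache.extract(pdf)
    print(cache.misses)  # the extractor ran only on the first call
```

A real implementation would also need to share the cache across scanner 
processes and expire old entries, which this sketch does not attempt.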
 Thanks
  Giovanni
