On 7/2/25 3:45 PM, Alex wrote:
Hi, I'm seeing an increase in the amount of QR code spam that isn't being caught. I'm not even sure it's being checked using zbarimg. Here's what I have in ExtractText.cf:

extracttext_external zbar /usr/bin/zbarimg -D {}
extracttext_use zbar .jpg .png .pdf .webp image/(?:jpeg|png) application/pdf
add_header all ExtractText-Uris _EXTRACTTEXTURIS_

Adding .pdf to "extracttext_use" should be enough in this case.
Here's an example of the encoded PDF in an email that appears not to have been scanned. Should I add "application/octet-stream" to the extracttext_use line in addition to the others?

--===============5303414978067341145==
Content-Type: application/octet-stream
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="Telecommuting_Policy_2025-07-01.pdf"

JVBERi0xLjQKMSAwIG9iago8PAovVGl0bGUgKP7/AFIAZQBtAG8AdABlACAAVwBvAHIAawAgAFAA
bwBsAGkAYwB5ACAAfAAgAEkAbgB0AGUAcgBuAGEAbAAgAEgAUgAgAFAAbwByAHQAYQBsKQovQ3Jl
YXRvciAo/v8AdwBrAGgAdABtAGwAdABvAHAAZABmACAAMAAuADEAMgAuADYpCi9Qcm9kdWNlciAo

When I run zbarimg on the saved PDF directly, it does reveal the QR-code link within the PDF.

Also, it's very slow because it has to spawn the binary with every request. Is there a way to load it into memory or use a library version to avoid having to do this every time? Sometimes salespeople send emails to 50+ people at a time with a legitimate PDF, but zbarimg still has to be spawned for each of them, so it could eventually amount to a denial of service.

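The attachment quoted above is labeled application/octet-stream, yet its first base64 line decodes to a PDF header. As a rough illustration only (the helper name `looks_like_pdf` is hypothetical and not part of SpamAssassin or the ExtractText plugin), the payload type can be confirmed from the first few bytes rather than from the declared Content-Type:

```python
import base64

# First base64 line of the attachment quoted in this thread.
FIRST_LINE = (
    "JVBERi0xLjQKMSAwIG9iago8PAovVGl0bGUgKP7/AFIAZQBtAG8AdABlACAAVwBvAHIAawAgAFAA"
)


def looks_like_pdf(b64_text: str) -> bool:
    """Return True if the base64 payload starts with the PDF magic '%PDF-'.

    Hypothetical helper for illustration; only the first 8 base64
    characters (6 decoded bytes) are inspected.
    """
    head = b64_text.strip()[:8]
    if len(head) < 8:
        return False  # too short to contain the 5-byte magic
    try:
        return base64.b64decode(head).startswith(b"%PDF-")
    except ValueError:
        return False  # not valid base64


if __name__ == "__main__":
    print(looks_like_pdf(FIRST_LINE))
```

This only shows that the part really is a PDF despite its generic MIME type; it says nothing about how the plugin itself selects parts.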
Maybe it would be possible to add a cache layer to the ExtractText plugin. Could you open an enhancement request on https://bz.apache.org/SpamAssassin/ please?

Thanks
 Giovanni
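To make the cache idea concrete, here is a minimal sketch in Python, not the plugin's actual (Perl) code and not an existing SpamAssassin feature. It assumes the extractor output depends only on the attachment bytes, so results can be keyed by a SHA-256 digest. The names `ExtractCache` and `run_zbarimg` are hypothetical; the zbarimg path and the -D flag are copied from the extracttext_external line quoted in this thread.

```python
import hashlib
import os
import subprocess
import tempfile
from typing import Callable, Dict

ZBARIMG = "/usr/bin/zbarimg"  # path taken from the quoted ExtractText.cf


def run_zbarimg(data: bytes) -> str:
    """Hypothetical extractor: write the attachment to a temp file, spawn zbarimg once."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        # The exit status is ignored here; whatever zbarimg printed is returned.
        result = subprocess.run(
            [ZBARIMG, "-D", path], capture_output=True, text=True, timeout=30
        )
        return result.stdout
    finally:
        os.unlink(path)


class ExtractCache:
    """Memoize extractor output by the SHA-256 digest of the attachment bytes."""

    def __init__(self, extractor: Callable[[bytes], str] = run_zbarimg,
                 max_entries: int = 1024) -> None:
        self._extractor = extractor
        self._max_entries = max_entries
        self._results: Dict[str, str] = {}
        self.spawns = 0  # how many times the extractor actually ran

    def extract(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key in self._results:
            return self._results[key]  # identical attachment seen before
        self.spawns += 1
        text = self._extractor(data)
        if len(self._results) >= self._max_entries:
            # Evict the oldest entry (dicts preserve insertion order).
            self._results.pop(next(iter(self._results)))
        self._results[key] = text
        return text
```

With a cache like this, the same PDF sent to 50 recipients would trigger one extractor run instead of 50, provided all scans share the cache. An in-process dictionary only helps within a single process; sharing results across separate scanner processes would need an external store, which is left out of this sketch.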