Hello,

Is it possible to scan PDFs for phishing URLs?

I tried using php-clamavlib-0.12a with ClamAV 0.91.2 on Ubuntu 7.10 (x86) with the standard signatures and Sanesecurity's phishing and scam signatures. I modified php-clamavlib to call cl_load with "CL_DB_STDOPT|CL_DB_PHISHING|CL_DB_PHISHING_URLS" and cl_scanfile with "CL_SCAN_STDOPT|CL_SCAN_PDF".

As a test, I scanned an HTML e-mail containing a hex-encoded URL, which was detected as "Phishing.Heuristics.Email.HexURL". I then inserted the same URL as a hyperlink in an OpenOffice.org 2.2 (Win32) document and exported it as a PDF. ClamAV didn't detect the phishing URL in the exported PDF.

Next, I ran the exported PDF through pdftohtml and added some e-mail headers (Return-Path, Content-Type, Subject, Date, To, From) to the output. The e-mail made from the PDF was detected properly as "Phishing.Heuristics.Email.HexURL".

I also tried a URL with a spoofed domain from the list in daily.pdb, but got the same results as above: detected in e-mails, but not in PDFs.

-- 
Tom Cort
Systems Developer
Vermont Department of Taxes
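
P.S. For anyone who wants to reproduce the PDF-to-e-mail step, it can be sketched roughly as follows. This is only an illustrative sketch in Python, not the exact procedure used in the test above. The function name wrap_html_as_email, the addresses, and the subject text are placeholders, and the HTML input is assumed to be the output of pdftohtml.

```python
from email.message import EmailMessage
from email.utils import formatdate


def wrap_html_as_email(html: str) -> bytes:
    """Wrap HTML text (e.g. pdftohtml output) in a minimal e-mail message.

    Hypothetical helper: all header values below are placeholders.
    """
    msg = EmailMessage()
    msg["Return-Path"] = "<sender@example.com>"
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Converted PDF contents"
    msg["Date"] = formatdate(localtime=True)
    # set_content with subtype="html" adds the Content-Type: text/html header.
    msg.set_content(html, subtype="html")
    return msg.as_bytes()


if __name__ == "__main__":
    sample = '<html><body><a href="http://phish.test/">link</a></body></html>'
    print(wrap_html_as_email(sample).decode("ascii"))
```

The resulting bytes can be written to a file and scanned in the same way as the original HTML e-mail.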