One problem we're running into is that we encounter web pages and CGI scripts that are "inconsistently" normalized. I put "inconsistently" in quotes because, without knowing exactly how ClamAV normalizes files, it is sometimes hard to understand why two similar files are normalized differently. For example, a PHP script that doesn't contain HTML tags will be normalized using 'ascii-normalise', while the exact same PHP code will be normalized using 'html-normalise' if it happens to be tacked onto some HTML.
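To make the effect concrete, here is a rough Python sketch of the behaviour described in this message. It is not ClamAV's code. It assumes that the file-type check is a case-insensitive substring search for the markers quoted further down, that '<a*href' means "<a, then anything, then href", and that the two normalizers differ only in the ways we observed: both lowercase, and the HTML side also drops whitespace around '=' and turns single quotes into double quotes. PROPOSED_MARKERS is a hypothetical extended list built from the suggestions below.

```python
import re

# Markers quoted in this message as the ones that make a file "HTML".
# ASSUMPTION: case-insensitive substring search; ClamAV's real rules may differ.
CURRENT_MARKERS = ("<html>", "<head>", "<img", "<script", "<object",
                   "<iframe", "<table")

# HYPOTHETICAL extended list: PHP open tags ('<?php' is already covered by
# '<?', listed for clarity), extra tags, and <html / <head without the '>'.
PROPOSED_MARKERS = ("<?php", "<?", "<html", "<head", "<img", "<script",
                    "<object", "<iframe", "<table", "<style", "<!doctype",
                    "<meta", "<title", "<form")

# '<a*href' modelled as '<a', then anything, then 'href' (assumption).
A_HREF = re.compile(r"<a.*?href", re.IGNORECASE | re.DOTALL)


def looks_like_html(text: str, markers: tuple[str, ...]) -> bool:
    """Return True if any marker, or the <a ... href pattern, appears."""
    lowered = text.lower()
    return any(m in lowered for m in markers) or A_HREF.search(text) is not None


def ascii_normalise(text: str) -> str:
    """Emulates only what we observed: lowercase, spacing and quotes kept."""
    return text.lower()


def html_normalise(text: str) -> str:
    """Emulates only what we observed: lowercase, no whitespace around '=',
    single quotes rewritten as double quotes."""
    out = re.sub(r"\s*=\s*", "=", text.lower())
    return out.replace("'", '"')


def normalise(text: str, markers: tuple[str, ...] = CURRENT_MARKERS) -> str:
    """Pick a normalizer the way the file-type check appears to."""
    if looks_like_html(text, markers):
        return html_normalise(text)
    return ascii_normalise(text)


if __name__ == "__main__":
    php = "<?php\n$ip = getenv(\"REMOTE_ADDR\");\n$password = $_POST['password'];\n?>\n"
    with_html = "<html><body>\n" + php
    print(normalise(php))                     # ASCII path: spaces and ' kept
    print(normalise(with_html))               # HTML path: '=' tightened, ' -> "
    print(normalise(php, PROPOSED_MARKERS))   # HTML path once '<?' counts
```

With the current marker list, the bare PHP file and the same PHP inside HTML produce different normalized text, so one signature cannot match both. If '<?' were treated as an HTML marker, the PHP body would normalize the same way in both cases.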
This seems to be particularly prevalent with phishing kits, where we want to write a signature based on the PHP code, not necessarily on the HTML. As a result, we end up having to write two signatures, because HTML normalization seems to remove the spaces around equal signs, while ASCII normalization leaves them in. Additionally, HTML normalization replaces single quotes (') with double quotes ("), while ASCII normalization leaves them as they were. Example:

    $ip = getenv("REMOTE_ADDR");
    $password = $_POST['password'];

ASCII normalized:

    $ip = getenv("remote_addr");
    $password = $_post['password'];

HTML normalized:

    $ip=getenv("remote_addr");
    $password=$_post["password"];

So, my question is this: how can we get PHP tags (<? and <?php) to mark a file as the 'HTML' type, so that it is normalized the same way as other 'web' files?

Also, there are more than a few HTML files that browsers render 'properly' but that don't contain any of the following tags:

    '<html>' '<head>' '<a*href' '<img' '<script' '<object' '<iframe' '<table'

Adding a few other tags, such as <style, <!doctype, <meta, <title and <form, might help, as would relaxing the html and head checks to require only the leading < (<html instead of <html>).

--Maarten
_______________________________________________
clamav-users mailing list
clamav-users@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-users

Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml