Hello ClamAV Team,

To detect some JS includers, I need to create a signature based on an HTML comment. Here is an example:

# cat test.html
<html>
<body>
<!-- This is a malware -->
</body>
</html>

I *need* to include the comment tags to avoid false positives. I tried several signatures:

# cat test.ndb
test:7:*:3c212d2d20546869732069732061206d616c77617265202d2d3e
test:7:*:3c212d2d20746869732069732061206d616c77617265202d2d3e
test:3:*:3c212d2d20546869732069732061206d616c77617265202d2d3e
test:3:*:3c212d2d20746869732069732061206d616c77617265202d2d3e

None of them matches.

# clamscan -id test.ndb test.html

----------- SCAN SUMMARY -----------
Known viruses: 4
Engine version: 0.98.7
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 0.007 sec (0 m 0 s)

(I have also tested with the latest GitHub snapshot of clamav-devel, with no more success.)

Why doesn't it match? Let's run the scan with debug information:

# clamscan -id test.ndb test.html --debug
(... snip ...)
LibClamAV debug: Recognized ASCII text
LibClamAV debug: Matched signature for file type HTML data at 0
LibClamAV debug: cache_check: e7a3239dc6d11597df1a03a6a8a55854 is negative
LibClamAV debug: in cli_scanhtml()
LibClamAV debug: cli_scanhtml: using tempdir /tmp/clamav-a13a0761052e94cf406a02db25f7c324.tmp
LibClamAV debug: hashtab: Freeing hashset, elements: 0, capacity: 0
LibClamAV debug: hashtab: Freeing hashset, elements: 0, capacity: 0
LibClamAV debug: cli_magic_scandesc: returning 0 at line 2334
LibClamAV debug: cache_add: e7a3239dc6d11597df1a03a6a8a55854 (level 0)
LibClamAV debug: Cleaning up phishcheck
LibClamAV debug: Freeing phishcheck struct
LibClamAV debug: Phishcheck cleaned up

The file is first recognized as ASCII text, but at that stage it is neither normalized nor scanned by the engine. It is then recognized as HTML, normalized, and scanned by the engine. The HTML normalization removes HTML comments from the original file, which is why the signatures never match.

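As a cross-check on the signatures above, here is a minimal Python sketch that rebuilds the hex body from the comment text and compares it against the file with and without its comments. The helpers `to_ndb_hex`, `strip_comments` and `matches` are hypothetical names introduced for illustration, and `strip_comments` is only a simplified stand-in for the comment removal described above, not ClamAV's actual normalizer.

```python
import re

COMMENT = "<!-- This is a malware -->"
HTML = "<html>\n<body>\n<!-- This is a malware -->\n</body>\n</html>\n"


def to_ndb_hex(text: str) -> str:
    """Hex-encode ASCII text in the form used by the body signatures above."""
    return text.encode("ascii").hex()


def strip_comments(html: str) -> str:
    """Hypothetical stand-in for comment removal (not ClamAV's real code)."""
    return re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)


def matches(hex_sig: str, data: str) -> bool:
    """Plain substring match of a hex body signature against the data."""
    return bytes.fromhex(hex_sig) in data.encode("ascii")


sig = to_ndb_hex(COMMENT)
print(f"test:3:*:{sig}")
print("raw file matches:        ", matches(sig, HTML))
print("comment-stripped matches:", matches(sig, strip_comments(HTML)))
```

Under these assumptions the hex string is byte-correct for the raw file, so the miss comes from what the engine scans rather than from a typo in the signature.
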
There are two possible solutions:

1/ When an ASCII file is detected, normalize it and scan it before ClamAV tries to determine whether it is an HTML file.

or

2/ When HTML is detected, ClamAV generates two temp files: "nocomment.html" and "notags.html". I suggest adding a third temp file, "withcomment.html". It should be normalized (removing spaces and carriage returns, lowercasing ASCII, etc.) but keep the HTML comments.

On my side, a signature is ready to detect hundreds of thousands of JS.Includer samples. I'm ready to publish it in the official ClamAV database once this new engine feature is available. This could greatly improve ClamAV's detection ratio.

-- 
Best regards,

Arnaud Jacques
SecuriteInfo.com

Facebook : https://www.facebook.com/pages/SecuriteInfocom/132872523492286
Twitter : @SecuriteInfoCom
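
To make suggestion 2/ concrete, here is a rough Python sketch of what a comment-preserving normalization could look like. The function name `normalize_keep_comments` and the exact rules (lowercase everything, collapse each whitespace run to a single space) are assumptions made for illustration. The real normalizer lives in libclamav and may differ in detail.

```python
import re


def normalize_keep_comments(html: str) -> str:
    """Assumed normalization: lowercase, collapse whitespace, keep comments."""
    lowered = html.lower()
    # Collapse spaces, tabs, CR and LF runs into one space; comments stay.
    return re.sub(r"\s+", " ", lowered).strip()


html = "<HTML>\r\n<Body>\r\n<!-- This   is a malware -->\r\n</Body>\r\n</HTML>\r\n"
normalized = normalize_keep_comments(html)

# Lowercase signature body taken from test.ndb.
sig = bytes.fromhex("3c212d2d20746869732069732061206d616c77617265202d2d3e")

print(normalized)
print("signature matches:", sig in normalized.encode("ascii"))
```

With output like this written to a "withcomment.html" temp file, the lowercase variant of the signature would have something to match against, while "nocomment.html" and "notags.html" stay unchanged.
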