Hello ClamAV Team,

To detect some JS includers, I need to create a signature based on an HTML 
comment. Here is an example:

# cat test.html
<html>
<body>
<!-- This is a malware -->
</body>
</html>

I *need* to include the comment tags to avoid false positives. I tried several 
signatures:
# cat test.ndb
test:7:*:3c212d2d20546869732069732061206d616c77617265202d2d3e
test:7:*:3c212d2d20746869732069732061206d616c77617265202d2d3e
test:3:*:3c212d2d20546869732069732061206d616c77617265202d2d3e
test:3:*:3c212d2d20746869732069732061206d616c77617265202d2d3e
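For reference, the hex bodies above are simply the ASCII bytes of the comment, 
with and without a capital "T", for target types 7 (normalized ASCII) and 3 
(normalized HTML). A small Python sketch like the one below regenerates the 
same four lines; the helper name ndb_line is only for illustration, and the 
Name:TargetType:Offset:HexSignature layout is taken from test.ndb above.

```python
def ndb_line(name: str, target: int, text: str, offset: str = "*") -> str:
    """Build an .ndb line of the form Name:TargetType:Offset:HexSignature."""
    return f"{name}:{target}:{offset}:{text.encode('ascii').hex()}"


if __name__ == "__main__":
    # Same order as test.ndb: target 7 first, then target 3,
    # each with the capitalized and the lowercase comment.
    for target in (7, 3):
        for comment in ("<!-- This is a malware -->", "<!-- this is a malware -->"):
            print(ndb_line("test", target, comment))
```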

None of them matches.

# clamscan -id test.ndb test.html

----------- SCAN SUMMARY -----------
Known viruses: 4
Engine version: 0.98.7
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 0.007 sec (0 m 0 s)

(I have also tested with the latest GitHub snapshot of clamav-devel, with no 
more success.)


Why doesn't it match? Let's run the scan with debug output:

# clamscan -id test.ndb test.html --debug
(... snip ...)
LibClamAV debug: Recognized ASCII text
LibClamAV debug: Matched signature for file type HTML data at 0
LibClamAV debug: cache_check: e7a3239dc6d11597df1a03a6a8a55854 is negative
LibClamAV debug: in cli_scanhtml()
LibClamAV debug: cli_scanhtml: using tempdir /tmp/clamav-a13a0761052e94cf406a02db25f7c324.tmp
LibClamAV debug: hashtab: Freeing hashset, elements: 0, capacity: 0
LibClamAV debug: hashtab: Freeing hashset, elements: 0, capacity: 0
LibClamAV debug: cli_magic_scandesc: returning 0  at line 2334
LibClamAV debug: cache_add: e7a3239dc6d11597df1a03a6a8a55854 (level 0)
LibClamAV debug: Cleaning up phishcheck
LibClamAV debug: Freeing phishcheck struct
LibClamAV debug: Phishcheck cleaned up

The file is first recognized as ASCII text, but at that stage it is neither 
normalized nor scanned by the engine. It is then identified as HTML, 
normalized, and scanned by the engine.

HTML normalization removes HTML comments from the original file. That's why 
the signature does not match.
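To illustrate the effect, here is a toy Python model of a comment-stripping 
normalizer. It only assumes that normalization drops <!-- ... --> blocks, 
lowercases the text and collapses whitespace; it is not ClamAV's actual code, 
and the function name is made up. Once the comment is stripped, none of the 
signature bytes are left to match.

```python
import re

HTML = """<html>
<body>
<!-- This is a malware -->
</body>
</html>
"""

# Bytes the lowercase .ndb signature looks for.
PATTERN = b"<!-- this is a malware -->"


def normalize_without_comments(html: str) -> bytes:
    """Toy approximation of a nocomment-style normalizer (NOT ClamAV's code):
    drop HTML comments, lowercase, collapse whitespace runs to one space."""
    no_comments = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    collapsed = re.sub(r"\s+", " ", no_comments.lower()).strip()
    return collapsed.encode("ascii")


if __name__ == "__main__":
    normalized = normalize_without_comments(HTML)
    print(normalized)             # b'<html> <body> </body> </html>'
    print(PATTERN in normalized)  # False: the comment is gone before matching
```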

There are two possible solutions:

1/ When an ASCII file is detected, normalize and scan it before ClamAV tries 
to detect whether it is an HTML file.

or

2/ When detecting HTML, ClamAV generates two temp files: "nocomment.html" and 
"notags.html". I suggest adding a third temp file, "withcomment.html", which 
would be normalized like the others (spaces and carriage returns removed, 
ASCII lowercased, etc.) but would keep the HTML comments.
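A rough Python sketch of what a "withcomment.html" pass might produce, under 
the assumption that whitespace runs are collapsed to a single space (rather 
than removed entirely) and text is lowercased; the function name is 
hypothetical and this is not an actual ClamAV API. With comments preserved, 
the lowercase signature bytes survive normalization.

```python
import re

HTML = """<html>
<body>
<!-- This is a malware -->
</body>
</html>
"""

# Bytes the lowercase .ndb signature looks for.
PATTERN = b"<!-- this is a malware -->"


def normalize_keep_comments(html: str) -> bytes:
    """Hypothetical 'withcomment.html' normalizer: lowercase and collapse
    whitespace runs to one space, but leave <!-- ... --> comments in place."""
    collapsed = re.sub(r"\s+", " ", html.lower()).strip()
    return collapsed.encode("ascii")


if __name__ == "__main__":
    normalized = normalize_keep_comments(HTML)
    print(normalized)  # b'<html> <body> <!-- this is a malware --> </body> </html>'
    print(PATTERN in normalized)  # True
```

If whitespace were stripped completely instead of collapsed, the signature 
would have to be written without the inner spaces.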

On my side, I have a signature ready that detects hundreds of thousands of 
JS.Includer samples. I'm ready to publish it in the official ClamAV database 
once this new engine feature is available. This could greatly improve 
ClamAV's detection rate.

-- 
Best regards,

Arnaud Jacques
SecuriteInfo.com

Facebook : https://www.facebook.com/pages/SecuriteInfocom/132872523492286
Twitter : @SecuriteInfoCom