test.html
<html>
<body>
THIS IS     A MALWARE
<!-- THIS      IS A MALWARE -->
</html>

Test signatures:
<!-- this is a malware -->
<!-- This is a malware -->
 this is a malware
 This is a malware

test.ndb
test1:3:*:3c212d2d20546869732069732061206d616c77617265202d2d3e
test2:3:*:3c212d2d20746869732069732061206d616c77617265202d2d3e
test3:3:*:20746869732069732061206d616c7761726520
test4:3:*:20546869732069732061206d616c7761726520
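To make the four signatures easier to read, here is a minimal Python sketch that decodes the hex body of each line back into text. It assumes the plain Name:TargetType:Offset:HexSignature layout used above, with no wildcards in the hex body; the helper name decode_ndb_line is made up for illustration.

```python
# Decode the hex body of simple .ndb lines (Name:TargetType:Offset:HexSignature).
# Assumption: the body is plain hex, with no wildcards or alternations.
NDB_LINES = [
    "test1:3:*:3c212d2d20546869732069732061206d616c77617265202d2d3e",
    "test2:3:*:3c212d2d20746869732069732061206d616c77617265202d2d3e",
    "test3:3:*:20746869732069732061206d616c7761726520",
    "test4:3:*:20546869732069732061206d616c7761726520",
]


def decode_ndb_line(line):
    """Split one .ndb line and return (name, target_type, offset, body_bytes)."""
    name, target, offset, hex_body = line.split(":", 3)
    return name, int(target), offset, bytes.fromhex(hex_body)


for ndb_line in NDB_LINES:
    name, target, offset, body = decode_ndb_line(ndb_line)
    print(name, target, offset, repr(body))
```

Decoded this way, test1 and test2 are the full comment (capital and lower-case "T"), while test3 and test4 are the bare text with a leading and trailing space.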

Results:
 clamscan -id test.ndb test.html
test.html: test3.UNOFFICIAL FOUND

----------- SCAN SUMMARY -----------
Known viruses: 4
Engine version: 0.98.4
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 0.005 sec (0 m 0 s)

Analysis: Before matching target type 3 signatures, clamscan collapses runs of spaces, strips HTML comments, and converts the text to lower case.
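
The behavior described in the analysis can be approximated in a few lines of Python. This is only a rough sketch and not ClamAV's actual normalizer: the function name rough_html_normalize is hypothetical, and treating newlines like spaces is an assumption made so the sketch reproduces the scan result above.

```python
import re

# Rough approximation of the observed normalization (NOT ClamAV's real code):
# strip HTML comments, lower-case everything, collapse whitespace runs.
def rough_html_normalize(data: bytes) -> bytes:
    no_comments = re.sub(rb"<!--.*?-->", b"", data, flags=re.DOTALL)
    lowered = no_comments.lower()
    # Assumption: newlines are collapsed together with spaces.
    return re.sub(rb"\s+", b" ", lowered)


TEST_HTML = (
    b"<html>\n<body>\nTHIS IS     A MALWARE\n"
    b"<!-- THIS      IS A MALWARE -->\n</html>\n"
)

# Signature bodies from test.ndb, decoded from hex.
SIGNATURES = {
    "test1": b"<!-- This is a malware -->",
    "test2": b"<!-- this is a malware -->",
    "test3": b" this is a malware ",
    "test4": b" This is a malware ",
}

normalized = rough_html_normalize(TEST_HTML)
matches = [name for name, body in SIGNATURES.items() if body in normalized]
print(normalized)
print(matches)
```

Under these assumptions only test3 matches: the comment signatures have nothing left to match once comments are stripped, and test4 fails because of the capital "T".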

dp



On 1/26/16 2:49 AM, Arnaud Jacques / SecuriteInfo.com wrote:
Hello Clamav Team,

To detect some JS includers, I need to create a signature based on an HTML
comment. Here is an example:

# cat test.html
<html>
<body>
<!-- This is a malware -->
</body>
</html>

I *need* to include the comment tags to avoid false positives. I tried several
signatures:
# cat test.ndb
test:7:*:3c212d2d20546869732069732061206d616c77617265202d2d3e
test:7:*:3c212d2d20746869732069732061206d616c77617265202d2d3e
test:3:*:3c212d2d20546869732069732061206d616c77617265202d2d3e
test:3:*:3c212d2d20746869732069732061206d616c77617265202d2d3e
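
For reference, lines like these can be generated from the literal string rather than hex-encoded by hand. The sketch below is a minimal illustration; make_ndb_line is a hypothetical helper, and it only covers plain ASCII bodies without wildcards. As far as I understand the ClamAV signature documentation, target type 3 is normalized HTML and 7 is normalized ASCII text.

```python
# Build a simple .ndb line (Name:TargetType:Offset:HexSignature) from a string.
# Hypothetical helper for illustration; it does not handle wildcards.
def make_ndb_line(name, target_type, text, offset="*"):
    hex_body = text.encode("ascii").hex()
    return f"{name}:{target_type}:{offset}:{hex_body}"


# Same four combinations as in test.ndb: target types 7 and 3,
# with an upper-case and a lower-case "T" in the comment.
for target_type in (7, 3):
    for comment in ("<!-- This is a malware -->", "<!-- this is a malware -->"):
        print(make_ndb_line("test", target_type, comment))
```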

None of them matches.

# clamscan -id test.ndb test.html

----------- SCAN SUMMARY -----------
Known viruses: 4
Engine version: 0.98.7
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 0.007 sec (0 m 0 s)

(I have also tested with the latest GitHub snapshot of clamav-devel, with no
better results.)


Why doesn't it match? Let's run the scan with debug information:

# clamscan -id test.ndb test.html --debug
(... snip ...)
LibClamAV debug: Recognized ASCII text
LibClamAV debug: Matched signature for file type HTML data at 0
LibClamAV debug: cache_check: e7a3239dc6d11597df1a03a6a8a55854 is negative
LibClamAV debug: in cli_scanhtml()
LibClamAV debug: cli_scanhtml: using tempdir /tmp/clamav-a13a0761052e94cf406a02db25f7c324.tmp
LibClamAV debug: hashtab: Freeing hashset, elements: 0, capacity: 0
LibClamAV debug: hashtab: Freeing hashset, elements: 0, capacity: 0
LibClamAV debug: cli_magic_scandesc: returning 0  at line 2334
LibClamAV debug: cache_add: e7a3239dc6d11597df1a03a6a8a55854 (level 0)
LibClamAV debug: Cleaning up phishcheck
LibClamAV debug: Freeing phishcheck struct
LibClamAV debug: Phishcheck cleaned up

The file is first recognized as ASCII text, but at that point it is neither
normalized nor scanned by the engine. It is then recognized as HTML, and only
then normalized and scanned by the engine.

The HTML normalization removes HTML comments from the original file.
That is why the signatures do not match.

There are 2 solutions to resolve this:

1/ When an ASCII file is detected, normalize it and scan it before ClamAV tries
to detect whether it is an HTML file.

or

2/ When HTML is detected, ClamAV generates 2 temp files: "nocomment.html" and
"notags.html". I suggest adding a third temp file, "withcomment.html".
"withcomment.html" should be normalized (spaces and carriage returns removed,
text lowercased, etc.) but keep the HTML comments.
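
To illustrate what option 2 is asking for, here is a rough Python sketch of a comment-preserving normalization. It is not ClamAV code: the function name normalize_keep_comments is hypothetical, and collapsing every whitespace run to a single space is a simplifying assumption about what the "withcomment.html" output would look like.

```python
import re

# Hypothetical "withcomment" normalization: lower-case the text and collapse
# whitespace runs, but leave HTML comments in place. Not ClamAV's real code.
def normalize_keep_comments(data: bytes) -> bytes:
    return re.sub(rb"\s+", b" ", data.lower())


TEST_HTML = b"<html>\n<body>\n<!-- This is a malware -->\n</body>\n</html>\n"

# Lower-case comment signature, i.e. the hex body
# 3c212d2d20746869732069732061206d616c77617265202d2d3e decoded.
SIGNATURE = bytes.fromhex(
    "3c212d2d20746869732069732061206d616c77617265202d2d3e"
)

with_comments = normalize_keep_comments(TEST_HTML)
print(with_comments)
print(SIGNATURE in with_comments)
```

With the comment kept, the lower-case signature finds its target, while the capital "T" variant still would not match because the text is lowercased.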

On my side, a signature is ready to detect hundreds of thousands of
JS.Includer. I'm ready to publish it in the official Clamav database when this
new engine feature is ready. This could greatly improve Clamav detection
ratio.


_______________________________________________
Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml
