Win.Trojan.URLspoof-2
We’re encountering some issues with this particular “virus” and, having worked 
through what we’re seeing, I wanted to ask a couple of questions.
The signature is pretty weak:

[main.ndb] Win.Trojan.URLspoof-2:0:*:20687265663d22*0125303040*223e*3c2f


We’ve seen hits against this signature 14 times in 8 years (I’m not sure how 
long it’s been in the defs, but we’ve been checking our ~20 million files 
against ClamAV for 8 years).
Every hit for Win.Trojan.URLspoof-2 we’ve seen is a false positive.
Breaking the signature sequence into parts reveals the weakness of this 
particular signature:

Group 1: 20687265663d22 = ‘ href="’
Group 2:  0125303040 = ‘\x01%00@’
Group 3: 223e = ‘">’
Group 4: 3c2f = ‘</’
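
For anyone who wants to check the breakdown, here is a minimal Python sketch that splits the signature body on the `*` wildcard and decodes each hex group. The names `SIGNATURE_BODY` and `decode_groups` are my own, purely for illustration; nothing here comes from ClamAV itself.

```python
# Body of the signature, i.e. the last field of the main.ndb line quoted above.
SIGNATURE_BODY = "20687265663d22*0125303040*223e*3c2f"


def decode_groups(body):
    """Split an .ndb hex body on the '*' wildcard and decode each part to bytes."""
    return [bytes.fromhex(part) for part in body.split("*")]


groups = decode_groups(SIGNATURE_BODY)
for number, group in enumerate(groups, start=1):
    # repr() keeps the non-printable \x01 byte in group 2 visible.
    print("Group %d: %r" % (number, group))
```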

These false positives appear in WARC files 
(http://iipc.github.io/warc-specifications/) and in the earlier ARC format 
(http://archive.org/web/researcher/ArcFileFormat.php).
I’ve been pulling these containers apart, and can see that we only get a hit 
when the signature parts are spread across the contents of the container. For 
us, group 1 appears in any piece of HTML, and group 2 appears in a variety of 
file formats including PDF, MP3, MP4 and JPG. Groups 3 and 4 are trivial and 
appear everywhere. The point here is that a hit is never caused by a single 
file as it would be found in the wild, only by the aggregation we undertake 
ourselves when creating these WARC files.
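
To illustrate the effect, here is a rough Python sketch. It assumes the `*` wildcard simply means "any number of bytes" and models that with a regular expression; this is only a model of the matching semantics, not how ClamAV implements it. The three sample byte strings are made-up stand-ins for unrelated harvested files.

```python
import re

SIGNATURE_BODY = "20687265663d22*0125303040*223e*3c2f"


def body_to_regex(body):
    """Model the hex body as a regex, treating '*' as any run of bytes."""
    parts = [re.escape(bytes.fromhex(part)) for part in body.split("*")]
    return re.compile(b".*?".join(parts), re.DOTALL)


pattern = body_to_regex(SIGNATURE_BODY)

# Hypothetical stand-ins for three unrelated files inside one container.
html_page = b'<html><a href="index.html">home</a></html>'
binary_blob = b"\xff\xd8\xff\xe0 binary data \x01%00@ more binary data"
other_page = b'<p class="x">text</p>'

# Scanned one at a time, none of the files contains all four groups.
matches_alone = [bool(pattern.search(item))
                 for item in (html_page, binary_blob, other_page)]

# Concatenated, as in a W/ARC, the groups line up across file boundaries.
container = b"\r\n".join([html_page, binary_blob, other_page])
matches_container = bool(pattern.search(container))

print(matches_alone, matches_container)
```

In this sketch no individual file matches, but the concatenation does, which is the behaviour we see with our containers.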

We run a slightly non-standard conf:

# MaxScanSize
# Default: 100M
MaxScanSize 2048M

And

# MaxFileSize
# Default: 25M
MaxFileSize 2048M

Questions:

1)      How would I go about getting this signature either removed or hardened? 
For example, if the signature is specifically hunting for a URL, perhaps the 
match could be confined to the max URL length * 2 or some such 
(http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers), 
say 4000 bytes. I’ve never seen a true positive hit against this signature, and 
I have no idea how common the threat is or what the signature is actually 
looking for, so removing it might not be a great idea.

Are there any resources that might help me to work on a stronger signature for 
this particular threat, and what’s the process for suggesting a 
revision/removal?
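
As a sketch of what a hardened body might look like, the Python below swaps each unbounded `*` for the bounded `{-n}` wildcard ("n or fewer bytes" in the ClamAV signature syntax), using the 4000-byte figure floated above. Whether 4000 is the right bound, and whether this still catches the real threat, are open assumptions; the helper names are mine, and the regex is only a model for trying the idea out.

```python
import re

ORIGINAL_BODY = "20687265663d22*0125303040*223e*3c2f"
MAX_GAP = 4000  # assumption: the "say 4000 bytes" figure from the question


def bounded_body(body, max_gap):
    """Replace each unbounded '*' with a '{-n}' wildcard (n or fewer bytes)."""
    return body.replace("*", "{-%d}" % max_gap)


def bounded_regex(body, max_gap):
    """Regex model of the bounded body: at most max_gap bytes between groups."""
    parts = [re.escape(bytes.fromhex(part)) for part in body.split("*")]
    gap = b".{0,%d}?" % max_gap
    return re.compile(gap.join(parts), re.DOTALL)


candidate = bounded_body(ORIGINAL_BODY, MAX_GAP)
pattern = bounded_regex(ORIGINAL_BODY, MAX_GAP)

# Groups close together (one short URL-like string) versus spread far apart.
near = b'<a href="http://a\x01%00@b/">x</a>'
far = b' href="' + b"x" * 5000 + b'\x01%00@"></'

print(candidate)
print(bool(pattern.search(near)), bool(pattern.search(far)))
```

Under this model the tightly packed sample still matches, while the sample with 5000 bytes between groups 1 and 2 does not.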

2)      These hits all happen in the W/ARC container. These containers are 
simple serialisations of arbitrary files harvested from websites, together with 
their associated HTTP transactions. They are used to “replay” web harvests 
(like the Wayback Machine etc). Is there any way we can handle these particular 
file types differently? As these files are aggregations of any number of binary 
items, we are much more likely to encounter false positives, especially for 
weak signatures. We’ve only seen false positives for the Trojan URL signature, 
but I anticipate seeing more when we process the 80 TB of WARCs we have waiting 
to come in – these will translate into ~2 billion files housed in several 
hundred thousand WARC files.

Ideally we ought to be ripping the (W)ARC into its binary parts – by scanning 
an arbitrary aggregation of many files as if it were one coherent file with a 
single payload, I think we’re doing ourselves a disservice. I wondered whether 
there is a mechanism within the ClamAV architecture that would support the 
construction of a WARC parser. This might allow WARC files to be “properly” 
consumed as a series of disconnected binary items, reducing the likelihood of 
false positives.

We are also looking at what it would mean for our workflow to explode the 
W/ARCs into their parts before they are presented for scanning, and that’s a 
viable option. For now I’m mainly interested in knowing what we could/could not 
do.
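
To give an idea of what exploding a WARC before scanning could involve, here is a minimal Python sketch that walks an uncompressed WARC stream and yields each record block separately. It is an illustration under several assumptions: it relies only on the `WARC/` version line and the `Content-Length` header, it does not handle gzip-compressed records or the older ARC layout, and `iter_warc_records` and `make_record` are hypothetical helper names. For response records the block still includes the HTTP headers in front of the payload.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, block) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # end of the record header
            name, _, value = line.partition(b":")
            headers[name.strip().lower().decode("ascii")] = (
                value.strip().decode("utf-8"))
        # Read the block by length, so its content is never parsed as headers.
        block = stream.read(int(headers["content-length"]))
        yield headers, block


def make_record(record_type, body):
    """Build a tiny synthetic record, only for exercising the parser."""
    head = (b"WARC/1.0\r\nWARC-Type: " + record_type +
            b"\r\nContent-Length: " + str(len(body)).encode("ascii") +
            b"\r\n\r\n")
    return head + body + b"\r\n\r\n"


sample = (make_record(b"resource", b'<a href="x">') +
          make_record(b"resource", b'\x01%00@ binary\n">\n</'))
records = list(iter_warc_records(io.BytesIO(sample)))
for headers, block in records:
    # Each block could now be written out and scanned as its own file.
    print(headers["warc-type"], len(block))
```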


Jay Gattuso | Digital Preservation Analyst | Preservation, Research and 
Consultancy
National Library of New Zealand | Te Puna Mātauranga o Aotearoa
PO Box 1467 Wellington 6140 New Zealand | +64 (0)4 474 3064
jay.gatt...@dia.govt.nz<mailto:jay.gatt...@natlib.govt.nz>

_______________________________________________
clamav-users mailing list
clamav-users@lists.clamav.net
http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-users

