Win.Trojan.URLspoof-2

We’re encountering some issues with this particular “virus”. Having worked through what we’re seeing, I’d like to ask a couple of questions. The signature is pretty weak:
[main.ndb] Win.Trojan.URLspoof-2:0:*:20687265663d22*0125303040*223e*3c2f

We’ve seen hits against this signature 14 times in 8 years. I’m not sure how long it has been in the defs, but we’ve been checking our ~20 million files against ClamAV for 8 years. Every hit for Win.Trojan.URLspoof-2 we’ve seen has been a false positive.

Breaking the signature into its parts reveals its weakness:

Group 1: 20687265663d22 = ' href="'
Group 2: 0125303040 = '\x01%00@'
Group 3: 223e = '">'
Group 4: 3c2f = '</'

These false positives appear in WARC files (http://iipc.github.io/warc-specifications/) and in their earlier variant, ARC (http://archive.org/web/researcher/ArcFileFormat.php). I’ve been pulling these containers apart, and we only get a hit when the signature parts are spread across different items in the container. Group 1 appears in any piece of HTML, and group 2 appears in a variety of file formats, including PDF, MP3, MP4 and JPG. Groups 3 and 4 are trivial and appear everywhere. The point is that the hit is never caused by a single file as it would be found in the wild. It only arises from the aggregation we undertake ourselves when creating these WARC files.

We run a slightly non-standard conf:

# MaxScanSize
# Default: 100M
MaxScanSize 2048M

and

# MaxFileSize
# Default: 25M
MaxFileSize 2048M

Questions:

1) How would I go about getting this signature either removed or hardened? For example, if the signature is specifically hunting for a URL, perhaps the match could be confined to twice the maximum URL length (http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers), say 4000 bytes. That said, although I’ve never seen a genuine hit against this signature, I have no idea how common the threat is or what it’s actually looking for, so removing it might not be a great idea. Are there any resources that might help me work on a stronger signature for this particular threat, and what’s the process for suggesting a revision or removal?
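To make the cross-record effect concrete, here is a minimal Python sketch built on several assumptions. It is not ClamAV’s matcher. It emulates the signature’s `*` gaps with a bytes regular expression and compares that against gaps capped at the 4000 bytes suggested above. It also splits a toy container into records before scanning. The helpers `decode_groups`, `compile_pattern`, `make_warc_record` and `iter_warc_payloads` are hypothetical. The records are minimal, uncompressed WARC/1.0-style records; real WARC files carry more mandatory headers and are usually gzip-compressed per record. The “JPEG” is only filler bytes. In ClamAV’s body-signature syntax, a bounded gap is written `{-n}` (n bytes or fewer), so an untested hardened variant might read `Win.Trojan.URLspoof-2:0:*:20687265663d22{-4000}0125303040{-4000}223e{-4000}3c2f`.

```python
import re

# Hex body of the [main.ndb] signature quoted above; '*' separates the groups.
SIGNATURE_HEX = "20687265663d22*0125303040*223e*3c2f"


def decode_groups(hex_sig):
    """Split a body signature on '*' and decode each hex group to bytes."""
    return [bytes.fromhex(group) for group in hex_sig.split("*")]


def compile_pattern(groups, max_gap=None):
    """Regex that finds the groups in order (an emulation, not ClamAV's engine).

    max_gap=None mimics the unbounded '*' wildcard; an integer caps each gap
    at max_gap bytes, similar in spirit to ClamAV's '{-n}' wildcard.
    """
    gap = b".*?" if max_gap is None else b".{0,%d}?" % max_gap
    return re.compile(gap.join(re.escape(g) for g in groups), re.DOTALL)


def make_warc_record(payload):
    """Hypothetical helper: a minimal, uncompressed WARC/1.0-style record."""
    header = b"WARC/1.0\r\nWARC-Type: resource\r\nContent-Length: %d\r\n\r\n" % len(payload)
    return header + payload + b"\r\n\r\n"


def iter_warc_payloads(data):
    """Yield each record's content block, using Content-Length to find its end."""
    pos = 0
    while pos < len(data):
        header_end = data.index(b"\r\n\r\n", pos)
        length = None
        # Skip the version line (e.g. b"WARC/1.0") and read the named fields.
        for line in data[pos:header_end].split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            raise ValueError("WARC record without Content-Length")
        start = header_end + 4
        yield data[start:start + length]
        pos = start + length + 4  # each record ends with two CRLFs


# Three unrelated "harvested" items, each harmless on its own.
html_a = b'<a href="https://example.org/">home</a>'
jpeg_like = b"\xff\xd8" + b"\x00" * 5000 + b"\x01%00@" + b"\x00" * 100
html_b = b'<p class="note">text</p>'
warc = b"".join(make_warc_record(p) for p in (html_a, jpeg_like, html_b))

groups = decode_groups(SIGNATURE_HEX)
unbounded = compile_pattern(groups)              # like the current '*' gaps
bounded = compile_pattern(groups, max_gap=4000)  # gaps capped at 4000 bytes

print(unbounded.search(warc) is not None)  # → True: the groups span three records
print(bounded.search(warc) is not None)    # → False: group 2 is >4000 bytes after group 1
print([unbounded.search(p) is not None for p in iter_warc_payloads(warc)])  # → [False, False, False]
```

In this toy case, scanning the whole container with unbounded gaps reports a hit, but neither the capped pattern nor per-record scanning does. A short payload that holds all four groups close together, such as `b'<a href="x\x01%00@">y</a>'`, still matches the capped pattern. The cap therefore narrows the signature rather than disabling it.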
2) These hits all happen in the W/ARC container. These containers are simple serialisations of arbitrary files harvested from websites, together with their associated HTTP transactions. They are used to “replay” web harvests (as in the Wayback Machine). Is there any way we can handle these particular file types differently? Because these files aggregate any number of binary items, we are much more likely to encounter false positives, especially for weak signatures. We’ve only seen false positives for the Trojan URL signature so far. However, I anticipate seeing more when we process the 80 TB of WARCs we have waiting to come in. These will translate into ~2 billion files housed in several hundred thousand WARC files.

Ideally we ought to be ripping each (W)ARC into its binary parts. I think we’re doing ourselves a disservice by parsing an arbitrary aggregation of many files as though it were a single coherent file with one payload. I wondered whether there is a method within the ClamAV architecture that would support building a WARC parser. This might allow WARC files to be “properly” consumed as a series of disconnected binary items, reducing the likelihood of false positives. We are also looking at what it would mean for our workflow to explode the W/ARCs into their parts before they are presented for scanning, and that’s a viable option. For now I’m mainly interested in knowing what we could and could not do.

Jay Gattuso | Digital Preservation Analyst | Preservation, Research and Consultancy
National Library of New Zealand | Te Puna Mātauranga o Aotearoa
PO Box 1467 Wellington 6140 New Zealand | +64 (0)4 474 3064
jay.gatt...@dia.govt.nz<mailto:jay.gatt...@natlib.govt.nz>

_______________________________________________
clamav-users mailing list
clamav-users@lists.clamav.net
http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-users

Help us build a comprehensive ClamAV guide: https://github.com/vrtadmin/clamav-faq
http://www.clamav.net/contact.html#ml