Oh, absolutely, Micah. My scan times were negligible because I was scanning a single PHP file of 150 bytes or so (opening PHP tag, two lines of comments, and a call to phpinfo), so the times I gave were entirely load time.
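For anyone who wants to reproduce the per-file numbers quoted below, here is a minimal sketch. It assumes clamscan is on the PATH and that the unpacked database parts sit together in one directory; the helper names and the paths in the usage comment are hypothetical. Since clamscan loads its database on every run, timing it against a tiny file measures essentially load time.

```python
import subprocess
import time
from pathlib import Path


def clamscan_cmd(db_file, target):
    """Build a clamscan command that loads only one database file."""
    return ["clamscan", f"--database={db_file}", "--no-summary", str(target)]


def wall_time(action):
    """Return the wall-clock seconds taken by calling action()."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def time_each_db(db_dir, target):
    """Run clamscan once per file in db_dir and record how long each run takes."""
    results = {}
    for db_file in sorted(Path(db_dir).iterdir()):
        cmd = clamscan_cmd(db_file, target)
        # check=False because clamscan exits non-zero when it detects something
        results[db_file.name] = wall_time(
            lambda: subprocess.run(cmd, capture_output=True, check=False)
        )
    return results


# Hypothetical usage, with "unpacked/" holding the extracted daily.* files:
#   for name, secs in time_each_db("unpacked", "tiny.php").items():
#       print(f"{name} ==== Time: {secs:.3f} sec")
```

These are wall-clock figures, so they will not match clamscan's own "Time:" summary line exactly.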
I'm glad that you found the information helpful.

--Maarten

On Tue, Apr 9, 2019 at 5:29 PM Micah Snyder (micasnyd) <micas...@cisco.com> wrote:

> Maarten,
>
> Your test results are pretty great. I really like your breakdown of the
> signatures by category. I will caution that scan times will vary quite
> heavily depending on what you’re scanning, based on Target type
> (https://www.clamav.net/documents/clamav-file-types).
>
> In addition, it’s important to distinguish between load and scan times.
> The time reported by clamscan is both load + scan. If you just want scan
> time, you will want to load the database with clamd and then test the
> scan time with clamdscan.
>
> Regarding load time vs scan time, all of the signatures must be loaded, but
> depending on the target type of the file being scanned, not all of the
> signatures will be matched against the file. That is, daily_Win.ldb might
> take the longest to load due to the number of signatures or complexity of
> the signatures, but when scanning a PDF they probably won’t impact scan
> time, as Win signatures are probably mostly target type 1 (PE file).
>
> I’ve spent a bit of time today investigating what I believe is responsible
> for slow load and scan times for the Phishtank sigs. I had a hunch, based on
> a conversation we saw a while back in the mailing list, that the identical
> beginning for URL-based signatures results in an unbalanced and inefficient
> tree for matching. That is, some 3000 signatures each began with one of:
>
>    1. href="http:// (687265663d22687474703a2f2f)
>    2. HYPERLINK "http (48595045524c494e4b2022687474703a2f2f)
>    3. S/URI/URI(http:// (532f5552492f55524928687474703a2f2f)
>
> Looking at a few of the Phish.Phishing signatures, these appear to have
> the same issue (href="http:// prefix).
> In testing with a scan of a PDF
> document, I was able to reduce the scan time from 31.987 sec down to 2.632
> sec simply by changing the start of the Phishtank signatures for the
> following:
>
>    1. href="http://
>       1. from: 687265663d22687474703a2f2f
>       2. to: 687265663d2268747470{3-4}
>    2. HYPERLINK "http
>       1. from: 48595045524c494e4b2022687474703a2f2f
>       2. to: 48595045524c494e4b202268747470{3-4}
>    3. S/URI/URI(http://
>       1. from: 532f5552492f55524928687474703a2f2f
>       2. to: 532f5552492f5552492868747470{3-4}
>
> This should get the same detection with a faster load and scan time, and
> will accommodate httpS for better coverage. To turn lemonade into
> really good lemonade, we may be able to take the above optimization and
> apply it to the Phish.Phishing signatures identified by Maarten to reduce
> scan times further, to levels below those before the addition of the
> Phishtank signatures.
>
> As noted by Maarten as well, the Phish.Phishing sigs are Target type 0,
> whereas we’d split the Phishtank.Phishing signatures up by target type to
> reduce scan times of files where the signatures won’t apply. It should
> also speed things up quite a bit for other file types to split those up by
> Target types.
>
> Further research into scan time optimization is definitely welcome and
> appreciated.
>
> Regards,
> Micah
>
> *From: *clamav-users <clamav-users-boun...@lists.clamav.net> on behalf of
> Maarten Broekman via clamav-users <clamav-users@lists.clamav.net>
> *Reply-To: *ClamAV users ML <clamav-users@lists.clamav.net>
> *Date: *Tuesday, April 9, 2019 at 12:00 PM
> *To: *ClamAV users ML <clamav-users@lists.clamav.net>
> *Cc: *Maarten Broekman <maarten.broek...@gmail.com>
> *Subject: *Re: [clamav-users] [External] Re: Scan very slow
>
> Clearly the latest daily.cvd is performing better, but the remaining
> "Phishtank" sigs are *not* responsible for the majority of the slowness.
>
> I unpacked the current (?)
> cvd (ClamAV-VDB:09 Apr 2019 03-53 -0400:25414:1548262:63:X:X:raynman:1554796413)
> and then ran a test scan with each part to see what the load times looked
> like:
>
> daily.cdb ==== Time: 0.007 sec (0 m 0 s)
> daily.cfg ==== Time: 0.004 sec (0 m 0 s)
> daily.crb ==== Time: 0.006 sec (0 m 0 s)
> *daily.cvd ==== Time: 11.384 sec (0 m 11 s)*
> daily.fp ==== Time: 0.009 sec (0 m 0 s)
> daily.ftm ==== Time: 0.005 sec (0 m 0 s)
> daily.hdb ==== Time: 0.303 sec (0 m 0 s)
> daily.hdu ==== Time: 0.006 sec (0 m 0 s)
> daily.hsb ==== Time: 1.093 sec (0 m 1 s)
> daily.hsu ==== Time: 0.005 sec (0 m 0 s)
> daily.idb ==== Time: 0.006 sec (0 m 0 s)
> *daily.ldb ==== Time: 5.563 sec (0 m 5 s)*
> daily.ldu ==== Time: 0.005 sec (0 m 0 s)
> daily.mdb ==== Time: 0.061 sec (0 m 0 s)
> daily.mdu ==== Time: 0.007 sec (0 m 0 s)
> daily.msb ==== Time: 0.005 sec (0 m 0 s)
> daily.msu ==== Time: 0.005 sec (0 m 0 s)
> daily.ndb ==== Time: 0.017 sec (0 m 0 s)
> daily.ndu ==== Time: 0.005 sec (0 m 0 s)
> daily.pdb ==== Time: 0.010 sec (0 m 0 s)
> daily.sfp ==== Time: 0.006 sec (0 m 0 s)
> daily.wdb ==== Time: 0.014 sec (0 m 0 s)
>
> So, roughly half the run time of a clamscan comes from daily.ldb. To break
> it down further, I split daily.ldb into "daily_<virus>.ldb" files, where
> <virus> is the first part of the dot-separated signature name.
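A split like the one described here could be done along these lines. This is only a sketch: it assumes each .ldb line starts with the signature name followed by a semicolon, and the function names and sample signature lines are made up for illustration.

```python
from collections import defaultdict


def family_of(sig_line, depth=1):
    """Return the first `depth` dot-separated parts of the signature name."""
    name = sig_line.split(";", 1)[0]  # the name is the first ';'-separated field
    return ".".join(name.split(".")[:depth])


def split_by_family(ldb_lines, depth=1):
    """Group signature lines under output names such as 'daily_Win.ldb'."""
    groups = defaultdict(list)
    for line in ldb_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        groups[f"daily_{family_of(line, depth)}.ldb"].append(line)
    return dict(groups)


# Made-up signature lines, only to show the grouping:
sample = [
    "Win.Trojan.Example-1;Target:1;0;41414141",
    "Phish.Phishing.Example-2;Target:0;0;42424242",
    "Win.Malware.Example-3;Target:1;0;43434343",
]
for out_name, lines in sorted(split_by_family(sample).items()):
    print(len(lines), out_name)
```

Calling split_by_family(sample, depth=2) groups by the first two name parts instead, which gives file names of the daily_Win.Trojan.ldb kind.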
>
> daily_Andr.ldb ==== Time: 0.008 sec (0 m 0 s)
> daily_Archive.ldb ==== Time: 0.009 sec (0 m 0 s)
> daily_Asp.ldb ==== Time: 0.004 sec (0 m 0 s)
> daily_Doc.ldb ==== Time: 0.116 sec (0 m 0 s)
> daily_Eicar-Test-Signature.ldb ==== Time: 0.009 sec (0 m 0 s)
> daily_Email.ldb ==== Time: 0.014 sec (0 m 0 s)
> daily_Emf.ldb ==== Time: 0.007 sec (0 m 0 s)
> daily_Heuristics.ldb ==== Time: 0.006 sec (0 m 0 s)
> daily_Html.ldb ==== Time: 0.010 sec (0 m 0 s)
> daily_Hwp.ldb ==== Time: 0.005 sec (0 m 0 s)
> daily_Img.ldb ==== Time: 0.006 sec (0 m 0 s)
> daily_Ios.ldb ==== Time: 0.006 sec (0 m 0 s)
> daily_Java.ldb ==== Time: 0.005 sec (0 m 0 s)
> daily_Js.ldb ==== Time: 0.007 sec (0 m 0 s)
> daily_Legacy.ldb ==== Time: 0.006 sec (0 m 0 s)
> daily_Lnk.ldb ==== Time: 0.005 sec (0 m 0 s)
> daily_Mp4.ldb ==== Time: 0.005 sec (0 m 0 s)
> daily_Multios.ldb ==== Time: 0.005 sec (0 m 0 s)
> daily_Osx.ldb ==== Time: 0.008 sec (0 m 0 s)
> daily_Pdf.ldb ==== Time: 0.007 sec (0 m 0 s)
> *daily_Phish.ldb ==== Time: 1.612 sec (0 m 1 s)*
> daily_Phishtank.ldb ==== Time: 0.146 sec (0 m 0 s)
> daily_Php.ldb ==== Time: 0.006 sec (0 m 0 s)
> daily_Ppt.ldb ==== Time: 0.007 sec (0 m 0 s)
> daily_Py.ldb ==== Time: 0.006 sec (0 m 0 s)
> daily_Rtf.ldb ==== Time: 0.006 sec (0 m 0 s)
> daily_Svg.ldb ==== Time: 0.005 sec (0 m 0 s)
> daily_Swf.ldb ==== Time: 0.007 sec (0 m 0 s)
> daily_Ttf.ldb ==== Time: 0.005 sec (0 m 0 s)
> daily_Txt.ldb ==== Time: 0.009 sec (0 m 0 s)
> daily_Unix.ldb ==== Time: 0.008 sec (0 m 0 s)
> daily_Vbs.ldb ==== Time: 0.009 sec (0 m 0 s)
> *daily_Win.ldb ==== Time: 3.391 sec (0 m 3 s)*
> daily_Xls.ldb ==== Time: 0.009 sec (0 m 0 s)
> daily_Xml.ldb ==== Time: 0.007 sec (0 m 0 s)
>
> "Phish." (not "Phishtank.") and "Win." have the longest run times. Looking
> at the *number* of signatures in each, the 'Phish.'
> signatures are taking a disproportionate amount of time to load compared
> to the other signatures:
>
>   216 daily_Andr.ldb
>     3 daily_Archive.ldb
>     1 daily_Asp.ldb
>  2096 daily_Doc.ldb
>     1 daily_Eicar-Test-Signature.ldb
>  1017 daily_Email.ldb
>     2 daily_Emf.ldb
>     5 daily_Heuristics.ldb
>   250 daily_Html.ldb
>     1 daily_Hwp.ldb
>    15 daily_Img.ldb
>     6 daily_Ios.ldb
>    16 daily_Java.ldb
>    69 daily_Js.ldb
>    27 daily_Legacy.ldb
>     9 daily_Lnk.ldb
>     1 daily_Mp4.ldb
>     9 daily_Multios.ldb
>   175 daily_Osx.ldb
>   132 daily_Pdf.ldb
>  2515 daily_Phish.ldb
>  3516 daily_Phishtank.ldb
>    18 daily_Php.ldb
>     5 daily_Ppt.ldb
>     3 daily_Py.ldb
>    28 daily_Rtf.ldb
>     1 daily_Svg.ldb
>   103 daily_Swf.ldb
>     2 daily_Ttf.ldb
>   140 daily_Txt.ldb
>   222 daily_Unix.ldb
>    21 daily_Vbs.ldb
> 43928 daily_Win.ldb
>   165 daily_Xls.ldb
>     8 daily_Xml.ldb
>
> From the look of it, "Phish." has those REPHISH signatures. Those
> signatures seem to be looking at any file (Target 0) and have subsignatures
> that are combined to match depending on which filetype they are 'looking'
> for (so, href for HTML files; %PDF, Subtype, and URI objects for PDFs; etc.),
> as opposed to the remaining Phishtank sigs, which seem to have a separate
> signature depending on the target type.
>
> Breaking up daily_Win into its constituent sub-parts doesn't reveal any
> particular culprit from just a simple scan timing though...
>
> daily_Win.Adware.ldb ==== Time: 0.013 sec (0 m 0 s)
> daily_Win.Coinminer.ldb ==== Time: 0.009 sec (0 m 0 s)
> daily_Win.Downloader.ldb ==== Time: 0.035 sec (0 m 0 s)
> daily_Win.Dropper.ldb ==== Time: 0.240 sec (0 m 0 s)
> daily_Win.Exploit.ldb ==== Time: 0.016 sec (0 m 0 s)
> daily_Win.Ircbot.ldb ==== Time: 0.006 sec (0 m 0 s)
> daily_Win.Keylogger.ldb ==== Time: 0.010 sec (0 m 0 s)
> daily_Win.ldb ==== Time: 3.418 sec (0 m 3 s)
> daily_Win.Macro.ldb ==== Time: 0.009 sec (0 m 0 s)
> *daily_Win.Malware.ldb ==== Time: 0.731 sec (0 m 0 s)*
> daily_Win.Packed.ldb ==== Time: 0.131 sec (0 m 0 s)
> daily_Win.Packer.ldb ==== Time: 0.008 sec (0 m 0 s)
> daily_Win.Phishing.ldb ==== Time: 0.005 sec (0 m 0 s)
> daily_Win.Proxy.ldb ==== Time: 0.005 sec (0 m 0 s)
> daily_Win.Ransomware.ldb ==== Time: 0.019 sec (0 m 0 s)
> daily_Win.Spyware.ldb ==== Time: 0.006 sec (0 m 0 s)
> daily_Win.Tool.ldb ==== Time: 0.008 sec (0 m 0 s)
> *daily_Win.Trojan.ldb ==== Time: 0.582 sec (0 m 0 s)*
> daily_Win.Virus.ldb ==== Time: 0.059 sec (0 m 0 s)
> daily_Win.Worm.ldb ==== Time: 0.030 sec (0 m 0 s)
>
>   158 daily_Win.Adware.ldb
>    14 daily_Win.Coinminer.ldb
>   561 daily_Win.Downloader.ldb
>  8084 daily_Win.Dropper.ldb
>   216 daily_Win.Exploit.ldb
>     9 daily_Win.Ircbot.ldb
>   193 daily_Win.Keylogger.ldb
> 43928 daily_Win.ldb
>     1 daily_Win.Macro.ldb
> *14820 daily_Win.Malware.ldb*
>  4209 daily_Win.Packed.ldb
>    20 daily_Win.Packer.ldb
>     2 daily_Win.Phishing.ldb
>     4 daily_Win.Proxy.ldb
>   500 daily_Win.Ransomware.ldb
>    32 daily_Win.Spyware.ldb
>   121 daily_Win.Tool.ldb
> *12051 daily_Win.Trojan.ldb*
>  1967 daily_Win.Virus.ldb
>   966 daily_Win.Worm.ldb
>
> Malware and Trojan take the longest, but they also have a majority of the
> signatures.
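To put a number on "disproportionate", the load times above can be divided by the signature counts. The figures below are copied from the listings in this thread; the per-signature cost is only a rough average, since each file was timed once and a fixed startup cost is included in every run.

```python
# (load seconds, signature count), copied from the timing and count listings
measurements = {
    "Phish": (1.612, 2515),
    "Phishtank": (0.146, 3516),
    "Win": (3.391, 43928),
    "Win.Malware": (0.731, 14820),
    "Win.Trojan": (0.582, 12051),
}


def ms_per_signature(seconds, count):
    """Average load time per signature, in milliseconds."""
    return seconds * 1000.0 / count


for name, (secs, count) in measurements.items():
    print(f"{name:12s} {ms_per_signature(secs, count):.3f} ms/signature")
```

This works out to roughly 0.64 ms per "Phish." signature, against about 0.08 ms for "Win." overall and about 0.05 ms each for Win.Malware and Win.Trojan. So Malware and Trojan look slow mostly because they are large, while "Phish." costs around eight times more per signature than "Win.".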
>
> On Tue, Apr 9, 2019 at 11:19 AM Steve Basford
> <steveb_cla...@sanesecurity.com> wrote:
>
> On 2019-04-09 12:02, Brent Clark via clamav-users wrote:
>
> Can't those be adopted / managed by Sanesecurity?
>
> For all you know, those are already in Sanesecurity.
>
> They are... and have been for quite some time:
>
> "The following databases are distributed by Sanesecurity, but produced
> by Porcupine Signatures"
>
> phishtank.ndb.
>
> Briefly...
>
> Number of sigs in phishtank.ndb: 9,309
>
> eg:
>
> PhishTank.Phishing.6002281, matches:
>
> https://www.phishtank.com/phish_detail.php?phish_id=6002281
>
> So, there is going to be some possible crossover now that
> Phish.Phishing.REPHISH_ID_20190404_67-6931549-0
> type signature names from the PhishTank feed are in daily.ldb and
> daily.ndb.
>
> I'll check back on the thread later.
>
> --
> Cheers,
>
> Steve
> Twitter: @sanesecurity
>
> _______________________________________________
>
> clamav-users mailing list
> clamav-users@lists.clamav.net
> https://lists.clamav.net/mailman/listinfo/clamav-users
>
> Help us build a comprehensive ClamAV guide:
> https://github.com/vrtadmin/clamav-faq
>
> http://www.clamav.net/contact.html#ml
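For reference, the prefix change Micah describes in the quoted message (dropping the literal "://" and letting a {3-4} wildcard cover both "://" and "s://") can be sketched as below. The hex strings are the ones from his message; the function name is hypothetical, and the sketch works on the bare prefix only, not on complete signature lines.

```python
HTTP_SCHEME = "687474703a2f2f"  # hex for "http://"
HTTP_ONLY = "68747470"          # hex for "http"


def relax_prefix(hex_prefix):
    """Swap a trailing literal "http://" for "http" plus a {3-4} byte wildcard."""
    if not hex_prefix.endswith(HTTP_SCHEME):
        return hex_prefix  # nothing to rewrite
    return hex_prefix[: -len(HTTP_SCHEME)] + HTTP_ONLY + "{3-4}"


prefixes = [
    "687265663d22687474703a2f2f",
    "48595045524c494e4b2022687474703a2f2f",
    "532f5552492f55524928687474703a2f2f",
]
for prefix in prefixes:
    # Decode the hex so the shared "http://" tail is visible as text
    print(bytes.fromhex(prefix).decode("ascii"), "->", relax_prefix(prefix))
```

Note that {3-4} matches any three or four bytes, not only "://" and "s://", so the relaxed form is slightly broader than the original literal.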