Oh, absolutely, Micah. My scan times were negligible because I was scanning a
single PHP file of about 150 bytes (opening PHP tag, two lines of
comments, and a call to phpinfo), so the times I gave were entirely load
time.

I'm glad that you found the information helpful.

--Maarten

On Tue, Apr 9, 2019 at 5:29 PM Micah Snyder (micasnyd) <micas...@cisco.com>
wrote:

> Maarten,
>
>
>
> Your test results are pretty great.  I really like your breakdown of the
> signatures by category.  I will caution that scan times will vary quite
> heavily depending on what you’re scanning, based on Target type (
> https://www.clamav.net/documents/clamav-file-types).
>
>
>
> In addition, it’s important to distinguish between load and scan times.
> The time reported by clamscan is both load + scan.  If you just want scan
> time, you will want to load the database with clamd and then test the
> scantime with clamdscan.
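
As a rough illustration of the load-versus-scan distinction, the sketch below pulls the elapsed time out of a summary line and subtracts a load-only baseline (for example, the time reported when scanning a trivially small file) to approximate pure scan time. The function names are hypothetical; only the `Time: N sec` format is taken from the listings in this thread, and timing clamdscan against a running clamd remains the more direct measurement.

```python
import re

# Matches the elapsed-time field as it appears in this thread's listings,
# e.g. "Time: 11.384 sec (0 m 11 s)".
TIME_RE = re.compile(r"Time:\s*([0-9]+(?:\.[0-9]+)?)\s*sec")


def parse_time(summary: str) -> float:
    """Return the elapsed seconds found in a summary line."""
    match = TIME_RE.search(summary)
    if match is None:
        raise ValueError(f"no 'Time:' field in {summary!r}")
    return float(match.group(1))


def estimate_scan_time(total_summary: str, baseline_summary: str) -> float:
    """Approximate scan-only time as total time minus a load-only baseline.

    Assumption: the baseline run loaded the same databases but scanned a
    file small enough that its own scan time is negligible.
    """
    return max(0.0, parse_time(total_summary) - parse_time(baseline_summary))
```
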
>
>
>
> Regarding load time vs scantime, all of the signatures must be loaded, but
> depending on the target type of the file being scanned, not all of the
> signatures will be matched against the file.  That is, daily_Win.ldb might
> take the longest to load due to the number of signatures or complexity of
> the signatures but when scanning a PDF, they probably won’t impact scan
> time, as Win signatures are probably mostly target type 1 (PE file).
>
>
>
> I’ve spent a bit of time today investigating what I believe is responsible for
> slow load and scan times for the Phishtank sigs.  I had a hunch, based on a
> conversation we saw a while back on the mailing list, that the identical
> beginning of the URL-based signatures results in an unbalanced and inefficient
> tree for matching. That is, some 3000 signatures each began with one of:
>
>
>    1. href="http:// (687265663d22687474703a2f2f)
>    2. HYPERLINK "http:// (48595045524c494e4b2022687474703a2f2f)
>    3. S/URI/URI(http:// (532f5552492f55524928687474703a2f2f)
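
The hex strings in the list above can be checked by decoding them back to ASCII. This is a minimal sketch using only the three prefixes quoted in the message; the helper name `decode_hex` is made up for illustration.

```python
# The three shared prefixes, copied from the message above.
prefixes = [
    "687265663d22687474703a2f2f",
    "48595045524c494e4b2022687474703a2f2f",
    "532f5552492f55524928687474703a2f2f",
]


def decode_hex(sig_hex: str) -> str:
    """Decode a plain hex string (no wildcards) to ASCII text."""
    return bytes.fromhex(sig_hex).decode("ascii")


decoded = [decode_hex(p) for p in prefixes]
for text in decoded:
    print(text)
```

Each decoded prefix ends in the same literal `http://`, which is the shared run of bytes the message is concerned with.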
>
>
>
> Looking at a few of the Phish.Phishing signatures, these appear to have
> the same issue (href="http:// prefix).  In testing with a scan of a PDF
> document, I was able to reduce the scan time from 31.987 sec to 2.632
> sec simply by changing the start of the Phishtank signatures as
> follows:
>
>
>    1. href="http://
>       1. from: 687265663d22687474703a2f2f
>       2. to: 687265663d2268747470{3-4}
>    2. HYPERLINK "http://
>       1. from: 48595045524c494e4b2022687474703a2f2f
>       2. to: 48595045524c494e4b202268747470{3-4}
>    3. S/URI/URI(http://
>       1. from: 532f5552492f55524928687474703a2f2f
>       2. to: 532f5552492f5552492868747470{3-4}
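
The from/to rewrites listed above amount to replacing the literal bytes for `://` with a `{3-4}` wildcard, which in ClamAV's body signature syntax matches any 3 to 4 bytes. A minimal sketch of that transformation follows; the function name is hypothetical, and it assumes the input is plain hex with no wildcards ahead of the match, so that byte boundaries fall on even offsets.

```python
HTTP_SCHEME_HEX = "687474703a2f2f"  # "http://"
HTTP_WILDCARD = "68747470{3-4}"     # "http" followed by 3 to 4 arbitrary bytes


def relax_scheme(subsig_hex: str) -> str:
    """Replace the first byte-aligned 'http://' with 'http{3-4}'.

    Only even offsets are checked so that a match cannot straddle two
    bytes. Assumes no wildcard appears before the match.
    """
    last_start = len(subsig_hex) - len(HTTP_SCHEME_HEX)
    for i in range(0, last_start + 1, 2):
        if subsig_hex.startswith(HTTP_SCHEME_HEX, i):
            tail = subsig_hex[i + len(HTTP_SCHEME_HEX):]
            return subsig_hex[:i] + HTTP_WILDCARD + tail
    return subsig_hex  # nothing to rewrite
```

Note that `{3-4}` is looser than the original literal: it accepts `://` and `s://`, but also any other 3 or 4 bytes in that position.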
>
>
>
> This should give the same detection with faster load and scan times, and
> will also match https for better coverage.  To turn lemonade into
> really good lemonade, we may be able to take the above optimization and
> apply it to the Phish.Phishing signatures identified by Maarten to reduce
> scan times further to levels below those before the addition of the
> Phishtank signatures.
>
>
>
> As noted by Maarten as well, the Phish.Phishing sigs are Target type 0,
> whereas we’d split the Phishtank.Phishing signatures up by target type to
> reduce scan times of files where the signatures won’t apply.  It should
> also speed things up quite a bit for other file types to split those up by
> Target types.
>
>
>
> Further research into scan time optimization is definitely welcome and
> appreciated.
>
>
>
> Regards,
>
> Micah
>
>
>
>
>
> *From: *clamav-users <clamav-users-boun...@lists.clamav.net> on behalf of
> Maarten Broekman via clamav-users <clamav-users@lists.clamav.net>
> *Reply-To: *ClamAV users ML <clamav-users@lists.clamav.net>
> *Date: *Tuesday, April 9, 2019 at 12:00 PM
> *To: *ClamAV users ML <clamav-users@lists.clamav.net>
> *Cc: *Maarten Broekman <maarten.broek...@gmail.com>
> *Subject: *Re: [clamav-users] [External] Re: Scan very slow
>
>
>
> Clearly the latest daily.cvd is performing better, but the remaining
> "Phishtank" sigs are *not* a majority of the slowness.
>
>
>
> I unpacked the current (?) cvd (ClamAV-VDB:09 Apr 2019 03-53
> -0400:25414:1548262:63:X:X:raynman:1554796413) and then ran a test scan
> with each part to see what the load times looked like:
>
> daily.cdb ==== Time: 0.007 sec (0 m 0 s)
>
> daily.cfg ==== Time: 0.004 sec (0 m 0 s)
>
> daily.crb ==== Time: 0.006 sec (0 m 0 s)
>
> *daily.cvd ==== Time: 11.384 sec (0 m 11 s)*
>
> daily.fp ==== Time: 0.009 sec (0 m 0 s)
>
> daily.ftm ==== Time: 0.005 sec (0 m 0 s)
>
> daily.hdb ==== Time: 0.303 sec (0 m 0 s)
>
> daily.hdu ==== Time: 0.006 sec (0 m 0 s)
>
> daily.hsb ==== Time: 1.093 sec (0 m 1 s)
>
> daily.hsu ==== Time: 0.005 sec (0 m 0 s)
>
> daily.idb ==== Time: 0.006 sec (0 m 0 s)
>
> *daily.ldb ==== Time: 5.563 sec (0 m 5 s)*
>
> daily.ldu ==== Time: 0.005 sec (0 m 0 s)
>
> daily.mdb ==== Time: 0.061 sec (0 m 0 s)
>
> daily.mdu ==== Time: 0.007 sec (0 m 0 s)
>
> daily.msb ==== Time: 0.005 sec (0 m 0 s)
>
> daily.msu ==== Time: 0.005 sec (0 m 0 s)
>
> daily.ndb ==== Time: 0.017 sec (0 m 0 s)
>
> daily.ndu ==== Time: 0.005 sec (0 m 0 s)
>
> daily.pdb ==== Time: 0.010 sec (0 m 0 s)
>
> daily.sfp ==== Time: 0.006 sec (0 m 0 s)
>
> daily.wdb ==== Time: 0.014 sec (0 m 0 s)
>
>
>
> So, half the run time of a clamscan is from the daily.ldb. To break it
> down further, I split the daily.ldb into "daily_<virus>.ldb" where <virus>
> is the first part of the dot-separated signature name.
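
The split described above could be done along these lines. This is a sketch, not the script actually used: it relies on the `.ldb` layout in which the signature name is the first semicolon-separated field, and the function name and any sample lines fed to it are invented for illustration.

```python
from collections import defaultdict


def split_by_family(ldb_lines):
    """Group .ldb lines by the first dot-separated part of the signature name.

    Returns a dict mapping an output file name such as "daily_Win.ldb"
    to the list of signature lines that belong in it.
    """
    groups = defaultdict(list)
    for line in ldb_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name = line.split(";", 1)[0]      # signature name is the first field
        family = name.split(".", 1)[0]    # e.g. "Win" from "Win.Trojan.X"
        groups[f"daily_{family}.ldb"].append(line)
    return dict(groups)
```

Writing each group to its own file and timing a clamscan run against it with `-d` then gives a per-family listing like the one that follows.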
>
> daily_Andr.ldb ==== Time: 0.008 sec (0 m 0 s)
>
> daily_Archive.ldb ==== Time: 0.009 sec (0 m 0 s)
>
> daily_Asp.ldb ==== Time: 0.004 sec (0 m 0 s)
>
> daily_Doc.ldb ==== Time: 0.116 sec (0 m 0 s)
>
> daily_Eicar-Test-Signature.ldb ==== Time: 0.009 sec (0 m 0 s)
>
> daily_Email.ldb ==== Time: 0.014 sec (0 m 0 s)
>
> daily_Emf.ldb ==== Time: 0.007 sec (0 m 0 s)
>
> daily_Heuristics.ldb ==== Time: 0.006 sec (0 m 0 s)
>
> daily_Html.ldb ==== Time: 0.010 sec (0 m 0 s)
>
> daily_Hwp.ldb ==== Time: 0.005 sec (0 m 0 s)
>
> daily_Img.ldb ==== Time: 0.006 sec (0 m 0 s)
>
> daily_Ios.ldb ==== Time: 0.006 sec (0 m 0 s)
>
> daily_Java.ldb ==== Time: 0.005 sec (0 m 0 s)
>
> daily_Js.ldb ==== Time: 0.007 sec (0 m 0 s)
>
> daily_Legacy.ldb ==== Time: 0.006 sec (0 m 0 s)
>
> daily_Lnk.ldb ==== Time: 0.005 sec (0 m 0 s)
>
> daily_Mp4.ldb ==== Time: 0.005 sec (0 m 0 s)
>
> daily_Multios.ldb ==== Time: 0.005 sec (0 m 0 s)
>
> daily_Osx.ldb ==== Time: 0.008 sec (0 m 0 s)
>
> daily_Pdf.ldb ==== Time: 0.007 sec (0 m 0 s)
>
> *daily_Phish.ldb ==== Time: 1.612 sec (0 m 1 s)*
>
> daily_Phishtank.ldb ==== Time: 0.146 sec (0 m 0 s)
>
> daily_Php.ldb ==== Time: 0.006 sec (0 m 0 s)
>
> daily_Ppt.ldb ==== Time: 0.007 sec (0 m 0 s)
>
> daily_Py.ldb ==== Time: 0.006 sec (0 m 0 s)
>
> daily_Rtf.ldb ==== Time: 0.006 sec (0 m 0 s)
>
> daily_Svg.ldb ==== Time: 0.005 sec (0 m 0 s)
>
> daily_Swf.ldb ==== Time: 0.007 sec (0 m 0 s)
>
> daily_Ttf.ldb ==== Time: 0.005 sec (0 m 0 s)
>
> daily_Txt.ldb ==== Time: 0.009 sec (0 m 0 s)
>
> daily_Unix.ldb ==== Time: 0.008 sec (0 m 0 s)
>
> daily_Vbs.ldb ==== Time: 0.009 sec (0 m 0 s)
>
> *daily_Win.ldb ==== Time: 3.391 sec (0 m 3 s)*
>
> daily_Xls.ldb ==== Time: 0.009 sec (0 m 0 s)
>
> daily_Xml.ldb ==== Time: 0.007 sec (0 m 0 s)
>
>
>
> "Phish." (not "Phishtank.") and "Win." have the longest run times. Looking
> at the *number* of signatures in each, the 'Phish.' signatures are taking
> a disproportionate amount of time to load compared to the other signatures:
>
>      216 daily_Andr.ldb
>
>        3 daily_Archive.ldb
>
>        1 daily_Asp.ldb
>
>     2096 daily_Doc.ldb
>
>        1 daily_Eicar-Test-Signature.ldb
>
>     1017 daily_Email.ldb
>
>        2 daily_Emf.ldb
>
>        5 daily_Heuristics.ldb
>
>      250 daily_Html.ldb
>
>        1 daily_Hwp.ldb
>
>       15 daily_Img.ldb
>
>        6 daily_Ios.ldb
>
>       16 daily_Java.ldb
>
>       69 daily_Js.ldb
>
>       27 daily_Legacy.ldb
>
>        9 daily_Lnk.ldb
>
>        1 daily_Mp4.ldb
>
>        9 daily_Multios.ldb
>
>      175 daily_Osx.ldb
>
>      132 daily_Pdf.ldb
>
>     2515 daily_Phish.ldb
>
>     3516 daily_Phishtank.ldb
>
>       18 daily_Php.ldb
>
>        5 daily_Ppt.ldb
>
>        3 daily_Py.ldb
>
>       28 daily_Rtf.ldb
>
>        1 daily_Svg.ldb
>
>      103 daily_Swf.ldb
>
>        2 daily_Ttf.ldb
>
>      140 daily_Txt.ldb
>
>      222 daily_Unix.ldb
>
>       21 daily_Vbs.ldb
>
>    43928 daily_Win.ldb
>
>      165 daily_Xls.ldb
>
>        8 daily_Xml.ldb
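
To put a number on "disproportionate", the load times and signature counts reported above can be combined into a per-signature cost. The sketch below uses only figures from the two listings; the function name is made up, and each timing also includes a small fixed startup cost (the smallest files above take about 0.005 sec), so the results are approximate.

```python
# Load times (seconds) and signature counts taken from the listings above.
load_seconds = {"Phish": 1.612, "Phishtank": 0.146, "Win": 3.391}
sig_counts = {"Phish": 2515, "Phishtank": 3516, "Win": 43928}


def ms_per_signature(seconds: float, count: int) -> float:
    """Average load cost per signature, in milliseconds."""
    return seconds / count * 1000.0


per_sig = {
    name: ms_per_signature(load_seconds[name], sig_counts[name])
    for name in load_seconds
}

# Print the most expensive family first.
for name, cost in sorted(per_sig.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10} {cost:.3f} ms/signature")
```

On these figures "Phish." works out to roughly 0.64 ms per signature, against under 0.1 ms for both "Phishtank." and "Win.".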
>
>
>
> From the look of it, "Phish." contains the REPHISH signatures. Those
> signatures seem to apply to any file (Target 0) and combine subsignatures
> to match depending on which file type they are 'looking'
> for (so, href for HTML files; %PDF, Subtype, and URI objects for PDFs; etc.),
> as opposed to the remaining Phishtank sigs, which seem to have a separate
> signature for each target type.
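
One way to check how a set of logical signatures is distributed across target types is to tally the `Target:N` value in each line's second field (the target description block). This is a sketch under that assumption; the function name is hypothetical and any sample lines passed to it are invented.

```python
import re
from collections import Counter

# The target description block may carry other keys too,
# e.g. "Engine:51-255,Target:0".
TARGET_RE = re.compile(r"Target:(\d+)")


def count_by_target(ldb_lines):
    """Count .ldb signatures per Target value (None if no Target is given)."""
    counts = Counter()
    for line in ldb_lines:
        fields = line.strip().split(";")
        if len(fields) < 2:
            continue  # not a logical signature line
        match = TARGET_RE.search(fields[1])
        target = int(match.group(1)) if match else None
        counts[target] += 1
    return counts
```

A file where nearly every line lands under target 0 is matched against every scanned file, which is the pattern described above for the "Phish." signatures.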
>
>
>
> Breaking up daily_Win into its constituent sub-parts doesn't reveal any
> particular culprit from just a simple scan timing though...
>
> daily_Win.Adware.ldb ==== Time: 0.013 sec (0 m 0 s)
>
> daily_Win.Coinminer.ldb ==== Time: 0.009 sec (0 m 0 s)
>
> daily_Win.Downloader.ldb ==== Time: 0.035 sec (0 m 0 s)
>
> daily_Win.Dropper.ldb ==== Time: 0.240 sec (0 m 0 s)
>
> daily_Win.Exploit.ldb ==== Time: 0.016 sec (0 m 0 s)
>
> daily_Win.Ircbot.ldb ==== Time: 0.006 sec (0 m 0 s)
>
> daily_Win.Keylogger.ldb ==== Time: 0.010 sec (0 m 0 s)
>
> daily_Win.ldb ==== Time: 3.418 sec (0 m 3 s)
>
> daily_Win.Macro.ldb ==== Time: 0.009 sec (0 m 0 s)
>
> *daily_Win.Malware.ldb ==== Time: 0.731 sec (0 m 0 s)*
>
> daily_Win.Packed.ldb ==== Time: 0.131 sec (0 m 0 s)
>
> daily_Win.Packer.ldb ==== Time: 0.008 sec (0 m 0 s)
>
> daily_Win.Phishing.ldb ==== Time: 0.005 sec (0 m 0 s)
>
> daily_Win.Proxy.ldb ==== Time: 0.005 sec (0 m 0 s)
>
> daily_Win.Ransomware.ldb ==== Time: 0.019 sec (0 m 0 s)
>
> daily_Win.Spyware.ldb ==== Time: 0.006 sec (0 m 0 s)
>
> daily_Win.Tool.ldb ==== Time: 0.008 sec (0 m 0 s)
>
> *daily_Win.Trojan.ldb ==== Time: 0.582 sec (0 m 0 s)*
>
> daily_Win.Virus.ldb ==== Time: 0.059 sec (0 m 0 s)
>
> daily_Win.Worm.ldb ==== Time: 0.030 sec (0 m 0 s)
>
>
>
>      158 daily_Win.Adware.ldb
>
>       14 daily_Win.Coinminer.ldb
>
>      561 daily_Win.Downloader.ldb
>
>     8084 daily_Win.Dropper.ldb
>
>      216 daily_Win.Exploit.ldb
>
>        9 daily_Win.Ircbot.ldb
>
>      193 daily_Win.Keylogger.ldb
>
>    43928 daily_Win.ldb
>
>        1 daily_Win.Macro.ldb
>
> *   14820 daily_Win.Malware.ldb*
>
>     4209 daily_Win.Packed.ldb
>
>       20 daily_Win.Packer.ldb
>
>        2 daily_Win.Phishing.ldb
>
>        4 daily_Win.Proxy.ldb
>
>      500 daily_Win.Ransomware.ldb
>
>       32 daily_Win.Spyware.ldb
>
>      121 daily_Win.Tool.ldb
>
> *   12051 daily_Win.Trojan.ldb*
>
>     1967 daily_Win.Virus.ldb
>
>      966 daily_Win.Worm.ldb
>
>
>
> Malware and Trojan take the longest, but they also have a majority of the
> signatures.
>
>
>
> On Tue, Apr 9, 2019 at 11:19 AM Steve Basford <
> steveb_cla...@sanesecurity.com> wrote:
>
> On 2019-04-09 12:02, Brent Clark via clamav-users wrote:
> > Can't those be adopted / managed by Sanesecurity?
> >
> > For all you know, those are already in Sanesecurity.
>
> They are... and have been for quite some time:
>
>
> "The following databases are distributed by Sanesecurity, but produced
> by Porcupine Signatures"
>
> phishtank.ndb.
>
> Briefly...
>
> Number of sigs in phishtank.ndb: 9,309
>
> eg:
>
> PhishTank.Phishing.6002281, matches:
>
> https://www.phishtank.com/phish_detail.php?phish_id=6002281
>
> So, there may be some crossover now that
> Phish.Phishing.REPHISH_ID_20190404_67-6931549-0
> type signature names from the PhishTank feed are in daily.ldb and
> daily.ndb.
>
> I'll check back on the thread later.
>
> --
> Cheers,
>
> Steve
> Twitter: @sanesecurity
>
> _______________________________________________
>
> clamav-users mailing list
> clamav-users@lists.clamav.net
> https://lists.clamav.net/mailman/listinfo/clamav-users
>
>
> Help us build a comprehensive ClamAV guide:
> https://github.com/vrtadmin/clamav-faq
>
> http://www.clamav.net/contact.html#ml
>
>