Re: [clamav-users] [External] Re: Scan very slow

Micah Snyder (micasnyd) via clamav-users Fri, 12 Apr 2019 08:26:19 -0700

We don't use the word engine in quite that way with ClamAV, but I think I 
understand your question.


With regard to the word "engine":
        Clamd builds a single scanning engine from the loaded databases and
configuration options.  That engine is shared by all scanning threads.

With regard to clamd's use of multithreading:
        Clamd uses multithreading to handle scan requests.  That is to say,
each scan target gets its own thread.  However, files contained within the
scan target are scanned in the same thread as the scan target.  Scans of
embedded content are invoked as that content is identified by the parser for
each file type.  None of these nested scans makes use of multithreading at
this time.
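
As a rough illustration of that threading model (a toy sketch in Python, not
clamd's actual C implementation; the function names and the dict layout are
made up for the example), each top-level scan target is handed to its own
worker thread, and anything embedded in it is scanned recursively on that same
thread:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def scan(target, trace):
    """Scan one target, then everything embedded in it, on the calling thread."""
    # Record which thread handled this item (stand-in for real scanning work).
    trace.append((target["name"], threading.get_ident()))
    for child in target.get("embedded", []):
        # Embedded content is scanned in the same thread; no new thread is spawned.
        scan(child, trace)
    return trace


def scan_all(targets, max_threads=4):
    """Give each top-level scan target its own worker thread."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(lambda t: scan(t, []), targets))


# Hypothetical scan targets: an archive with nested files, and a plain PDF.
targets = [
    {"name": "mail.zip", "embedded": [{"name": "a.doc"}, {"name": "b.exe"}]},
    {"name": "report.pdf"},
]
results = scan_all(targets)
```

In this model a single large archive never gets faster by adding threads; only
concurrent, separate scan requests benefit.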

Regards,
Micah

On 4/11/19, 4:09 PM, "clamav-users on behalf of Paul Kosinski via 
clamav-users" <clamav-users-boun...@lists.clamav.net on behalf of 
clamav-users@lists.clamav.net> wrote:

    Does clamd use multi-threading for the various "engines" within a
    single scan, or only to handle multiple requests from different sources?
    
    
    On Tue, 9 Apr 2019 21:29:43 +0000
    "Micah Snyder \(micasnyd\) via clamav-users"
    <clamav-users@lists.clamav.net> wrote:
    
    > Maarten,
    > 
    > Your test results are pretty great.  I really like your breakdown of
    > the signatures by category.  I will caution that scan times will vary
    > quite heavily depending on what you’re scanning, based on Target type
    > (https://www.clamav.net/documents/clamav-file-types).
    > 
    > In addition, it’s important to distinguish between load and scan
    > times.  The time reported by clamscan includes both load and scan
    > time.  If you just want scan time, you will want to load the
    > database with clamd and then test the scan time with clamdscan.
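
One way to compare the two measurements is a small timing wrapper. This is
only a sketch: it assumes clamscan and clamdscan are on the PATH, that clamd is
already running with the databases loaded, and that clamd may read the file.
The helper names are made up for the example.

```python
import subprocess
import time


def time_command(argv):
    """Run a command and return (elapsed wall-clock seconds, exit code)."""
    start = time.perf_counter()
    proc = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, proc.returncode


def compare(path):
    """Time clamscan (load + scan) against clamdscan (scan only, plus client overhead)."""
    load_and_scan, _ = time_command(["clamscan", path])
    scan_only, _ = time_command(["clamdscan", path])
    # The difference is a rough estimate of the database load time.
    return {
        "clamscan": load_and_scan,
        "clamdscan": scan_only,
        "approx_load": load_and_scan - scan_only,
    }

# Example (requires a ClamAV install): compare("/path/to/sample.pdf")
```

The difference is only approximate, since clamdscan adds a little
client/socket overhead of its own.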
    > 
    > Regarding load time vs scan time, all of the signatures must be
    > loaded, but depending on the target type of the file being scanned,
    > not all of the signatures will be matched against the file.  That is,
    > daily_Win.ldb might take the longest to load due to the number or
    > complexity of its signatures, but when scanning a PDF those
    > signatures probably won’t impact scan time, as Win signatures are
    > probably mostly target type 1 (PE file).
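
A simplified model of that filtering (not how libclamav is implemented; the
signature names below are hypothetical) is that every signature is loaded, but
only those whose target type is 0 (any file) or equal to the type of the file
being scanned are candidates for matching:

```python
# Target type numbers as listed in the ClamAV signature documentation:
# 0 = any file, 1 = PE, 3 = HTML, 10 = PDF.
ANY, PE, HTML, PDF = 0, 1, 3, 10

# Hypothetical signature names paired with a target type, for illustration only.
SIGNATURES = [
    ("Example.Win.Trojan", PE),
    ("Example.Phishtank.Html", HTML),
    ("Example.Phishtank.Pdf", PDF),
    ("Example.Phish.Generic", ANY),
]


def candidates(file_target_type, signatures=SIGNATURES):
    """Return names of signatures that would be matched against a file of this type."""
    return [name for name, target in signatures if target in (ANY, file_target_type)]


# All four signatures cost load time, but a PDF is only matched against two.
pdf_candidates = candidates(PDF)
```

Here the PE signature still costs load time but is skipped when a PDF is
scanned, while the target type 0 signature is a candidate for every file.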
    > 
    > I’ve spent a bit of time today investigating what I believe is
    > responsible for slow load and scan times for the Phishtank sigs.  I
    > had a hunch, based on a conversation we saw a while back on the
    > mailing list, that the identical beginning of the URL-based
    > signatures results in an unbalanced and inefficient tree for
    > matching. That is, some 3000 signatures each began with one of:
    > 
    > 
    >   1.  href="http:// (687265663d22687474703a2f2f)
    >   2.  HYPERLINK "http:// (48595045524c494e4b2022687474703a2f2f)
    >   3.  S/URI/URI(http:// (532f5552492f55524928687474703a2f2f)
    > 
    > Looking at a few of the Phish.Phishing signatures, these appear to
    > have the same issue (href="http:// prefix).  In testing with a scan
    > of a PDF document, I was able to reduce the scan time from 31.987 sec
    > down to 2.632 sec simply by changing the start of the Phishtank
    > signatures as follows:
    > 
    > 
    >   1.  href="http://
    >      *   from: 687265663d22687474703a2f2f
    >      *   to: 687265663d2268747470{3-4}
    >   2.  HYPERLINK "http://
    >      *   from: 48595045524c494e4b2022687474703a2f2f
    >      *   to: 48595045524c494e4b202268747470{3-4}
    >   3.  S/URI/URI(http://
    >      *   from: 532f5552492f55524928687474703a2f2f
    >      *   to: 532f5552492f5552492868747470{3-4}
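
To check what these prefixes mean, the hex can be decoded, and the {3-4}
wildcard (which in ClamAV body signatures matches between 3 and 4 arbitrary
bytes) can be approximated with a regular expression. This is a sketch for
inspection only: sig_to_regex is a made-up helper, it handles just plain hex
and {n-m} gaps, and it is not ClamAV's matcher. Real signatures continue with
the rest of the URL after the prefix.

```python
import re


def sig_to_regex(sig):
    """Translate a hex signature with optional {n-m} gaps into a bytes regex."""
    pattern = b""
    for part in re.split(r"(\{\d+-\d+\})", sig):
        if part.startswith("{"):
            low, high = part[1:-1].split("-")
            # {n-m}: between n and m arbitrary bytes.
            pattern += b".{%d,%d}" % (int(low), int(high))
        elif part:
            pattern += re.escape(bytes.fromhex(part))
    return re.compile(pattern, re.DOTALL)


# Original prefixes decode to the literal strings listed above.
old_href = bytes.fromhex("687265663d22687474703a2f2f")
old_hyperlink = bytes.fromhex("48595045524c494e4b2022687474703a2f2f")
old_uri = bytes.fromhex("532f5552492f55524928687474703a2f2f")

# The rewritten href prefix: 'href="http' followed by 3 or 4 bytes.
new_href = sig_to_regex("687265663d2268747470{3-4}")
matches_http = bool(new_href.search(b'<a href="http://example.test/">'))
matches_https = bool(new_href.search(b'<a href="https://example.test/">'))
```

The 3-byte case covers "://" and the 4-byte case covers "s://", which is why
the rewritten prefix also covers https URLs.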
    > 
    > This should get the same detection with a faster load and scan time,
    > and will also accommodate https for better coverage.  To turn lemonade
    > into really good lemonade, we may be able to take the above
    > optimization and apply it to the Phish.Phishing signatures identified
    > by Maarten to reduce scan times further to levels below those before
    > the addition of the Phishtank signatures.
    > 
    > As noted by Maarten as well, the Phish.Phishing sigs are Target type
    > 0, whereas we’d split the Phishtank.Phishing signatures up by target
    > type to reduce scan times of files where the signatures won’t apply.
    > Splitting the Phish.Phishing signatures up by Target type as well
    > should also speed things up quite a bit for other file types.
    > 
    > Further research into scan time optimization is definitely welcome
    > and appreciated.
    > 
    > Regards,
    > Micah
    
    _______________________________________________
    
    clamav-users mailing list
    clamav-users@lists.clamav.net
    https://lists.clamav.net/mailman/listinfo/clamav-users
    
    
    Help us build a comprehensive ClamAV guide:
    https://github.com/vrtadmin/clamav-faq
    
    http://www.clamav.net/contact.html#ml
    

