As of daily-25458, we've updated the Email.Phishing.VOF2 signatures such
that they should have better performance when scanning larger email files.

Specifically, the signatures each had a PCRE component that began by
looking for the string 'filename', and as it turns out, the PCRE library
will begin evaluating the regex more thoroughly each time the first
character in the regex is encountered in a file being scanned.  It also
turns out that RTF files, which get embedded in emails as plain text, can
consist of a surprisingly large number of f's.  One email we were testing
with had an embedded RTF file: the email was ~13 million bytes in size, and
~10 million of those bytes were the letter f!  We modified the regex to
begin by looking for a semicolon, which is much less common in RTF files
and is not in the base64 character set.
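
To make the effect concrete, here is a minimal Python sketch. It does not call PCRE. It only counts how many offsets a matcher that keys on the regex's first literal byte would treat as starting points for a full match attempt, and then checks that both patterns still find the same attachment name. The two patterns and the synthetic email are hypothetical stand-ins: the actual Email.Phishing.VOF2 regexes are not shown in this thread, and Python's `re` module is used only to confirm that both patterns match.

```python
import re

# Hypothetical stand-ins for the old and new VOF2-style PCRE components;
# the real signature regexes are not reproduced here.
OLD_PATTERN = rb'filename="([^"]+\.pdf)"'
NEW_PATTERN = rb';\s*filename="([^"]+\.pdf)"'


def candidate_starts(data: bytes, pattern: bytes) -> int:
    """Count offsets where a first-byte prefilter would start a full match attempt.

    This models the behavior described above: every occurrence of the
    pattern's first literal byte triggers a more thorough evaluation.
    """
    return data.count(pattern[:1])


def build_sample(f_run: int) -> bytes:
    """Build a synthetic email: one MIME header plus an RTF-like body full of f's."""
    header = b'Content-Disposition: attachment;\r\n filename="invoice.pdf"\r\n\r\n'
    rtf_body = b"{\\rtf1 " + b"f" * f_run + b"}"
    return header + rtf_body


if __name__ == "__main__":
    sample = build_sample(1_000_000)
    for label, pat in (("old", OLD_PATTERN), ("new", NEW_PATTERN)):
        print(label, candidate_starts(sample, pat), re.findall(pat, sample))
```

With a long run of f's, the `filename`-first pattern gets roughly one candidate per f, while the semicolon-first pattern gets a single candidate, yet both extract the same name.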

Please let us know if you encounter any other cases of unreasonably slow
scan times, and we will do our best to investigate.  Thank you!

-Andrew

Andrew Williams
Malware Research Team
Cisco Talos

On Wed, Apr 10, 2019 at 8:57 PM Micah Snyder (micasnyd) via clamav-users <
clamav-users@lists.clamav.net> wrote:

> JME,
>
> As you've pointed out, it appears that some signatures containing PCRE
> regex components are responsible for slow scan times on larger email files.
>
> I did a bunch of profiling similar to what Maarten did earlier in order to
> narrow it down.  I found that Email.Phishing.VOF2 signatures are performing
> slower with the eml sample you sent me.  Email.Phishing.VOF2 signatures
> contain a PCRE regex component to alert on email attachments with specific
> names.  Now that we've determined which signatures are performing slowly in
> these cases, I am hopeful that we will be able to optimize the
> Email.Phishing.VOF2 signatures to improve performance.
>
> I will note that your idea to lower the PCRERecMatchLimit setting to 1
> will effectively neuter all signatures that rely on regexes and so I can't
> recommend this.
>
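To illustrate why a limit of 1 neuters regex-based signatures instead of only the slow ones, here is a toy backtracking matcher in Python. It supports only literals, `.` and `*` (in the style of Rob Pike's classic matcher), charges one unit of budget per recursive call, and gives up when the budget runs out. It is an analogy for PCRE's recursion limit, not PCRE itself, and it assumes, as the comment above implies, that a match aborted by the limit is treated as "no match".

```python
class BudgetExceeded(Exception):
    """Raised when the matcher makes more recursive calls than allowed."""


class Budget:
    def __init__(self, limit: int) -> None:
        self.remaining = limit

    def spend(self) -> None:
        if self.remaining <= 0:
            raise BudgetExceeded
        self.remaining -= 1


def match_here(regex: str, text: str, budget: Budget) -> bool:
    budget.spend()  # every recursive call costs one unit of budget
    if not regex:
        return True
    if len(regex) >= 2 and regex[1] == "*":
        return match_star(regex[0], regex[2:], text, budget)
    if text and regex[0] in (".", text[0]):
        return match_here(regex[1:], text[1:], budget)
    return False


def match_star(c: str, rest: str, text: str, budget: Budget) -> bool:
    # Try zero, one, two, ... repetitions of c, backtracking on failure.
    i = 0
    while True:
        if match_here(rest, text[i:], budget):
            return True
        if i < len(text) and c in (".", text[i]):
            i += 1
        else:
            return False


def search(regex: str, text: str, limit: int) -> bool:
    """Unanchored search; exhausting the budget is reported as 'no match'."""
    budget = Budget(limit)
    try:
        return any(match_here(regex, text[i:], budget) for i in range(len(text) + 1))
    except BudgetExceeded:
        return False
```

With a generous budget, `search("file.*pdf", "filename=invoice.pdf", 10_000)` succeeds. With `limit=1`, even `search("a", "a", 1)` fails, because matching a single character already takes two calls.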
> Regards,
> Micah
>
>
> On 4/10/19, 12:36 PM, "clamav-users on behalf of JME via clamav-users" <
> clamav-users-boun...@lists.clamav.net on behalf of
> clamav-users@lists.clamav.net> wrote:
>
>     Hello,
>
>     I managed to significantly reduce the problem of very long scans
> (more than 400 sec on some emails), not by disabling PhishingSignatures,
> which did not help, but by setting PCRERecMatchLimit to 1.
>     The PCRE checks are thus bypassed, but SafeBrowsing and the other
> scans continue to work. Is it a mistake to proceed this way?
>
>     Regards,
>     JME
>
>     -----Original Message-----
>     From: clamav-users <clamav-users-boun...@lists.clamav.net> On behalf
> of Brent Clark via clamav-users
>     Sent: Wednesday, April 10, 2019 12:33
>     To: ClamAV users ML <clamav-users@lists.clamav.net>
>     Cc: Brent Clark <brentgclarkl...@gmail.com>
>     Subject: Re: [clamav-users] [External] Re: Scan very slow
>
>     Thanks for doing this.
>
>     What I'm getting out of your feedback is that maybe you guys need to
> look at implementing or revisiting your CI process(es).
>
>     Before pushing a commit, your CI can run the same test(s) and alert on
> slow or long running scans.
>
>     All this can be automated and report on issues.
>
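One possible shape for the CI check suggested above, as a minimal Python sketch: given per-database scan times from a baseline run and from a candidate signature update, flag any database whose time grew beyond a tolerance. The function name, the 2x factor and the 0.5-second floor are hypothetical choices, not part of any existing ClamAV CI tooling.

```python
def slow_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    factor: float = 2.0,
    floor_seconds: float = 0.5,
) -> list[str]:
    """Return database names whose scan time regressed noticeably.

    A database is flagged when its candidate time exceeds both `factor`
    times its baseline time and an absolute floor, so that tiny, noisy
    timings (a few milliseconds) do not trip the gate. A database that is
    missing from the baseline is compared against 0.0 seconds.
    """
    flagged = []
    for name, seconds in sorted(candidate.items()):
        before = baseline.get(name, 0.0)
        if seconds > floor_seconds and seconds > factor * before:
            flagged.append(name)
    return flagged
```

A CI job could fail the build whenever the returned list is non-empty; the timings themselves would come from a scan harness run against a fixed set of sample files.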
>     I highly recommend doing this; I don't think you guys realise how
> many systems are running and dependent on ClamAV. It might be a good time to
> remind the community and ask them to support and donate to the project.
>
>     HTH
>
>     Regards
>     Brent
>
>     On 2019/04/09 17:58, Maarten Broekman via clamav-users wrote:
>     > Clearly the latest daily.cvd is performing better, but the remaining
>     > "Phishtank" sigs are _not_ responsible for the majority of the slowness.
>     >
>     > I unpacked the current (?) cvd (ClamAV-VDB:09 Apr 2019 03-53
>     > -0400:25414:1548262:63:X:X:raynman:1554796413) and then ran a test
>     > scan with each part to see what the load times looked like:
>     >
>     >     daily.cdb ==== Time: 0.007 sec (0 m 0 s)
>     >     daily.cfg ==== Time: 0.004 sec (0 m 0 s)
>     >     daily.crb ==== Time: 0.006 sec (0 m 0 s)
>     >     *daily.cvd ==== Time: 11.384 sec (0 m 11 s)*
>     >     daily.fp ==== Time: 0.009 sec (0 m 0 s)
>     >     daily.ftm ==== Time: 0.005 sec (0 m 0 s)
>     >     daily.hdb ==== Time: 0.303 sec (0 m 0 s)
>     >     daily.hdu ==== Time: 0.006 sec (0 m 0 s)
>     >     daily.hsb ==== Time: 1.093 sec (0 m 1 s)
>     >     daily.hsu ==== Time: 0.005 sec (0 m 0 s)
>     >     daily.idb ==== Time: 0.006 sec (0 m 0 s)
>     >     *daily.ldb ==== Time: 5.563 sec (0 m 5 s)*
>     >     daily.ldu ==== Time: 0.005 sec (0 m 0 s)
>     >     daily.mdb ==== Time: 0.061 sec (0 m 0 s)
>     >     daily.mdu ==== Time: 0.007 sec (0 m 0 s)
>     >     daily.msb ==== Time: 0.005 sec (0 m 0 s)
>     >     daily.msu ==== Time: 0.005 sec (0 m 0 s)
>     >     daily.ndb ==== Time: 0.017 sec (0 m 0 s)
>     >     daily.ndu ==== Time: 0.005 sec (0 m 0 s)
>     >     daily.pdb ==== Time: 0.010 sec (0 m 0 s)
>     >     daily.sfp ==== Time: 0.006 sec (0 m 0 s)
>     >     daily.wdb ==== Time: 0.014 sec (0 m 0 s)
>     >
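A per-database timing loop like the one above might be sketched in Python with `subprocess`. It times `clamscan --no-summary -d <database> <sample>` once per database file; `-d` and `--no-summary` are real clamscan options, while the directory of unpacked database files and the sample path are hypothetical. The measured wall-clock time includes both database loading and scanning.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable


def time_command(cmd: list[str]) -> float:
    """Run a command to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    # clamscan exits with status 1 when it finds a match, so a non-zero
    # status is not treated as an error here.
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def clamscan_cmd(database: Path, sample: Path) -> list[str]:
    """Build a clamscan invocation that loads only one database file."""
    return ["clamscan", "--no-summary", "-d", str(database), str(sample)]


def time_databases(
    db_dir: Path,
    sample: Path,
    build: Callable[[Path, Path], list[str]] = clamscan_cmd,
) -> dict[str, float]:
    """Time one scan per database file in db_dir, keyed by file name."""
    return {
        db.name: time_command(build(db, sample))
        for db in sorted(db_dir.iterdir())
        if db.is_file()
    }
```

Usage, with hypothetical paths: `time_databases(Path("daily_unpacked"), Path("sample.eml"))`. Running each file several times and keeping the minimum would reduce noise from a single run.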
>     > So, half the run time of a clamscan is from the daily.ldb. To break it
>     > down further, I split the daily.ldb into "daily_<virus>.ldb" where
>     > <virus> is the first part of the dot-separated signature name.
>     >
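The split-and-count step described above can be sketched in Python. Each logical signature line in a `.ldb` file begins with the signature name followed by `;`, so grouping on the text before the first `.` of that name reproduces the `daily_<virus>.ldb` buckets. The function names are illustrative, and files are read and written as Latin-1 only so that arbitrary bytes round-trip unchanged.

```python
from collections import defaultdict
from pathlib import Path


def name_prefix(ldb_line: str) -> str:
    """Return the first dot-separated component of a logical signature's name."""
    signature_name = ldb_line.split(";", 1)[0]
    return signature_name.split(".", 1)[0]


def split_ldb(ldb_path: Path, out_dir: Path) -> dict[str, int]:
    """Write daily_<prefix>.ldb files and return a signature count per output file."""
    buckets: dict[str, list[str]] = defaultdict(list)
    with ldb_path.open(encoding="latin-1") as fh:
        for line in fh:
            line = line.rstrip("\r\n")
            if line:
                buckets[name_prefix(line)].append(line)

    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    for prefix, lines in sorted(buckets.items()):
        out_file = out_dir / f"daily_{prefix}.ldb"
        out_file.write_text("\n".join(lines) + "\n", encoding="latin-1")
        counts[out_file.name] = len(lines)
    return counts
```

The returned counts correspond to the per-file signature counts listed further down; splitting `daily_Win.ldb` again on its second name component follows the same pattern.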
>     >     daily_Andr.ldb ==== Time: 0.008 sec (0 m 0 s)
>     >     daily_Archive.ldb ==== Time: 0.009 sec (0 m 0 s)
>     >     daily_Asp.ldb ==== Time: 0.004 sec (0 m 0 s)
>     >     daily_Doc.ldb ==== Time: 0.116 sec (0 m 0 s)
>     >     daily_Eicar-Test-Signature.ldb ==== Time: 0.009 sec (0 m 0 s)
>     >     daily_Email.ldb ==== Time: 0.014 sec (0 m 0 s)
>     >     daily_Emf.ldb ==== Time: 0.007 sec (0 m 0 s)
>     >     daily_Heuristics.ldb ==== Time: 0.006 sec (0 m 0 s)
>     >     daily_Html.ldb ==== Time: 0.010 sec (0 m 0 s)
>     >     daily_Hwp.ldb ==== Time: 0.005 sec (0 m 0 s)
>     >     daily_Img.ldb ==== Time: 0.006 sec (0 m 0 s)
>     >     daily_Ios.ldb ==== Time: 0.006 sec (0 m 0 s)
>     >     daily_Java.ldb ==== Time: 0.005 sec (0 m 0 s)
>     >     daily_Js.ldb ==== Time: 0.007 sec (0 m 0 s)
>     >     daily_Legacy.ldb ==== Time: 0.006 sec (0 m 0 s)
>     >     daily_Lnk.ldb ==== Time: 0.005 sec (0 m 0 s)
>     >     daily_Mp4.ldb ==== Time: 0.005 sec (0 m 0 s)
>     >     daily_Multios.ldb ==== Time: 0.005 sec (0 m 0 s)
>     >     daily_Osx.ldb ==== Time: 0.008 sec (0 m 0 s)
>     >     daily_Pdf.ldb ==== Time: 0.007 sec (0 m 0 s)
>     >     *daily_Phish.ldb ==== Time: 1.612 sec (0 m 1 s)*
>     >     daily_Phishtank.ldb ==== Time: 0.146 sec (0 m 0 s)
>     >     daily_Php.ldb ==== Time: 0.006 sec (0 m 0 s)
>     >     daily_Ppt.ldb ==== Time: 0.007 sec (0 m 0 s)
>     >     daily_Py.ldb ==== Time: 0.006 sec (0 m 0 s)
>     >     daily_Rtf.ldb ==== Time: 0.006 sec (0 m 0 s)
>     >     daily_Svg.ldb ==== Time: 0.005 sec (0 m 0 s)
>     >     daily_Swf.ldb ==== Time: 0.007 sec (0 m 0 s)
>     >     daily_Ttf.ldb ==== Time: 0.005 sec (0 m 0 s)
>     >     daily_Txt.ldb ==== Time: 0.009 sec (0 m 0 s)
>     >     daily_Unix.ldb ==== Time: 0.008 sec (0 m 0 s)
>     >     daily_Vbs.ldb ==== Time: 0.009 sec (0 m 0 s)
>     >     *daily_Win.ldb ==== Time: 3.391 sec (0 m 3 s)*
>     >     daily_Xls.ldb ==== Time: 0.009 sec (0 m 0 s)
>     >     daily_Xml.ldb ==== Time: 0.007 sec (0 m 0 s)
>     >
>     >
>     > "Phish.", not "Phishtank.", and "Win." have the longest run times.
>     > Looking at the /number/ of signatures in each, the 'Phish.' signatures
>     > are taking a disproportionate amount of time to load compared to the
>     > other signatures:
>     >
>     >           216 daily_Andr.ldb
>     >             3 daily_Archive.ldb
>     >             1 daily_Asp.ldb
>     >          2096 daily_Doc.ldb
>     >             1 daily_Eicar-Test-Signature.ldb
>     >          1017 daily_Email.ldb
>     >             2 daily_Emf.ldb
>     >             5 daily_Heuristics.ldb
>     >           250 daily_Html.ldb
>     >             1 daily_Hwp.ldb
>     >            15 daily_Img.ldb
>     >             6 daily_Ios.ldb
>     >            16 daily_Java.ldb
>     >            69 daily_Js.ldb
>     >            27 daily_Legacy.ldb
>     >             9 daily_Lnk.ldb
>     >             1 daily_Mp4.ldb
>     >             9 daily_Multios.ldb
>     >           175 daily_Osx.ldb
>     >           132 daily_Pdf.ldb
>     >          2515 daily_Phish.ldb
>     >          3516 daily_Phishtank.ldb
>     >            18 daily_Php.ldb
>     >             5 daily_Ppt.ldb
>     >             3 daily_Py.ldb
>     >            28 daily_Rtf.ldb
>     >             1 daily_Svg.ldb
>     >           103 daily_Swf.ldb
>     >             2 daily_Ttf.ldb
>     >           140 daily_Txt.ldb
>     >           222 daily_Unix.ldb
>     >            21 daily_Vbs.ldb
>     >         43928 daily_Win.ldb
>     >           165 daily_Xls.ldb
>     >             8 daily_Xml.ldb
>     >
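Normalizing the timings by signature count makes the disproportion explicit. This Python snippet uses only figures from the two tables above (seconds from the timing run, counts from the signature counts). Each timing is a single wall-clock run that also includes fixed process start-up cost, so the per-signature numbers are rough.

```python
# (seconds, signature count) pairs copied from the two tables above.
TIMINGS = {
    "daily_Phish.ldb": (1.612, 2515),
    "daily_Phishtank.ldb": (0.146, 3516),
    "daily_Win.ldb": (3.391, 43928),
}


def ms_per_signature(seconds: float, count: int) -> float:
    """Average milliseconds of scan time per signature in one database file."""
    return 1000.0 * seconds / count


for name, (secs, count) in TIMINGS.items():
    print(f"{name}: {ms_per_signature(secs, count):.3f} ms/signature")
```

This works out to roughly 0.64 ms per "Phish." signature, versus about 0.04 ms for "Phishtank." and 0.08 ms for "Win.".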
>     >
>     > From the look of it, "Phish." has those REPHISH signatures. Those
>     > signatures seem to look at any file (Target 0) and have
>     > subsignatures that are combined to match depending on which filetype
>     > they are 'looking' for (so, href for HTML files, %PDF, Subtype, and
>     > URI objects for PDFs, etc.), as opposed to the remaining Phishtank sigs,
>     > which seem to have a separate signature for each target type.
>     >
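To check which logical signatures apply to any file type, the target description block (the second `;`-separated field of an `.ldb` line) can be parsed for its `Target:` value. A minimal Python sketch, assuming the usual comma-separated `Key:Value` layout of that block; the function names and sample lines are hypothetical.

```python
from collections import Counter
from typing import Optional


def ldb_target(ldb_line: str) -> Optional[str]:
    """Return the Target value from a logical signature's target description block."""
    fields = ldb_line.split(";")
    if len(fields) < 2:
        return None  # not a well-formed logical signature line
    for item in fields[1].split(","):
        key, sep, value = item.partition(":")
        if sep and key.strip() == "Target":
            return value.strip()
    return None


def count_targets(lines: list[str]) -> Counter:
    """Tally signatures per Target value (None when no Target is found)."""
    return Counter(ldb_target(line) for line in lines if line.strip())
```

Feeding it the lines of `daily_Phish.ldb` and `daily_Phishtank.ldb` separately would show how many signatures in each file use `Target:0` and therefore run against every file scanned.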
>     > Breaking up daily_Win into its constituent sub-parts doesn't reveal
>     > any particular culprit from just a simple scan timing, though...
>     >
>     >     daily_Win.Adware.ldb ==== Time: 0.013 sec (0 m 0 s)
>     >     daily_Win.Coinminer.ldb ==== Time: 0.009 sec (0 m 0 s)
>     >     daily_Win.Downloader.ldb ==== Time: 0.035 sec (0 m 0 s)
>     >     daily_Win.Dropper.ldb ==== Time: 0.240 sec (0 m 0 s)
>     >     daily_Win.Exploit.ldb ==== Time: 0.016 sec (0 m 0 s)
>     >     daily_Win.Ircbot.ldb ==== Time: 0.006 sec (0 m 0 s)
>     >     daily_Win.Keylogger.ldb ==== Time: 0.010 sec (0 m 0 s)
>     >     daily_Win.ldb ==== Time: 3.418 sec (0 m 3 s)
>     >     daily_Win.Macro.ldb ==== Time: 0.009 sec (0 m 0 s)
>     >     *daily_Win.Malware.ldb ==== Time: 0.731 sec (0 m 0 s)*
>     >     daily_Win.Packed.ldb ==== Time: 0.131 sec (0 m 0 s)
>     >     daily_Win.Packer.ldb ==== Time: 0.008 sec (0 m 0 s)
>     >     daily_Win.Phishing.ldb ==== Time: 0.005 sec (0 m 0 s)
>     >     daily_Win.Proxy.ldb ==== Time: 0.005 sec (0 m 0 s)
>     >     daily_Win.Ransomware.ldb ==== Time: 0.019 sec (0 m 0 s)
>     >     daily_Win.Spyware.ldb ==== Time: 0.006 sec (0 m 0 s)
>     >     daily_Win.Tool.ldb ==== Time: 0.008 sec (0 m 0 s)
>     >     *daily_Win.Trojan.ldb ==== Time: 0.582 sec (0 m 0 s)*
>     >     daily_Win.Virus.ldb ==== Time: 0.059 sec (0 m 0 s)
>     >     daily_Win.Worm.ldb ==== Time: 0.030 sec (0 m 0 s)
>     >
>     >           158 daily_Win.Adware.ldb
>     >            14 daily_Win.Coinminer.ldb
>     >           561 daily_Win.Downloader.ldb
>     >          8084 daily_Win.Dropper.ldb
>     >           216 daily_Win.Exploit.ldb
>     >             9 daily_Win.Ircbot.ldb
>     >           193 daily_Win.Keylogger.ldb
>     >         43928 daily_Win.ldb
>     >             1 daily_Win.Macro.ldb
>     >     *   14820 daily_Win.Malware.ldb*
>     >          4209 daily_Win.Packed.ldb
>     >            20 daily_Win.Packer.ldb
>     >             2 daily_Win.Phishing.ldb
>     >             4 daily_Win.Proxy.ldb
>     >           500 daily_Win.Ransomware.ldb
>     >            32 daily_Win.Spyware.ldb
>     >           121 daily_Win.Tool.ldb
>     >     *   12051 daily_Win.Trojan.ldb*
>     >          1967 daily_Win.Virus.ldb
>     >           966 daily_Win.Worm.ldb
>     >
>     >
>     > Malware and Trojan take the longest, but they also have a majority of
>     > the signatures.
>     >
>     > On Tue, Apr 9, 2019 at 11:19 AM Steve Basford
>     > <steveb_cla...@sanesecurity.com
>     > <mailto:steveb_cla...@sanesecurity.com>>
>     > wrote:
>     >
>     >     On 2019-04-09 12:02, Brent Clark via clamav-users wrote:
>     >      > Can't those be adopted / managed by Sanesecurity?
>     >      >
>     >      > For all you know, those are already in Sanesecurity.
>     >
>     >     They are... and have been for quite some time:
>     >
>     >
>     >     "The following databases are distributed by Sanesecurity, but produced
>     >     by Porcupine Signatures"
>     >
>     >     phishtank.ndb.
>     >
>     >     Briefly...
>     >
>     >     Number of sigs in phishtank.ndb: 9,309
>     >
>     >     eg:
>     >
>     >     PhishTank.Phishing.6002281, matches:
>     >
>     >     https://www.phishtank.com/phish_detail.php?phish_id=6002281
>     >
>     >     So, there is going to be some possible crossover now that
>     >     Phish.Phishing.REPHISH_ID_20190404_67-6931549-0
>     >     type signature names from the PhishTank feed are in daily.ldb and
>     >     daily.ndb.
>     >
>     >     I'll check back on the thread later.
>     >
>     >     --
>     >     Cheers,
>     >
>     >     Steve
>     >     Twitter: @sanesecurity
>     >
>     >     _______________________________________________
>     >
>     >     clamav-users mailing list
>     >     clamav-users@lists.clamav.net <mailto:clamav-users@lists.clamav.net>
>     >     https://lists.clamav.net/mailman/listinfo/clamav-users
>     >
>     >
>     >     Help us build a comprehensive ClamAV guide:
>     >     https://github.com/vrtadmin/clamav-faq
>     >
>     >     http://www.clamav.net/contact.html#ml
>     >