I am scanning large data sets for a company. These file systems have
hundreds of thousands of files in them. Most files are small (<1GB),
while a few are large (>10GB). Most files are documents, archives, and
executables. I am scanning them to detect whether they contain malware.

These are virtual machines, running Ubuntu 20.04.
The CPU on the ESXi host is an Intel Xeon Platinum 828 CPU @ 2.70GHz. I
have a total of 112 logical processors available, and 512 GB of RAM.

The message it prints is the following:
————————————————
Got command FILDES(7,11) argument
RECVTH FILDES command complete
THMGR active jobs for ***********: 2
THRMGR: Contended, sleeping
————————————————
Nothing appears after this message. It pauses, then after a couple of
minutes it continues, and the cycle repeats.

On Tue, Mar 2, 2021 at 9:40 AM G.W. Haywood via clamav-users <
clamav-users@lists.clamav.net> wrote:

> Hi there,
>
> On Tue, 2 Mar 2021, Michael Kyriacou via clamav-users wrote:
> > On Tue, Mar 2, 2021 at 4:08 AM G.W. Haywood via clamav-users wrote:
> >> On Mon, 1 Mar 2021, Michael Kyriacou via clamav-users wrote:
> >>
> >>> ... clamav 103.1 on ubuntu 20.04. I am getting “can’t allocate
> >>> memory errors” on very large files ( 10GB +). I thought clamdscan
> >>> was supposed to skip files that are larger than what you set the
> >>> maxfilesize/maxscansize to.
> >>
> >> Unfortunately this is a known issue:
> >>
> >> https://bugzilla.clamav.net/show_bug.cgi?id=12374
> >>
> >> Have you tried other ways to avoid scanning huge files?
> >
> > I was not aware of any other way to avoid scanning large files.
> > Where can I find such solutions?
>
> The operating system offers ways to avoid shooting your own feet.  You
> could just arrange for all the huge files to be in some corner of the
> filesystem which you don't normally scan - which begs the questions
> what are you scanning, and why?  There will of course be pseudo-files
> in your system which you should _never_ scan.  The 'find' utility will
> let you specify size limits.  You will need to spend some quality time
> with the 'man' pages to gain familiarity with using standard utilities
> in conjunction with something like ClamAV.  Using the 'man' pages is
> something of an acquired taste, which you do need to acquire if you're
> to get the most out of a Linux box.  The 'man' page for clamd.conf
> contains information about usage of resources.  Also there are some
> warnings, which to my mind are perhaps a little over the top, but they
> serve to remind us that the system's resources may be shared between a
> large number of processes; that these processes compete for resources;
> and that things can get ugly when there aren't enough to go around.
>
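
As a rough illustration of the size-limit idea above, here is a minimal
Python sketch that walks a tree, keeps only regular files at or below a
size threshold, and writes their paths to a list file. The names
SIZE_LIMIT_BYTES, files_under_limit and write_file_list are made up for
this example, and the 1 GiB threshold is only an assumption; 'find' can
do the same job from the shell.

```python
import os
import stat

# Hypothetical threshold; pick a value that suits your own data set.
SIZE_LIMIT_BYTES = 1 * 1024**3  # 1 GiB


def files_under_limit(root, limit=SIZE_LIMIT_BYTES):
    """Yield regular files under `root` whose size is at most `limit` bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable; skip it
            # Skip symlinks, sockets, devices and other non-regular files.
            if stat.S_ISREG(info.st_mode) and info.st_size <= limit:
                yield path


def write_file_list(root, list_path, limit=SIZE_LIMIT_BYTES):
    """Write one path per line to `list_path`; return the number written."""
    count = 0
    with open(list_path, "w", encoding="utf-8", errors="surrogateescape") as out:
        for path in files_under_limit(root, limit):
            out.write(path + "\n")
            count += 1
    return count
```

The resulting list could then be handed to clamdscan with its
--file-list option, for example "clamdscan --fdpass
--file-list=/tmp/scan-list.txt" (the path is a placeholder). Paths that
contain newlines would break the one-path-per-line format, so treat
this as a starting point only.
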
> The concept of "not scanning a file larger than X bytes" is a bit too
> simplistic when talking about scanning with something like ClamAV which
> (a) depending on the file type may use different approaches to scanning
> and (b) can extract the content from types of file (e.g. Zip, RAR, etc.)
> which can contain whole directory structures and also employ compression
> techniques, and which as a result are subject to various and sometimes
> non-obvious Denial-Of-Service type attacks.  So there are numerous clamd
> configuration options which permit fine-tuning of the resource usage of
> the ClamAV tools.  To make the best use of these options you'll need to
> be familiar with your system's resources, and the constraints.
>
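
To see which of those limits a given configuration actually sets, a
small Python sketch like the one below may help. It assumes the usual
"Option value" layout of clamd.conf, with "#" comments and K/M size
suffixes, and looks only at MaxScanSize, MaxFileSize, MaxRecursion and
MaxFiles. The helper names and the sample values are hypothetical; the
clamd.conf 'man' page remains the authority on syntax and defaults.

```python
# Limit-related options to report; other clamd.conf options are ignored.
LIMIT_OPTIONS = ("MaxScanSize", "MaxFileSize", "MaxRecursion", "MaxFiles")

# Size suffixes assumed here: K = kibibytes, M = mebibytes.
SIZE_SUFFIXES = {"K": 1024, "M": 1024**2}


def parse_size(value):
    """Convert a value such as '100M' or '16' to an integer."""
    suffix = value[-1].upper()
    if suffix in SIZE_SUFFIXES:
        return int(value[:-1]) * SIZE_SUFFIXES[suffix]
    return int(value)


def read_limits(conf_text):
    """Return a dict of the limit options found in clamd.conf-style text."""
    limits = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in LIMIT_OPTIONS:
            limits[parts[0]] = parse_size(parts[1].strip())
    return limits


# Sample values for illustration only, not recommendations.
SAMPLE_CONF = """\
# limits
MaxScanSize 400M
MaxFileSize 100M
MaxRecursion 16
MaxFiles 10000
"""

if __name__ == "__main__":
    for option, value in read_limits(SAMPLE_CONF).items():
        print(option, value)
```

Options that are absent from the file fall back to clamd's built-in
defaults, which this sketch does not try to reproduce.
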
> How much memory does the box have?  You'll probably need a gigabyte or
> so to store the signature database before you even start a scan, plus
> whatever the scanner uses when it scans something - that depends a lot
> on what it's scanning.  Then if you keep the default configuration to
> permit scanning while reloading the databases, another gigabyte will
> be used (briefly) every time clamd reloads the database.  Note that
> the extra memory will not be released until the completion of any scan
> which was started before the reload.  I'd recommend that if you don't
> want to have to work on memory management, four gigabytes of RAM is
> about the minimum for a clamd server.  The longer it takes to scan a
> file, the more likely it is that you'll try to reload the database
> during a scan, so if you're short on memory and you want to scan files
> which take a long time to scan then it's worth considering the option
> to scan data only while a database reload is not taking place.
>
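
The memory reasoning above can be written as back-of-envelope
arithmetic. The Python sketch below is only that: the function name and
parameters are invented, the one-gigabyte database figure is the rough
estimate quoted above, and the per-scan figure is a placeholder that
depends entirely on what is being scanned.

```python
def estimated_peak_gb(db_gb=1.0, scan_gb=1.0, concurrent_reload=True):
    """Rough peak memory for clamd, in gigabytes.

    db_gb:             memory held by the loaded signature database
    scan_gb:           placeholder for whatever running scans need
    concurrent_reload: if True, a second copy of the database exists
                       briefly while scanning continues during a reload
    """
    reload_gb = db_gb if concurrent_reload else 0.0
    return db_gb + reload_gb + scan_gb


if __name__ == "__main__":
    print(estimated_peak_gb(concurrent_reload=True))
    print(estimated_peak_gb(concurrent_reload=False))
```

In clamd.conf the reload behaviour is, as far as I know, controlled by
the ConcurrentDatabaseReload option; check the 'man' page for your
version before relying on it.
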
> --
>
> 73,
> Ged.
>
> _______________________________________________
>
> clamav-users mailing list
> clamav-users@lists.clamav.net
> https://lists.clamav.net/mailman/listinfo/clamav-users
>
>
> Help us build a comprehensive ClamAV guide:
> https://github.com/vrtadmin/clamav-faq
>
> http://www.clamav.net/contact.html#ml
>