On Sun, 2010-05-23 at 17:43 +0300, Török Edwin wrote:
> > > If a file is determined to be clean, its MD5 is added to an in-memory
> > > cache.
> > > When scanning a new file, its MD5 is computed and looked up in the
> > > cache. If found, it is considered clean.
> > > On DB reload the entire cache is cleared.
> >
> > But, isn't that typically done multiple times a day?
> >
> > So what exactly is the use-case for this, other than doing full system
> > scans more frequently than signature updates?
>
> Even when doing a full system scan you still have a cache of the last N
> minutes (where N depends on how often you reload the DB).
> This helps with:
> - duplicate files, or files in both archived and unarchived state
> - since we cache at the extracted-file level, even if only part of an
>   archive/container is redundant, we have that cached
> - mails containing the same attachment, which was already determined to be
>   clean
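The caching scheme quoted above can be sketched roughly as follows. This is a minimal Python illustration, not ClamAV's actual implementation: the names `CleanCache`, `scan`, and `full_scan` are hypothetical, and `full_scan` is a stand-in for the real signature-matching engine.

```python
import hashlib


class CleanCache:
    """Hypothetical in-memory cache of MD5 digests of data found clean."""

    def __init__(self):
        self._clean = set()

    def is_known_clean(self, data: bytes) -> bool:
        # Look up the MD5 of the data in the set of known-clean digests.
        return hashlib.md5(data).digest() in self._clean

    def mark_clean(self, data: bytes) -> None:
        self._clean.add(hashlib.md5(data).digest())

    def clear(self) -> None:
        # To be called on DB reload: new signatures may match data
        # that the old database considered clean.
        self._clean.clear()


def scan(data: bytes, cache: CleanCache, full_scan) -> bool:
    """Return True if data is clean, skipping full_scan on a cache hit."""
    if cache.is_known_clean(data):
        return True
    clean = full_scan(data)  # assumed to return True for clean data
    if clean:
        cache.mark_clean(data)  # only clean results are cached
    return clean
```

Only clean verdicts are stored, so infected data is always rescanned, and calling `clear()` on every database reload keeps a stale verdict from outliving the signatures that produced it.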

Ah, now I see. :)  Thanks for explaining, Török.

> - archive bombs: instead of trying to scan 2^N files until the
>   recursion depth/maxfilesize limit is reached, it only needs to scan N
>   files (N is recursion depth) for a typical archive bomb that expands to
>   2 more archives at each depth.
> - ensure that the bytecode won't accidentally need 2^N time to run: if
>   it happens to extract a file that matches the logical signature of the
>   same bytecode again, which would trigger further extraction and so on
>
> The latter is the reason why the feature was added; however, some initial
> tests have shown improved performance for nearly any kind of scan
> (system, mails, home, etc.)

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
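The archive-bomb argument can be checked with a toy model. The sketch below is only an illustration under stated assumptions: `make_bomb` and `count_scans` are hypothetical names, an "archive" is modelled as a pair of identical children, and its digest is derived from the child digests as a stand-in for hashing the archive's bytes. The toy counts every archive plus the final payload, so it reports 2^(N+1)-1 scans without the cache and N+1 with it, the same exponential-versus-linear gap as the 2^N versus N quoted above.

```python
import hashlib


def make_bomb(depth):
    """Build a toy archive bomb as (digest, children) tuples.

    Depth 0 is a plain payload; each further level is an "archive"
    holding two identical copies of the level below.
    """
    node = (hashlib.md5(b"payload").hexdigest(), ())
    for _ in range(depth):
        # Stand-in for hashing the archive bytes: hash both child digests.
        digest = hashlib.md5((node[0] * 2).encode()).hexdigest()
        node = (digest, (node, node))
    return node


def count_scans(node, cache=None):
    """Count objects actually scanned; cache is a set of clean digests."""
    digest, children = node
    if cache is not None and digest in cache:
        return 0  # already known clean, nothing to scan
    scans = 1 + sum(count_scans(child, cache) for child in children)
    if cache is not None:
        cache.add(digest)  # everything in this toy model is clean
    return scans


if __name__ == "__main__":
    bomb = make_bomb(10)
    print(count_scans(bomb))         # no cache: 2047 scans
    print(count_scans(bomb, set()))  # with cache: 11 scans
```

The second copy at each level is a cache hit because the first copy was fully scanned and recorded before the scanner reached it.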