On 26 Apr 2012 at 21:18, Török Edwin wrote:

> On 04/26/2012 08:37 PM, Michael Orlitzky wrote:
> > On 04/26/2012 10:32 AM, Dennis Peterson wrote:
> >> On 4/25/12 7:34 AM, Michael Orlitzky wrote:
> >>> On 04/25/12 07:55, Török Edwin wrote:
> >>>>>
> >>>>> I don't know if this can help speed up the process, but I collected
> >>>>> some statistics on clamscan of a small file (wall-clock duration:
> >>>>> ~25 sec):
> >>>>
> >>>> I think I'm missing some context here: which DB files are slow to load?
> >>>> The official ones? Just the sanesecurity ones? Any particular DB from 
> >>>> the sanesecurity ones?
> >>>
> >>> My problem isn't so much that it takes a while to load the signatures,
> >>> but that clamd (and thus the mail server) is effectively down the entire
> >>> time.
> >>
> >> This has been a problem on every SPARC system I've ever installed ClamAV
> >> on, and that goes back quite a few years. I still use it on several
> >> Netra 500 MHz pizza boxes. It is also quite a memory hog, which has more
> >> to do with the available memory and the number of sigs, so on
> >> memory-constrained systems I've cut back on the number of SS signatures.
> >> And at my peril, I might add, as they have long been the most valuable
> >> in terms of results. And because of the dead time when reloading, I've
> >> cut freshclam to once a day. That has resulted in a net improvement in
> >> detections because of the higher availability time.
> >>
> > 
> > The signature databases are created once, and loaded thousands of times.
> > They should just be sorted, so that lookups are instantaneous.
> > 
> > Then it's trivial to update the databases in the background, because you
> > can quickly determine if a particular signature was added or deleted.
> > The wall-time-elapsed would be a bit worse, but nobody would care.
> 
> It's a bit more complicated than that. To ensure fast pattern-matching, the
> signatures are loaded into an Aho-Corasick trie, for example.
> It would be possible to add to the trie (that's what happens when loading
> signatures), but removing is trickier.
> And to determine what to remove, you need to go through all the signatures
> in the database anyway.
> Also, updating the loaded signature database would require the scanning
> threads to take read locks, which would slow things down and make updating
> it harder (right now the loaded signature database is never modified, hence
> no locks are needed).
> 
> It would be easier to just move reload_db to a different thread and allow
> scanning with the old database during the DB reload.
> Then, when the DB reload is finished, atomically replace the engine pointer
> and free the old engine.
> The downside would be that you get twice the memory usage during reload,
> but you don't have downtime, so this should probably be controlled by a
> flag in clamd.conf.
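
For what it's worth, the reload-in-a-second-thread idea quoted above can be
sketched in a few lines. This is only an illustration in Python, not clamd
code: Engine, Scanner and reload_async are made-up names, and the matcher is
a toy substring test rather than an Aho-Corasick trie.

```python
import threading


class Engine:
    """Hypothetical stand-in for a loaded signature database.

    It is never modified after construction, so scanning threads can
    read it without taking any lock.
    """

    def __init__(self, signatures):
        self.signatures = frozenset(signatures)

    def scan(self, data):
        # Toy matcher: report every signature that occurs in the data.
        return sorted(s for s in self.signatures if s in data)


class Scanner:
    """Hypothetical holder of the 'engine pointer'."""

    def __init__(self, signatures):
        self._engine = Engine(signatures)

    def scan(self, data):
        # Grab the current engine once and use it for the whole scan,
        # so a swap in the middle of a scan cannot affect this scan.
        engine = self._engine
        return engine.scan(data)

    def reload_async(self, load_signatures):
        """Build a new engine in the background, then swap it in."""

        def worker():
            # Slow part: the old engine keeps serving scans meanwhile,
            # which is why memory use roughly doubles during a reload.
            new_engine = Engine(load_signatures())
            # The swap is a single reference assignment.
            self._engine = new_engine

        thread = threading.Thread(target=worker)
        thread.start()
        return thread
```

Here the garbage collector frees the old engine once the last running scan
drops its reference. In C you would presumably need a reference count (or
something similar) so the old engine is not freed while a scan still uses it.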

Doing that with 2 different processes rather than with 2 threads would at
least free all the initial process memory when the "transfer of service" is
done and that process can exit.
AFAIK freeing the memory inside of a process does not necessarily reduce the
memory space consumed. But I'm not an expert. Of course that "transfer of
service" would be more tricky between 2 processes...

Regards,
Pierre


_______________________________________________
Help us build a comprehensive ClamAV guide: visit http://wiki.clamav.net
http://www.clamav.net/support/ml
