David F. Skoll wrote:
> Hello,
>
> A client of ours had a bunch of machines whose CPUs were maxed out
> at 100% because of clam. Changing PhishingScanURLs to "no" from the
> default "yes" dropped the load average from 70+ to about 3, and the
> CPU usage from 100% to under 50%. This is under Linux, so it's not
> the broken Solaris regex library at fault.
>
> I have two questions, a practical one and a philosophical one:
>
> The practical one: Do others observe the very poor behaviour
> of PhishingScanURLs? Is it perhaps hitting pathological cases of regex
> evaluation?
>
> The philosophical one: Do heuristics like PhishingScanURLs belong in a
> virus scanner? I realize that once the engine is in place, it's
> tempting to add features, but I'm not convinced such things belong in
> a virus scanner. I think they are more in the domain of anti-spam
> software, especially since it's good for security to keep your
> virus-scanner small, fast and secure and do more complex text analysis
> in a language other than C. I guess I would vote for PhishingScanURLs
> to be "no" by default rather than "yes".
I agree, but I have been on both sides of the URL issue. I have another tool, J-Chkmail, that already scans URLs and uses a distributed list described by Jose-Martins and also at SURBL.org. So I don't really need URL scanning in ClamAV and wouldn't miss it. J-Chkmail uses the PCRE libraries by default. I'm content that the feature is configurable in ClamAV (and in J-Chkmail, where it uses db files for lookups).

To answer your first question, I've not seen it happen on the systems I run (all Solaris, with PCRE compiled in). It was an issue from time to time when I was working at Getty Images, but the mail volume there is much greater than what I deal with now.

dp

_______________________________________________
Help us build a comprehensive ClamAV guide: visit http://wiki.clamav.net
http://lurker.clamav.net/list/clamav-users.html