Hi,

> Perhaps it is something else, called by do_multipart(), that is not
> thread safe?

I suspect so. Maybe it's also some bad lock contention somewhere? Anyway, I'll try to narrow this down further; I hope to find the problem soon.

> Naturally CPU usage will go down if you change an application from
> multiple to single threading. That does not necessarily indicate a
> problem; quite the opposite, it could indicate that all is working as
> it should.

If I set the thread maximum to just one thread, the load is too much for clamd
and I get timeouts very quickly (30-60 seconds after startup). With this additional codepath mutex the application is still multithreaded; only multipart mails get handled serially.
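
To make the idea concrete, here is a minimal sketch of such a codepath mutex. It is not the actual patch and not ClamAV's code: multipart_mutex, handle_multipart_stub(), scan_worker() and run_serialized_demo() are hypothetical names made up for illustration. The point is only that the workers stay multithreaded while the multipart path is entered by one thread at a time.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_DEMO_THREADS 64

/* Hypothetical guard for the multipart code path (not ClamAV's real code). */
static pthread_mutex_t multipart_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Bookkeeping; only touched while multipart_mutex is held. */
static int multipart_in_flight = 0;
static int multipart_max_seen = 0;
static int multipart_done = 0;

/* Stand-in for the real multipart handling. */
static void handle_multipart_stub(void)
{
    multipart_in_flight++;
    if (multipart_in_flight > multipart_max_seen)
        multipart_max_seen = multipart_in_flight;
    /* ... the actual MIME parsing would run here ... */
    multipart_in_flight--;
    multipart_done++;
}

/* Worker thread: only the multipart path takes the mutex. */
static void *scan_worker(void *arg)
{
    int is_multipart = *(const int *)arg;

    if (is_multipart) {
        pthread_mutex_lock(&multipart_mutex);
        handle_multipart_stub();
        pthread_mutex_unlock(&multipart_mutex);
    }
    /* Non-multipart work would run here, unguarded and in parallel. */
    return NULL;
}

/* Starts nthreads workers that all take the multipart path. Returns the
 * number of completed multipart scans, or -1 on error. */
int run_serialized_demo(int nthreads)
{
    pthread_t tids[MAX_DEMO_THREADS];
    int is_multipart = 1;
    int started = 0;
    int i;

    if (nthreads < 0 || nthreads > MAX_DEMO_THREADS)
        return -1;

    multipart_in_flight = multipart_max_seen = multipart_done = 0;

    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, scan_worker, &is_multipart) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == nthreads ? multipart_done : -1;
}
```

After run_serialized_demo(8) returns, multipart_max_seen should be 1, i.e. no two threads were ever inside the guarded path at the same time, even though eight threads were running.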

Let me first explain which threading libraries exist on FreeBSD. There are three:

libc_r:         a userland threading library; all threads are internal to the
                library, so it can't use multiple CPUs.
libthr:         a 1:1 kernel threading library.
libpthread:     an n:m kernel threading library.

Only libpthread shows these problems. Libpthread can run in two modes: PTHREAD_SCOPE_SYSTEM and PTHREAD_SCOPE_PROCESS.
With PTHREAD_SCOPE_SYSTEM each kernel thread gets its own
share of the CPU, while with PTHREAD_SCOPE_PROCESS the threads have to
share one. Independent of the mode, clamd always shows the same symptoms!
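
For reference, the scope is selected per thread through the thread attributes. The following is only a sketch of how one could switch between the two modes for a test; spawn_with_scope() and noop() are hypothetical helpers, not anything from clamd. Only the pthread_attr_* calls and the PTHREAD_SCOPE_* constants are standard POSIX.

```c
#include <pthread.h>
#include <stddef.h>

/* Trivial thread body, used only to show that the thread starts. */
static void *noop(void *arg)
{
    return arg;
}

/* Creates and joins one thread with the requested contention scope.
 * scope is PTHREAD_SCOPE_SYSTEM or PTHREAD_SCOPE_PROCESS.
 * Returns 0 on success, otherwise the error number reported by the
 * failing pthread call (a platform may refuse a scope it does not
 * support). */
int spawn_with_scope(int scope)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setscope(&attr, scope);
    if (rc == 0) {
        rc = pthread_create(&tid, &attr, noop, NULL);
        if (rc == 0)
            rc = pthread_join(tid, NULL);
    }

    pthread_attr_destroy(&attr);
    return rc;
}
```

Calling spawn_with_scope(PTHREAD_SCOPE_SYSTEM) and spawn_with_scope(PTHREAD_SCOPE_PROCESS) in turn is one way to check which modes a given threading library accepts before comparing their behaviour.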

Without this patch we see thr_alive go up to the maximum, with
idle_threads always at 1 or even zero. With a maximum of 100 threads we see
99 busy threads and one idle, and all those busy threads are somewhere in
do_multipart, but they don't do any work. They starve and get killed after the timeout, but the real scan doesn't happen. When the maximum number of clamd threads is reached, the clamd caller (in our case mimedefang) gets busy timeouts and kills its slaves, so the scans never finish. Setting the thread maximum to 10 or 20 threads doesn't change anything about the starvation.

All you need to reproduce this are multipart mails, so that the do_multipart codepath is executed. The server we use has two cores with HTT enabled, so 4 kernel threads can execute at the same time.

With this patch, we can load the server with 4-5 times the current load, that is 15-20 mails per second, and clamd still manages to scan those mails while half of the threads are idle. CPU consumption is also a lot lower. Without this patch, even 2-3 mails per second lead clamd to deadlock, and
clamd spends a lot of CPU while it only manages to scan 1-2 mails in ten
seconds!

Do you have an explanation for this?

--
Martin
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net
