The current problem:
> If a clamd thread runs into a timeout, it is/should be cancelled by the watcher thread using pthread_cancel(3).
> It really is cancelled, but any resources allocated before (open directories, strdup()ed pathnames,
Check whether a malloc() is really necessary in all cases.
For example, I noticed that "fname" (the full path name) is malloc()ed (and freed) for each file in every directory tree walk. This is unnecessary and expensive. POSIX defines the macro PATH_MAX, the maximum path length for the platform.
So, if you define fname as "char fname[PATH_MAX];", the malloc()/free() pair becomes unnecessary and hence the risk of a leak is reduced.
Also, in the same code, the full pathname is built up using :-
sprintf(fname,"%s/%s"....
All "printf"-style calls are expensive. If you define "fname" with PATH_MAX as above, then (before you start the walking loop) you can prepare the constant prefix using :-
strcpy(fname,dirname); strcat(fname,"/"); cp = fname + strlen(fname);
(where cp is "char *"). Then, for each file, all you have to do is :-
strcpy(cp,d->d_name);
to make "fname" into the full path name. This is much cheaper than the sprintf() technique and reduces the chances of a leak.
(see my alternative "rmdirs" code from a previous posting).
> file buffers etc) are not free()d,
> resulting in more or less big memleaks.
> So one can now either manage a list of resources to be free()d/close()d and install a cleanup handler for that, or set a thread-global flag which is honored in all the scanning loops to break out and clean up as usual.
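The first option, a cleanup handler, can be sketched with pthread_cleanup_push(3). Everything here apart from the POSIX calls themselves (the names scan_thread, close_dir, free_buf and the buffer size) is my own illustration, not clamd's actual code:

```c
#include <dirent.h>
#include <pthread.h>
#include <stdlib.h>

static void close_dir(void *arg) { closedir(arg); }
static void free_buf(void *arg)  { free(arg); }

static void *scan_thread(void *arg)
{
    DIR *dd = opendir(arg);
    if (dd == NULL)
        return NULL;
    /* From here on, cancellation (or pthread_exit) runs close_dir(dd). */
    pthread_cleanup_push(close_dir, dd);

    char *buf = malloc(16384);
    pthread_cleanup_push(free_buf, buf);

    /* ... scanning loop: a pthread_cancel() delivered at any
       cancellation point in here no longer leaks dd or buf ... */

    pthread_cleanup_pop(1);     /* normal path: free(buf)    */
    pthread_cleanup_pop(1);     /* normal path: closedir(dd) */
    return NULL;
}
```

The handlers fire in LIFO order when the watcher cancels the worker. Note that pthread_cleanup_push()/pthread_cleanup_pop() are macros and must be paired within the same lexical block.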
This is exactly why I think (and have expressed in this mailing list) that using threads for this application is not such a good idea.
(see my previous posting for making a more robust "clamd").
> All I have heard about threads (from a programmer's pov) is that they are mostly equivalent to multiple processes, but global variables are to be avoided. And I don't know about variable scope in threads, either :-(
Basic theory (put simply):-
Unix programs are split into three segments: program (binary+libs), data/heap (global variables) and stack. When you fork(), the parent and child share the program segment, but each process has its own stack and data segment. At the time of the call, the stack and data segments of the child appear to be a copy of the parent's; i.e. their function call stacks and global data are independent copies, and they share the same code.
A thread is almost exactly the same, except the data segments (globals and heap) are also shared between the parent and child. Hence your problem: if a child thread leaks memory and dies, the leak is inherited by the parent. Unlike with fork(), where the O/S cleans up when the child dies.
However, note that most modern Unices have a very clever fork() algorithm. They apply a "copy on write" technique to the actual memory pages. Meaning that for a parent and child process, although their global variables appear to be independent, no memory pages are actually copied until either process makes a change. This means that a fork() is a much lighter-weight call than it used to be. It also means that, in system terms, a process fork() isn't really any more expensive than a thread fork(), although an exec() is still quite expensive.
At least that's my understanding of it.
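A tiny experiment (my own, not from the thread) makes the difference concrete: after a fork(), a write to a global in the child never shows up in the parent.

```c
#include <sys/wait.h>
#include <unistd.h>

int counter = 1;    /* global: lives in the data segment */

/* fork(), let the child overwrite the global, and return what the
   parent sees afterwards.  With fork() this is the original value:
   the child's write triggered copy-on-write on the child's own page. */
int fork_and_check(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        counter = 42;          /* only the child's copy changes */
        _exit(0);
    }
    waitpid(pid, NULL, 0);     /* child gone; the O/S reclaimed its copy */
    return counter;            /* parent's copy was never touched */
}
```

Run the same experiment with pthread_create() instead of fork() and the parent would see 42, which is precisely why a cancelled, leaking thread leaves its garbage behind in the shared heap.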
James
_______________________________________________
Clamav-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/clamav-devel
