Re: pthreads question

Nadav Har'El Sat, 16 Mar 2002 07:27:16 -0800

On Sat, Mar 16, 2002, Malcolm Kavalsky wrote about "Re: pthreads question":
> Before you attack me, try the following test:
> 
> 1. Malloc a 100 Mb buffer, fill it with random data
> 2. Send the buffer over a Unix socket to another process
> 3. Time how long it takes to send the data ...
> 
> The result will be, that it takes almost nothing. Certainly faster than
> memory speed. This is because the data is not actually copied, but the OS
> does clever pointer manipulation, copy on write, etc to speed stuff like
> this up.


I didn't try this, but I'll bet you're wrong even without trying ;)
In Linux, sending on sockets (and it doesn't really matter what type)
is *not* zero-copy. It can't be, if you consider the fact that write()
takes one process's buffer and the other process's read() takes that
process's buffer - so the kernel MUST copy data from one buffer to another.
No amount of pointer manipulation in the world can do anything against that -
they'd need to change the socket API to do that... (the sendfile() system
call in Linux 2.4 is a start, but not relevant to this discussion).
In reality more copies may be (and probably are) done inside the kernel.
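
To make the copy concrete, here is a minimal sketch in C, assuming Linux or
another POSIX system. The helper name receiver_sees_original() is made up for
illustration. It writes a short message into one end of an AF_UNIX socketpair,
overwrites the sender's buffer once write() has returned, and then reads from
the other end into a different buffer. The reader still gets the original
bytes, which is only possible if the kernel took its own copy in between. It
shows that a copy happens, not how many copies the kernel makes.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post).
 * Returns 1 if the receiver sees the original bytes even though the
 * sender's buffer was overwritten after write(), 0 if it does not,
 * and -1 on error.
 */
int receiver_sees_original(const char *msg)
{
    int fds[2];
    char sendbuf[64], recvbuf[64];
    size_t len;
    ssize_t n;

    assert(msg != NULL);
    len = strlen(msg);
    if (len == 0 || len >= sizeof sendbuf)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    memcpy(sendbuf, msg, len);

    /* When write() returns, the kernel holds its own copy of the data. */
    if (write(fds[0], sendbuf, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Scribble over the sender's buffer; the queued data must not change. */
    memset(sendbuf, 'X', len);

    /* The receiver supplies its own buffer, so the data is copied again. */
    n = read(fds[1], recvbuf, sizeof recvbuf);
    close(fds[0]);
    close(fds[1]);
    if (n != (ssize_t)len)
        return -1;

    return memcmp(recvbuf, msg, len) == 0;
}
```

Calling receiver_sees_original("hello") should return 1.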

Explicit "message-passing" is indeed a valid approach for parallel programming
(and is what you *have* to do when you don't have shared memory between your
CPUs), but it isn't an easy way to program either. It's easy for a beginner
to get into deadlocks (one process waits for a message from a second, but the
second waits for the first), bugs (one process expects a certain type of
message, the second sends it a different one), and so on. These bugs are
almost as hard to catch as thread bugs. It's naive to say you can always
debug each process separately: sometimes when you single-step one process,
you're "hanging" the rest of the processes and drastically changing the flow
of the program.
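
The deadlock case is easy to reproduce. Here is a minimal sketch in C, again
assuming a POSIX system. both_sides_stuck() is a hypothetical helper, and
poll() with a timeout stands in for the blocking read() a real program would
use, so that the sketch returns instead of hanging. Both ends of a socketpair
play the role of peers that wait to receive before they send anything. Unless
one of them breaks the cycle by sending first, neither ever becomes readable.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post).
 * Returns 1 if neither peer received anything within timeout_ms (the
 * situation that would deadlock with blocking reads), 0 if some data
 * arrived, and -1 on error. If a_sends_first is nonzero, peer A sends
 * one byte before waiting, which breaks the cycle.
 */
int both_sides_stuck(int timeout_ms, int a_sends_first)
{
    int fds[2];
    int ready;

    /* A negative timeout would make poll() wait forever. */
    assert(timeout_ms >= 0);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    if (a_sends_first && write(fds[0], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Each peer waits for the other one to speak first. */
    struct pollfd waiters[2] = {
        { .fd = fds[0], .events = POLLIN },
        { .fd = fds[1], .events = POLLIN },
    };
    ready = poll(waiters, 2, timeout_ms);

    close(fds[0]);
    close(fds[1]);
    if (ready < 0)
        return -1;

    return ready == 0;
}
```

both_sides_stuck(100, 0) should return 1 (nobody ever hears anything), while
both_sides_stuck(100, 1) should return 0.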

As a sidenote (if anybody on the list is still reading this :)), PVM and MPI
are two examples of message passing libraries: it is far easier to use them
than to try to implement your own send-on-sockets code.

Having written both threaded and PVM programs, I honestly can't say that
writing or debugging PVM programs is easier than threaded programs. The
opposite might be true...

You know, SMP was invented exactly because shared-memory is such a useful
paradigm. Otherwise, we could have just had a machine with several CPUs with
separate memory (and we have those: they are called MPPs)!
Hardware makers are also trying to give a shared-memory disguise to non-SMP
machines (look up NUMA and Virtual Shared Memory). All this because some
things are easier to program using shared memory.

> What do we gain by passing large data structures between processes
> instead of using shared memory? Simple synchronization and protection
> of data.
>
> Another alternative, which I like is to write a separate server process
> which manages the data structure and all processes that want to access
> this data are clients of this process.

All this is a good and valid approach. But in some (many?) cases, it is
harder to code than a threaded alternative, where all memory is shared and
you count on your good programming (or language facilities for simpler
protection, like in C++) not to mess things up.
Anyway, in most programs (maybe not in programs controlling space shuttles
or nuclear reactors...) once you have a bug, it doesn't really matter if
only one thread dies or the entire program dies: they all work together,
and once one dies, the entire program failed anyway. So the extra protection
of processes is nothing more than a debugging aid, usually.
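
For comparison, here is what the threaded alternative looks like in its
simplest form: a minimal sketch in C with POSIX threads, where
run_shared_counter(), struct shared and the two constants are names made up
for illustration. All threads update the same counter in place, and the only
protection is a mutex that the programmer has to remember to take.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4
#define INCREMENTS  100000

/* Shared state: every thread sees the same counter and the same mutex. */
struct shared {
    pthread_mutex_t lock;
    long counter;
};

static void *worker(void *arg)
{
    struct shared *s = arg;

    assert(s != NULL);
    for (int i = 0; i < INCREMENTS; i++) {
        /* Without the lock, concurrent increments could be lost. */
        pthread_mutex_lock(&s->lock);
        s->counter++;
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

/*
 * Hypothetical helper (not from the original post).
 * Returns the final counter value, or -1 if setup failed.
 */
long run_shared_counter(void)
{
    struct shared s = { .counter = 0 };
    pthread_t tids[NUM_THREADS];
    int started = 0;

    if (pthread_mutex_init(&s.lock, NULL) != 0)
        return -1;

    for (; started < NUM_THREADS; started++) {
        if (pthread_create(&tids[started], NULL, worker, &s) != 0)
            break;
    }

    /* Join only the threads that were actually started. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&s.lock);
    return started == NUM_THREADS ? s.counter : -1;
}
```

With the lock in place, run_shared_counter() should return
NUM_THREADS * INCREMENTS. On older systems this may need to be built with
-pthread.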

> Once programmers get intoxicated with threads, I have noticed that at
> every opportunity, instead of just calling a function, they spawn a
> thread to execute it,

I already agreed that a bad programmer can wreak havoc using threads, just
like he or she can do with other complex things they don't understand (e.g.,
C++, which probably has 10 times more features to (ab)use than C).

But I assume neither of us is a bad programmer, nor would want to hire
one...

> settling on the final design. You also need to take into account and
> protect yourself from pitfalls by using well-tried and "foolproof"
> mechanisms, especially when you have a team of programmers.

I guess that if you have a team of programmers without previous experience
in some area (such as parallel programming) you'd want to "play it safe"
and not use threads (especially on Linux, where threads' benefits are more
limited). I can understand that.

pthread_exit(NULL);  /* I think this thread has run long enough ;) */

-- 
Nadav Har'El                        |      Saturday, Mar 16 2002, 3 Nisan 5762
[EMAIL PROTECTED]             |-----------------------------------------
Phone: +972-53-245868, ICQ 13349191 |Communism is the equal distribution of
http://nadav.harel.org.il           |poverty.
