Sorry, long non-Cocoa post, but maybe there's some useful info here for someone.

On 7 May 2008, at 18:33, Army Research Lab wrote:
Pay particular attention to the section titled "HDL and programming languages".
Chip designers have had to contend with these problems for years, and
developed languages for expressing parallelism with implicit threading
already (everything in an HDL is parallel unless you carefully force it to
be sequential).  We should be using ideas from those languages.

As somebody whose day job is writing HDL, I'd like to repeat this for emphasis, but it's not the languages that solve the race conditions; it's the architectures employed by the engineers. I think a lot of the problems software engineers have with threading come from architectures that are bad when viewed from a parallel-execution point of view.

Locks and semaphores are the workaround for this, and (good) hardware engineers (almost) never use them.

For example: pipelining. If faced with a set of tasks that need to be performed sequentially on some data blocks, a software engineer might decompose the problem like this (MIGHT, I said MIGHT):

(PA means process A, D1 means data block 1)

Thread 1: PA - D1 | PB - D1 | PC - D1 | PD - D1
Thread 2: PA - D2 | PB - D2 | PC - D2 | PD - D2
Thread 3: PA - D3 | PB - D3 | PC - D3 | PD - D3
Thread 4: PA - D4 | PB - D4 | PC - D4 | PD - D4

The thing to note is how each thread is running the same code (which must therefore be re-entrant) on different data.
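In software terms, that data-parallel decomposition might look like the minimal Python sketch below: one thread per data block, all threads running the same shared stage code. The stage functions pa..pd are made-up placeholders, not anything from the original discussion.

```python
import threading

# Illustrative stages (assumptions for the sketch, not from the post).
def pa(x): return x + 1
def pb(x): return x * 2
def pc(x): return x - 3
def pd(x): return x * x

def process(block):
    # PA..PD run sequentially on one block. This code is shared by
    # every thread, so it must be re-entrant: no mutable state that
    # isn't either local or per-block.
    for stage in (pa, pb, pc, pd):
        block = stage(block)
    return block

data = [1, 2, 3, 4]
results = [None] * len(data)

def worker(i, block):
    results[i] = process(block)  # each thread writes only its own slot

threads = [threading.Thread(target=worker, args=(i, d))
           for i, d in enumerate(data)]
for t in threads: t.start()
for t in threads: t.join()
print(results)  # -> [1, 9, 25, 49]
```

Each thread touches a disjoint slot of `results`, so no locking is needed here either, but the shared `process` code carries the re-entrancy burden the post describes.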

A Hardware engineer would probably do this:

Thread 1: PA - D1 | PA - D2 | PA - D3 | PA - D4
Thread 2:           PB - D1 | PB - D2 | PB - D3 | PB - D4
Thread 3:                     PC - D1 | PC - D2 | PC - D3 | PC - D4
Thread 4:                               PD - D1 | PD - D2 | PD - D3 | PD - D4

Note how the data is passed from thread to thread so only one thread owns the data at any time (no locks necessary), and how no process runs in more than one thread at a time, so the code doesn't have to worry about being re-entrant.

Granted, there's a start-up/shut-down cost where full parallelism isn't achieved (which is overwhelming in this example, but give it more data blocks and it becomes negligible), and this doesn't work for all problems, but it's a useful pattern for data processing. The other thing is to make sure that your stages are of similar complexity, as the slowest stage will define the performance of the system.

Passing the ownership of data from thread to thread would be done with FIFOs, which can also be written without locks, with some care (e.g. http://msmvps.com/blogs/vandooren/archive/2007/01/05/creating-a-thread-safe-producer-consumer-queue-in-c-without-using-locks.aspx , but read the comments, esp. w.r.t. out-of-order execution).
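As a rough illustration of the structure (not the C++ code from that link), here is a single-producer/single-consumer ring buffer in Python. It needs no lock only because each index has exactly one writer: the producer alone writes `tail`, the consumer alone writes `head`. Note that CPython's GIL hides the memory-ordering hazards the article's comments discuss; on real hardware you'd need release/acquire barriers so the slot write can't be reordered past the index update, so treat this as a sketch of the shape, not a portable lock-free queue.

```python
class SPSCQueue:
    """Single-producer, single-consumer FIFO without locks (sketch)."""

    def __init__(self, capacity):
        self.buf = [None] * (capacity + 1)  # one slot kept empty to
        self.head = 0   # written only by the consumer   # tell full
        self.tail = 0   # written only by the producer   # from empty

    def put(self, item):                    # producer side only
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:
            return False                    # full; caller must retry
        self.buf[self.tail] = item          # write the slot first...
        self.tail = nxt                     # ...then publish the index
        return True

    def get(self):                          # consumer side only
        if self.head == self.tail:
            return None                     # empty
        item = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        return item
```

Usage is the hand-off from the pipeline diagrams: the upstream stage calls `put`, the downstream stage calls `get`, and a block is owned by exactly one side at any moment.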

Yes, there can be issues with something like this in software (passing data between NUMA processors and non-shared caches), but believe me... it makes code far, far, far easier to read, write and DEBUG (unit tests for each stage).

_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
