On Wed, 2 Apr 2014, Greg Woods wrote:


My experience says there isn't. Granted, I am not an expert in parallel
computing, but I work for a supercomputing site. About 15 years ago,
high-performance computing hit the wall with regard to how fast a single
processor could go. We had Cray computers that used vector processing;
that means executing the same instruction across a range of memory words
in a single instruction cycle. This means that code like

for i = 1, 100 do
    a[i] = a[i] * 2
done

would execute at the same speed as "x=x*2" (in this admittedly trivial
example, you get a factor of 100 speedup).  That was a lot easier to
program for than multiprocessing, but even that required careful
attention when writing code so that it would vectorize and get the
performance boost.
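
For instance (this is just an illustrative sketch in C, not actual Cray
Fortran), a loop whose iterations are independent can be vectorized, while a
loop-carried dependence forces the compiler back to one element at a time:

/* Independent iterations: each a[i] depends only on b[i], so a
 * vectorizing compiler can operate on many elements per cycle. */
void scale(double *a, const double *b, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] * 2.0;
}

/* Loop-carried dependence: a[i] needs a[i-1] from the previous
 * iteration, so this loop cannot be vectorized as written. */
void running_sum(double *a, const double *b, int n)
{
    a[0] = b[0];
    for (int i = 1; i < n; i++)
        a[i] = a[i-1] + b[i];
}

Getting real codes into the first form, and keeping them there, was where the
careful attention went.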

After single processor computing hit the wall, we and every other HPC
site had to go to parallel processing (modern supercomputers have tens
of thousands of processors running on thousands of separate nodes). This
too requires special coding, so that your program will naturally break
up into separate tasks that can be executed in parallel. That is true
whether you are talking about using multiple processors on a single
machine, or spreading a code over multiple systems. There are MPI
libraries to make this task easier, but it is never as simple as "OK,
now execute this unmodified code five times as fast using five machines
instead of one".
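
To give a flavor of what that looks like in practice, here is a bare-bones
sketch (plain C with MPI; the problem is made up, but the MPI calls are the
standard ones): each rank computes a partial result over its own slice of the
index range, and the pieces are combined at the end.

#include <stdio.h>
#include <mpi.h>

#define N 1000000

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank takes its own contiguous slice of the work. */
    int chunk = N / size;
    int lo = rank * chunk;
    int hi = (rank == size - 1) ? N : lo + chunk;

    /* Made-up per-element work: sum i*i over this rank's slice. */
    double local = 0.0;
    for (int i = lo; i < hi; i++)
        local += (double)i * (double)i;

    /* Combine the partial sums on rank 0. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %g\n", total);

    MPI_Finalize();
    return 0;
}

The library handles the communication, but deciding how the work splits up and
how the partial results come back together is still on you; mpirun just starts
the copies.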

How difficult it is to parallelize the code depends, as has already been
said here, on the particular application to be parallelized.

--Greg



Right.  A lot of image processing tasks are amenable to parallelization.

Consider an algorithm called "adaptive histogram equalization."  Here is what 
it does:

1) Get a pixel and a small area around it (say the surrounding 100 pixels).

2) Apply a contrast-enhancement method called "histogram equalization" to that 
group of pixels.  This changes the value of the pixel in question.  Let's say that 
this process involves 500 high-level instructions.

3) Move to the next pixel.  Do the same thing.

If you have a 12-megapixel image (say, 11,760,000 pixels), that's 5,880,000,000 
instructions.  That 500-instruction block is impossible to parallelize well.  However, 
each pixel is independent, so you can parallelize the work across pixels easily.  I 
remember back in the 80s implementing this on a MicroVAX GPX II.  It took about 3 hours 
to do a 512x512 greyscale image by brute force.  Then Henry Fuchs et al. developed the 
Pixel-Planes machine, and Austin et al. implemented it on that -- it took about 4 
seconds.  Even today on my laptop with an i7, a brute-force contrast-limited adaptive 
histogram equalization on a 10-megapixel image takes a "go get a cup of coffee" 
amount of time.  There are, of course, shortcuts such as the Pizer-Cromartie algorithm, but 
they introduce interpolation artifacts.
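
To make the "each pixel is independent" point concrete, the brute-force version 
is basically the following (a rough sketch in C with OpenMP: 8-bit greyscale, a 
square window, no contrast limiting, clamped borders -- not the code I actually 
ran back then):

#include <stdint.h>

/* Brute-force adaptive histogram equalization.  img and out are
 * width*height 8-bit greyscale buffers; r is the window radius.
 * Every output pixel depends only on the input image, never on
 * other output pixels, so the outer loop is embarrassingly parallel. */
void ahe(const uint8_t *img, uint8_t *out, int width, int height, int r)
{
    #pragma omp parallel for
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            uint8_t center = img[y * width + x];
            int below = 0, count = 0;

            /* Count how many window pixels are <= the center value. */
            for (int wy = y - r; wy <= y + r; wy++) {
                if (wy < 0 || wy >= height) continue;
                for (int wx = x - r; wx <= x + r; wx++) {
                    if (wx < 0 || wx >= width) continue;
                    if (img[wy * width + wx] <= center) below++;
                    count++;
                }
            }

            /* The local CDF at the center value becomes the new pixel. */
            out[y * width + x] = (uint8_t)((255 * below) / count);
        }
    }
}

Drop the pragma and this is essentially the serial brute-force version; keep it 
and every row goes to a different core.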

Of course, that's why we have GPUs now, and most of this stuff is done on the 
GPU using CUDA.
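
The GPU version is essentially the same loop body with the two outer loops 
replaced by the thread grid -- again just a sketch (one CUDA thread per output 
pixel, same simplifications as the sketch above):

#include <stdint.h>

__global__ void ahe_kernel(const uint8_t *img, uint8_t *out,
                           int width, int height, int r)
{
    /* One thread per output pixel. */
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;

    uint8_t center = img[y * width + x];
    int below = 0, count = 0;
    for (int wy = y - r; wy <= y + r; wy++) {
        if (wy < 0 || wy >= height) continue;
        for (int wx = x - r; wx <= x + r; wx++) {
            if (wx < 0 || wx >= width) continue;
            if (img[wy * width + wx] <= center) below++;
            count++;
        }
    }
    out[y * width + x] = (uint8_t)((255 * below) / count);
}

Launched over a 2-D grid covering the image, the millions of per-pixel jobs map 
straight onto the thousands of GPU threads.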

Oh well, as I said, I remember back in the day trying to build a Beowulf 
cluster and deciding that it just wasn't worth the effort.  I was hoping that 
new tools were around to make it easier, with all the new advances in cloud and 
virtualization, but no such luck, it seems.

billo