------- Comment #3 from rob1weld at aol dot com  2010-07-19 08:25 -------
> ... this does not get parallelized at all ...
Also see PR 34501.

Perhaps we could make some use of Pluto. It is a fully automatic (C to OpenMP
C) parallelizer that makes code amenable to auto-vectorization.

http://pluto-compiler.sourceforge.net/
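
As a rough illustration of what "C to OpenMP C" means, here is a hand-written
sketch (not actual Pluto output, and not the loop from this PR). The function
name scale_add and its parameters are made up for the example. The only point
is that a loop with independent iterations gets an OpenMP pragma placed in
front of it:

```c
#include <stddef.h>

/* Hypothetical example loop: y[i] = a * x[i] + y[i].
   Each iteration touches only its own element, so the iterations are
   independent and the loop is safe to run in parallel. */
void scale_add(double *restrict y, const double *restrict x,
               double a, size_t n)
{
    /* The kind of annotation a source-to-source parallelizer might add.
       Built without -fopenmp, the pragma is ignored and the loop simply
       runs serially with the same result. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        y[i] = a * x[i] + y[i];
}
```

Built with gcc -fopenmp the iterations are split across threads. A real tool
would typically also restructure the loop nest (for example by tiling) before
adding the pragma, which this sketch does not attempt.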


Also see these parallelizers:
http://cri.ensmp.fr/pips/ or http://pips4u.org/
There was also something I found a few days ago via this page that I can no
longer locate:
http://en.wikipedia.org/wiki/Automatic_parallelization

It would be great to take that inner loop (if it were much larger) and
'kernelize' it for co-processing on the graphics card. We could expand GCC's
'x-parallelize-x' and threading options to automatically find the sweet spots
to offload for co-processing (on a GPU, using OpenCL).
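
To make 'kernelize' concrete, here is a minimal sketch in plain C, assuming a
simple independent loop body (the names saxpy_item and saxpy_launch are
hypothetical, and this is not the loop from this PR). The loop body is
outlined into a function that handles exactly one index, which is roughly the
shape an OpenCL kernel has. The host-side loop stands in for the kernel launch:

```c
#include <stddef.h>

/* The inner-loop body, outlined so that one call handles one index.
   In an OpenCL kernel the index would not be a parameter; each work-item
   would obtain it with get_global_id(0). */
static void saxpy_item(size_t i, float a, const float *x, float *y)
{
    y[i] = a * x[i] + y[i];
}

/* Host-side stand-in for launching the kernel over n work-items.
   Here the "work-items" simply run one after another on the CPU. */
void saxpy_launch(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        saxpy_item(i, a, x, y);
}
```

Whether offloading pays off depends on the loop being large enough to cover
the cost of copying data to and from the device, which is why the loop would
need to be much larger than the one here.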

Barra - NVIDIA G80 GPU Functional Simulator
http://gpgpu.univ-perp.fr/index.php/Barra

If we were 'allowed' to call a post-processor (like LTO used to do) we could
call ATI's GPU SDK, which supports OpenCL and outputs code both for x86 and
for its own GPUs.


Commercial Projects:
Auto-parallelizer and SIMDinator by Dalsoft
http://www.dalsoft.com/documentation_simdinator.html

NVIDIA's PTX
http://en.wikipedia.org/wiki/Parallel_Thread_Execution

Cray's work with LLVM
http://llvm.org/devmtg/2009-10/Greene_180k_Cores.pdf

Larrabee
http://www.drdobbs.com/architecture-and-design/216402188?pgno=5


Rob


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36281
