------- Comment #3 from rob1weld at aol dot com 2010-07-19 08:25 -------
> ... this does not get parallelized at all ...

Also see 34501
Perhaps we could make some use of Pluto. It is a fully automatic (C to OpenMP C) parallelizer that makes code amenable to auto-vectorization.
http://pluto-compiler.sourceforge.net/

Also see these parallelizers:
http://cri.ensmp.fr/pips/ or http://pips4u.org/

There was something I found a few days ago from here that I can no longer locate:
http://en.wikipedia.org/wiki/Automatic_parallelization

It would be great to take that inner loop (if it were much larger) and 'Kernelize' it for co-processing on our graphics card. We could expand GCC's 'x-parallelize-x' and threading options to automatically find the sweeter spots to offload for co-processing (on a GPU, using OpenCL).

Barra - NVIDIA G80 GPU Functional Simulator
http://gpgpu.univ-perp.fr/index.php/Barra

If we were 'allowed' to call a post-processor (like LTO used to do), we could call ATI's GPU SDK, which supports OpenCL and outputs code BOTH to x86 and its own GPUs.

Commercial Projects:

Auto-parallelizer and SIMDinator by Dalsoft
http://www.dalsoft.com/documentation_simdinator.html

NVidia's PTX
http://en.wikipedia.org/wiki/Parallel_Thread_Execution

Cray's work with LLVM
http://llvm.org/devmtg/2009-10/Greene_180k_Cores.pdf

Larrabee
http://www.drdobbs.com/architecture-and-design/216402188?pgno=5

Rob

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36281