Hi Phil,

Sorry I couldn't reply earlier to your email (I have a few deadlines at the moment), so I will reply here:
I would be extremely interested to see support for OpenCL and either GPU or CELL implemented in GCC. My personal interest here is to extend work on adaptive scheduling. A few years ago, together with UPC colleagues, I started a small project to do predictive code scheduling and data manipulation for heterogeneous processors. We used CUDA and CPU/GPU processors. The project stalled but we managed to get some preliminary results - if you are interested you can find more info about run-time predictive scheduling in this paper and presentation:
http://unidapt.org/index.php/Dissemination#JGVP2009

I will not be available much this summer due to personal constraints, but if you manage to get support for this work, I would be interested to see if we can connect this framework with the Collective Optimization Framework at the end to automatically learn how to predict a good scheduling strategy based on kernel and dataset parameters. This relates to your "Able to manage scheduling, compute and memory resources"...

Take care and good luck,
Grigori

Paolo,

Thanks for the feedback. I am not very experienced in compilers so it is hard to judge how long a task will take... By sharing I meant sharing of code between NVIDIA and GCC. It probably won't happen I guess. Here is my proposal for an OpenCL runtime with a target runtime as well. If you think it is too ambitious or not ambitious enough, I will change it.

=================================================================================

Project Title: Make the OpenCL Platform Layer API and Runtime API for the Cell Processor and CPUs.

Project Synopsis: The aim of this project is to create an implementation that supports the Platform Layer API and Runtime API of OpenCL that can target the Cell Processor and CPUs.

The Platform Layer API is:
-A hardware abstraction layer over diverse computational resources
-An interface to query, select and initialize compute devices
-An interface to create compute devices and work-queues

The Runtime API is:
-Able to execute compute kernels
-Able to manage scheduling, compute and memory resources (I am confused as to the wording of this, does it mean: manage the scheduling of compute and memory resources?)

(Source http://www.khronos.org/developers/library/overview/opencl_overview.pdf, page 13).

This project will use the existing gcc and ppu-gcc/spu-gcc compilers for offline compilation of binary programs.

Project Details:

(Part 1) In this project I will make a C library and runtime that supports some of the functions listed here: http://www.khronos.org/registry/cl/api/1.0/cl.h

Specifically I will add support for:
clGetPlatformInfo - Get info about OpenCL
clGetDeviceIDs - Get what devices are supported on system
clGetDeviceInfo - Get info about a specific device
clCreateContext - Create an OpenCL context
clReleaseContext - Release an OpenCL context
clCreateCommandQueue - Create a command-queue on a specific device
clReleaseCommandQueue - Release a command-queue
clCreateBuffer - Create a buffer object
clEnqueueReadBuffer - Enqueue a read
clEnqueueWriteBuffer - Enqueue a write
clCreateProgramWithBinary - Create a program object from a pre-compiled binary
clReleaseProgram - Release a program object
clCreateKernel - Create a kernel object
clReleaseKernel - Release a kernel object
clSetKernelArg - Set the kernel arguments
clEnqueueNDRangeKernel - Enqueue a command to execute a kernel on a device
clEnqueueTask - Enqueue a single work item
clWaitForEvents - Wait for events to complete
clReleaseEvent - Release an event

This will allow for rudimentary launches of CPU and Cell kernels in a common interface. Any functions that are required for the above to work will also be added. The OpenCL compiler will not be implemented.
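
To make the idea of a "rudimentary launch" on the CPU concrete, here is a rough sketch of how a one-dimensional clEnqueueNDRangeKernel might be carried out serially, together with the work-item information functions from section 6.11.1 of the OpenCL 1.0 spec. This is only an illustration under stated assumptions, not the proposed design: work_item_state, kernel_fn and run_ndrange_1d are hypothetical names, the kernel is an ordinary C function pointer, and everything runs on one thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state describing the work-item currently being run.
 * One dimension only; a real runtime would track up to three. */
typedef struct {
    size_t global_size;
    size_t local_size;
    size_t group_id;
    size_t local_id;
} work_item_state;

static work_item_state current; /* single-threaded sketch, so one global */

/* Work-item information functions. For an out-of-range dimension the
 * OpenCL 1.0 spec has the size queries return 1 and the id queries 0. */
unsigned int get_work_dim(void) { return 1; }
size_t get_global_size(unsigned int d) { return d == 0 ? current.global_size : 1; }
size_t get_local_size(unsigned int d)  { return d == 0 ? current.local_size : 1; }
size_t get_num_groups(unsigned int d)
{
    return d == 0 ? current.global_size / current.local_size : 1;
}
size_t get_group_id(unsigned int d)    { return d == 0 ? current.group_id : 0; }
size_t get_local_id(unsigned int d)    { return d == 0 ? current.local_id : 0; }
size_t get_global_id(unsigned int d)
{
    /* global id = group id * work-group size + local id */
    return d == 0 ? current.group_id * current.local_size + current.local_id : 0;
}

/* Hypothetical kernel signature: all arguments packed behind one pointer. */
typedef void (*kernel_fn)(void *args);

/* Run every work-item of a 1-D range, one after another.
 * Returns 0 on success, -1 if local_size does not evenly divide
 * global_size (OpenCL 1.0 rejects such launches as well). */
int run_ndrange_1d(kernel_fn kernel, void *args,
                   size_t global_size, size_t local_size)
{
    assert(kernel != NULL);
    if (local_size == 0 || global_size % local_size != 0)
        return -1;

    current.global_size = global_size;
    current.local_size = local_size;
    for (size_t g = 0; g < global_size / local_size; g++) {
        for (size_t l = 0; l < local_size; l++) {
            current.group_id = g;
            current.local_id = l;
            kernel(args);
        }
    }
    return 0;
}

/* Example kernel: element-wise vector add, indexed by get_global_id(0). */
typedef struct { const float *a; const float *b; float *out; } vec_add_args;

void vec_add(void *p)
{
    vec_add_args *v = p;
    size_t i = get_global_id(0);
    v->out[i] = v->a[i] + v->b[i];
}
```

Calling run_ndrange_1d(vec_add, &args, 8, 4) would run two work-groups of four work-items each. Note that running work-items strictly one after another cannot support barrier(); that would need threads or some form of context switching per work-item, which this sketch deliberately leaves out.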

(Part 2) Also, a runtime library for the target (CPU or Cell) must be created that includes the following intrinsics:

Information Functions: (section 6.11.1 of http://www.khronos.org/registry/cl/specs/opencl-1.0.33.pdf)
uint get_work_dim ()
size_t get_global_size (uint dimindx)
size_t get_global_id (uint dimindx)
size_t get_local_size (uint dimindx)
size_t get_local_id (uint dimindx)
size_t get_num_groups (uint dimindx)
size_t get_group_id (uint dimindx)

Synchronization Functions: (sections 6.11.9 - 6.11.10 of http://www.khronos.org/registry/cl/specs/opencl-1.0.33.pdf)
void barrier (cl_mem_fence_flags flags)
void mem_fence (cl_mem_fence_flags flags)
void read_mem_fence (cl_mem_fence_flags flags)
void write_mem_fence (cl_mem_fence_flags flags)

Async Copies to/from Memory: (section 6.11.11 of http://www.khronos.org/registry/cl/specs/opencl-1.0.33.pdf)
event_t async_work_group_copy (__local gentype *dst, const __global gentype *src, size_t num_elements, event_t event)
event_t async_work_group_copy (__global gentype *dst, const __local gentype *src, size_t num_elements, event_t event)
void wait_group_events (int num_events, event_t *event_list)

Developer Details: I am a Ph.D. student in Computer Science at Syracuse University in Syracuse, NY, USA. I have been programming professionally in C since I was about 15 years old. (My first job was a Win32 device driver for a furnace controller ISA add-in card. This device driver has run for 10 years, 351 days a year, 24 hours a day. There have been no bugs or updates ever to the driver.) I have lately been fixing bugs in open source software and I have been on the gcc mailing list for about 6 months. In short, programming is my life, I love it.

Experience with the Cell Processor: I have been working with the Cell Processor on the PlayStation 3 for about a year now. I am familiar with using ppu-gcc and spu-gcc, creation of shared memory between the local stores and main memory, explicit mfc_put and mfc_get calls, mailboxes, etc.

Success Criteria:
1. There is a working OpenCL runtime for the CPU and Cell Processor that supports the limited functionality above (Part 1)
2. There is a working runtime library for the target that supports the limited functionality above (Part 2)

Road Map:
April 20th - Start getting up to speed and get to know mentors. Prepare design.
May 23rd - Begin coding.
June 30th - Target runtime library is done.
July 30th - OpenCL runtime is done.
August 10th - Support Documentation is done.

Benefit to GCC: This project will bring GCC a step closer to supporting OpenCL for the CPU and Cell targets. The target runtime library will be an intermediate deliverable that can be used until the OpenCL C compiler is done.

=========================================================================================

Thanks for all of the support and interest so far everyone!

Sincerely,
Phil Pratt-Szeliga