Hi Phil,

Sorry I couldn't reply to your email earlier (I have a few deadlines at the
moment), so I will reply here:

I would be extremely interested to see support for OpenCL and either GPU
or Cell implemented in GCC. My personal interest here is to extend work on
adaptive scheduling. A few years ago, together with UPC colleagues, I started
a small project to do predictive code scheduling and data manipulation for
heterogeneous processors. We used CUDA and CPU/GPU processors. The project
stalled, but we managed to get some preliminary results. If you are
interested, you can find more about run-time predictive scheduling in this
paper and presentation:

http://unidapt.org/index.php/Dissemination#JGVP2009

I will not be available much this summer due to personal constraints, but if
you manage to get support for this work, I would be interested to see whether
we can eventually connect this framework with the Collective Optimization
Framework to automatically learn how to predict a good scheduling strategy
based on kernel and dataset parameters. This relates to your
"Able to manage scheduling, compute and memory resources"...

Take care and good luck,
Grigori


Paolo,

Thanks for the feedback, I am not very experienced in compilers so it
is hard to judge how long a task will take...

By sharing I meant sharing of code between NVIDIA and GCC.  It
probably won't happen I guess.

Here is my proposal for an OpenCL runtime with a target runtime as
well.  If you think it is too ambitious or not ambitious enough, I will
change it.
=================================================================================

Project Title:
Implement the OpenCL Platform Layer API and Runtime API for the Cell
Processor and CPUs.

Project Synopsis:
The aim of this project is to create an implementation of the OpenCL
Platform Layer API and Runtime API that can target the Cell Processor and
CPUs. The Platform Layer API is:
-A hardware abstraction layer over diverse computational resources
-An interface to query, select and initialize compute devices
-An interface to create compute contexts and work-queues
The Runtime API is:
-Able to execute compute kernels
-Able to manage scheduling, compute and memory resources (I am confused by
the wording of this; does it mean "manage the scheduling of compute and
memory resources"?)
(Source: http://www.khronos.org/developers/library/overview/opencl_overview.pdf,
page 13).

This project will use the existing gcc and ppu-gcc/spu-gcc compilers for offline
compilation of binary programs.

Project Details:
(Part 1)
In this project I will make a C library and runtime that supports some
of the functions listed here:
http://www.khronos.org/registry/cl/api/1.0/cl.h
Specifically I will add support for:
clGetPlatformInfo - Get info about OpenCL
clGetDeviceIDs - Get what devices are supported on system
clGetDeviceInfo - Get info about a specific device
clCreateContext - Create an OpenCL context
clReleaseContext - Release an OpenCL context
clCreateCommandQueue - Create a command-queue on a specific device
clReleaseCommandQueue - Release a command-queue
clCreateBuffer - Create a buffer object
clEnqueueReadBuffer - Enqueue a read
clEnqueueWriteBuffer - Enqueue a write
clCreateProgramWithBinary - Create a program object from a pre-compiled binary.
clReleaseProgram - Release a program object
clCreateKernel - Create a kernel object
clReleaseKernel - Release a kernel object
clSetKernelArg - Set the kernel arguments
clEnqueueNDRangeKernel - Enqueue a command to execute a kernel on a device
clEnqueueTask - Enqueue a single work item
clWaitForEvents - Wait for events to complete
clReleaseEvent - Release an event

This will allow for rudimentary launches of CPU and Cell kernels through a
common interface.  Any functions that are required for the above to work
will also be added.  The OpenCL compiler will not be implemented.
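
To make the Part 1 flow concrete (write a buffer, run a task, read the
buffer back), here is a minimal sketch of how an in-order command queue
could be modelled on the CPU.  Every name in it (command_queue, queue_push,
queue_finish, double_ints, demo_roundtrip) is hypothetical and chosen only
for illustration; none of them are OpenCL API symbols, and a real
implementation would sit behind clEnqueueWriteBuffer, clEnqueueTask and
clEnqueueReadBuffer and would also need events and error codes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not the OpenCL API itself. */
enum cmd_kind { CMD_WRITE_BUFFER, CMD_RUN_TASK, CMD_READ_BUFFER };

typedef void (*task_fn)(void *device_buf, size_t bytes);

struct command {
    enum cmd_kind kind;
    void *device_buf;   /* backing store of the buffer object */
    void *host_ptr;     /* host memory, used by reads and writes */
    size_t bytes;
    task_fn task;       /* used only by CMD_RUN_TASK */
};

#define QUEUE_CAPACITY 16

struct command_queue {
    struct command cmds[QUEUE_CAPACITY];
    size_t count;
};

/* Append a command; returns 0 on success, -1 if the queue is full. */
int queue_push(struct command_queue *q, struct command c)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAPACITY)
        return -1;
    q->cmds[q->count++] = c;
    return 0;
}

/* Execute every queued command in submission order, then empty the queue.
 * Returns the number of commands executed. */
size_t queue_finish(struct command_queue *q)
{
    size_t done = 0;
    assert(q != NULL);
    for (size_t i = 0; i < q->count; i++) {
        struct command *c = &q->cmds[i];
        switch (c->kind) {
        case CMD_WRITE_BUFFER: memcpy(c->device_buf, c->host_ptr, c->bytes); break;
        case CMD_RUN_TASK:     c->task(c->device_buf, c->bytes);             break;
        case CMD_READ_BUFFER:  memcpy(c->host_ptr, c->device_buf, c->bytes); break;
        }
        done++;
    }
    q->count = 0;
    return done;
}

/* Example task: double every int in the buffer. */
void double_ints(void *device_buf, size_t bytes)
{
    int *v = device_buf;
    for (size_t i = 0; i < bytes / sizeof *v; i++)
        v[i] *= 2;
}

/* Write {1,2,3,4}, double it, read it back, and return the sum. */
int demo_roundtrip(void)
{
    int in[4] = {1, 2, 3, 4}, out[4] = {0}, dev[4];
    struct command_queue q = { .count = 0 };
    queue_push(&q, (struct command){ CMD_WRITE_BUFFER, dev, in, sizeof in, NULL });
    queue_push(&q, (struct command){ CMD_RUN_TASK, dev, NULL, sizeof dev, double_ints });
    queue_push(&q, (struct command){ CMD_READ_BUFFER, dev, out, sizeof out, NULL });
    queue_finish(&q);
    return out[0] + out[1] + out[2] + out[3];
}
```

The sketch runs everything synchronously inside queue_finish.  On the Cell,
the same command records could instead be handed to an SPU, which is where
events and clWaitForEvents would come in.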

(Part 2)
Also, a runtime library for the target (CPU or Cell) must be created
that includes the following intrinsics:
Information Functions: (section 6.11.1 of
http://www.khronos.org/registry/cl/specs/opencl-1.0.33.pdf)
uint get_work_dim ()
size_t get_global_size (uint dimindx)
size_t get_global_id (uint dimindx)
size_t get_local_size (uint dimindx)
size_t get_local_id (uint dimindx)
size_t get_num_groups (uint dimindx)
size_t get_group_id (uint dimindx)
Synchronization Functions: (sections 6.11.9 - 6.11.10 of
http://www.khronos.org/registry/cl/specs/opencl-1.0.33.pdf)
void barrier (cl_mem_fence_flags flags)
void mem_fence (cl_mem_fence_flags flags)
void read_mem_fence (cl_mem_fence_flags flags)
void write_mem_fence (cl_mem_fence_flags flags)
Async Copies to/from Memory: (section 6.11.11 of
http://www.khronos.org/registry/cl/specs/opencl-1.0.33.pdf)
event_t async_work_group_copy (__local gentype *dst,
    const __global gentype *src, size_t num_elements, event_t event)
event_t async_work_group_copy (__global gentype *dst,
    const __local gentype *src, size_t num_elements, event_t event)
void wait_group_events (int num_events, event_t *event_list)
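
As an illustration of the information functions above, here is a minimal
sketch of how a CPU target library might provide them in plain C.  The
work_item_state struct, the run_ndrange_1d driver, the write_ids kernel and
the demo function are assumptions made for this example, not part of the
proposal or the OpenCL specification; uint is spelled unsigned int because
this is host C rather than OpenCL C.  The return values for an out-of-range
dimindx (1 for sizes and counts, 0 for IDs) follow the OpenCL 1.0
specification.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIMS 3

/* Hypothetical per-work-item state, filled in by the runtime before each
 * kernel invocation.  A single static copy keeps the sketch serial. */
struct work_item_state {
    unsigned int work_dim;
    size_t global_size[MAX_DIMS];
    size_t local_size[MAX_DIMS];
    size_t group_id[MAX_DIMS];
    size_t local_id[MAX_DIMS];
};

static struct work_item_state current;

unsigned int get_work_dim(void) { return current.work_dim; }

size_t get_global_size(unsigned int d)
{ return d < current.work_dim ? current.global_size[d] : 1; }

size_t get_local_size(unsigned int d)
{ return d < current.work_dim ? current.local_size[d] : 1; }

size_t get_local_id(unsigned int d)
{ return d < current.work_dim ? current.local_id[d] : 0; }

size_t get_group_id(unsigned int d)
{ return d < current.work_dim ? current.group_id[d] : 0; }

size_t get_num_groups(unsigned int d)
{ return d < current.work_dim ? current.global_size[d] / current.local_size[d] : 1; }

/* A global ID is the group offset plus the position inside the group. */
size_t get_global_id(unsigned int d)
{
    if (d >= current.work_dim)
        return 0;
    return current.group_id[d] * current.local_size[d] + current.local_id[d];
}

typedef void (*kernel_fn)(void *args);

/* Run a 1-D range serially, one work-item at a time.  global_size must be
 * a multiple of local_size.  Returns 0 on success, -1 on a bad range. */
int run_ndrange_1d(kernel_fn k, void *args, size_t global_size, size_t local_size)
{
    assert(k != NULL);
    if (local_size == 0 || global_size % local_size != 0)
        return -1;
    current.work_dim = 1;
    current.global_size[0] = global_size;
    current.local_size[0] = local_size;
    for (size_t g = 0; g < global_size / local_size; g++) {
        for (size_t l = 0; l < local_size; l++) {
            current.group_id[0] = g;
            current.local_id[0] = l;
            k(args);
        }
    }
    return 0;
}

/* Example kernel: each work-item stores its own global ID. */
void write_ids(void *args)
{
    size_t *out = args;
    out[get_global_id(0)] = get_global_id(0);
}

/* Launch 8 work-items in groups of 4 and return the sum of stored IDs. */
size_t demo_sum_of_ids(void)
{
    size_t out[8] = {0}, sum = 0;
    run_ndrange_1d(write_ids, out, 8, 4);
    for (size_t i = 0; i < 8; i++)
        sum += out[i];
    return sum;
}
```

Because this driver runs each work-item to completion before starting the
next, it cannot support barrier().  A real implementation would need one
thread, fiber or SPU context per work-item in a group, or some other way to
suspend a work-item at the barrier.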

Developer Details:
I am a Ph.D. student in Computer Science at Syracuse University in
Syracuse, NY, USA.  I have been programming professionally in C since I was
about 15 years old.  (My first job was a Win32 device driver for a furnace
controller ISA add-in card.  This device driver has run for 10 years, 351
days a year, 24 hours a day.  There have never been any bugs in or updates
to the driver.)  I have lately been fixing bugs in open source software, and
I have been on the gcc mailing list for about 6 months.  In short,
programming is my life; I love it.

Experience with the Cell Processor:
I have been working with the Cell Processor on the PlayStation 3 for about
a year now.  I am familiar with using ppu-gcc and spu-gcc, creating shared
memory between the local stores and main memory, explicit mfc_put and
mfc_get calls, mailboxes, etc.

Success Criteria:
1. There is a working OpenCL runtime for the CPU and Cell Processor
that supports the limited functionality above (Part 1)
2. There is a working runtime library for the target that supports the
limited functionality above (Part 2)

Road Map:
April 20th - Start getting up to speed and get to know mentors.  Prepare design.
May 23rd - Begin coding.
June 30th - Target runtime library is done.
July 30th - OpenCL runtime is done.
August 10th - Support Documentation is done.

Benefit to GCC:
This project will bring GCC a step closer to supporting OpenCL for the
CPU and Cell targets.  The target runtime library will be an intermediate
deliverable that can be used until the OpenCL C compiler is done.
=========================================================================================

Thanks for all of the support and interest so far everyone!

Sincerely,
Phil Pratt-Szeliga
