Here is my initial draft of my proposal. Please provide as much feedback as 
possible. After further research, I realize I will almost certainly have time 
to do more than just the bind and device_type clauses. I have prepared 
significant research for the cache directive as well and could definitely 
include it in my proposal. I will also expand the device_type section, but I 
need to tend to an obligation and wanted to post this ASAP. Rip it to shreds, 
please!

Thank you!
Carter

Implementing support for the bind clause:

The bind clause dictates the name that will be used when a procedure is called 
from an accelerator device such as a GPU or FPGA. Essentially, users can 
specify multiple versions of the same procedure: one to be run on a host call 
and others that accelerators can call. This is done by passing either an 
identifier or a string to the bind clause.
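
For example (a minimal sketch; the procedure and symbol names are 
illustrative, and the seq clause is only present because the routine 
directive requires a level-of-parallelism clause):

  /* Identifier form: the device-side call uses the name fft_dev,
     treated as if it were the procedure's own name.  */
  #pragma acc routine seq bind(fft_dev)
  extern void fft(float *data, int n);

  /* String form: the string is used verbatim as the device-side
     symbol name.  */
  #pragma acc routine seq bind("fft_dev_v2")
  extern void fft2(float *data, int n);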

The primary benefit of implementing this functionality is allowing highly 
specialized implementations of procedures, such as custom FFT or BLAS 
routines, to be used instead of compiler-generated code. It also allows a 
clear separation between host and device functions and greater 
interoperability with external libraries.

Implementation Notes:
We will first need to extend the parser to handle the bind clause. The 
relevant files will be gcc/c-family/c-pragma.cc, gcc/c/c-parser.cc, and 
gcc/cp/parser.cc (for just the C and C++ implementations; a Fortran 
implementation will likely be considered as well, but to begin we'll restrict 
ourselves to these two). Some validation methods already present in GCC may be 
utilized, such as oacc_verify and oacc_finalize. Semantic analysis will, of 
course, also be performed to ensure proper bind names are passed.
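
To make the testing plan concrete, a first parser test could look something 
like the following (a hedged sketch in the style of the existing goacc tests 
under gcc/testsuite/; the exact diagnostics are to be determined):

  /* { dg-do compile } */

  /* Accepted: identifier and string arguments.  */
  #pragma acc routine seq bind(dev_fn)
  extern void f1 (void);

  #pragma acc routine seq bind("dev_fn_2")
  extern void f2 (void);

  /* Rejected: anything else should be diagnosed.  */
  #pragma acc routine seq bind(42) /* { dg-error "" } */
  extern void f3 (void);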

The parsed clause will then need to be represented in the GIMPLE IR (coming 
from the AST). This will be done by adding a GIMPLE node that represents and 
stores the specified procedure to bind to. We will need to make some additions 
to gcc/tree.h, gcc/gimple.def, and gcc/gimple.h, such as a new bind clause 
definition, e.g. OMP_CLAUSE_BIND (NODE). Storing this symbol-name information 
in the GIMPLE IR ensures that, during the link stage, GCC correctly produces 
object files containing accelerator IR sections and metadata (the 
.gnu.offload_lto_* sections).
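
For illustration, the accessor could follow the pattern of the existing 
OMP_CLAUSE_* macros (a hypothetical sketch; the actual operand layout would 
be settled during implementation):

  /* Return the bind name (an IDENTIFIER_NODE or STRING_CST) stored
     on an OMP_CLAUSE_BIND clause.  Hypothetical, modeled on the
     accessors already in gcc/tree.h.  */
  #define OMP_CLAUSE_BIND_NAME(NODE) \
    OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_BIND), 0)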

This new information will also need to be lowered into a device-specific IR 
(PTX or GCN) so that the accelerator can resolve the procedure named in the 
bind clause. The relevant files here are under gcc/config/nvptx and 
gcc/config/gcn.

Timeline:
Week 1: Finish research and general design details for implementing the bind 
and device_type clauses.
Week 2: Implement parser support for the bind clause. This includes debugging 
and ensuring that all new code passes GCC's existing testing infrastructure. 
Additional tests will be designed to ensure proper functionality.
Week 3: Implement the GIMPLE representation of the bind clause. Ensure 
debugging and testing are performed on both the parser and the GIMPLE 
representation.
Week 4: Map the routines to external devices (PTX and GCN). We can verify that 
the accelerators use the correct routine by inspecting what the backend emits. 
Another useful metric is compute time, as we expect better performance when 
offloading to an accelerator.
Week 5: Finalize the implementation and test everything as a whole, ensuring 
that the current GCC tests pass along with any important new tests implemented 
along the way.

Implementing support for the device_type clause:
Similar to the bind clause, the device_type clause can specify the device on 
which a specific procedure should be used. This allows a user to design many 
different procedures with device-specific implementations. bind and 
device_type are very much thematically related, hence my interest in both of 
them.
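
For example, the two clauses compose naturally on the routine directive (a 
sketch; device-type names such as nvidia are implementation-defined, and 
saxpy_cuda is an illustrative symbol):

  /* seq appears before any device_type clause, so it applies to all
     device types; the bind clause applies only to NVIDIA devices,
     which call saxpy_cuda instead of the compiler-generated code.  */
  #pragma acc routine seq device_type(nvidia) bind("saxpy_cuda")
  extern void saxpy(int n, float a, float *x, float *y);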

Thankfully, implementing this clause follows a very similar general method. 
device_type will be parsed like any other OpenACC clause and lowered into both 
GIMPLE and device-specific IR. If Fortran is included as well (which is very 
likely), the clause will go from the AST to GENERIC and then to GIMPLE. One 
major difference between device_type and bind is that device_type is a 
configuration setting, so we would ideally store this setting for any future 
compiler use by the user. A feature to change or disable a specific device for 
a procedure should also be considered.

The relevant files are also similar, though I will add some additional ones for 
configuration purposes.
gcc/common/config.cc
gcc/config/gcn/gcn.cc
gcc/config/nvptx/nvptx.cc
gcc/common/config/nvptx/nvptx-common.cc
gcc/lto-section-in.cc and gcc/lto-section-out.cc
gcc/lto-wrapper.cc

Week 6: Begin implementing device_type in the parsers, including testing and 
semantic analysis/validation.
Week 7: Implement the relevant GIMPLE (GENERIC) nodes for device_type. This 
includes storing the information until the proper backend is linked, after 
which it can be cleared. (This also goes for the bind clause.)
Week 8: Ensure handling is in place for explicit function declarations in 
nvptx and gcn.
Week 9: Finalize the implementation and perform thorough testing on the 
clause.

Background:
I am strong in C/C++ programming, and my background is overwhelmingly OS and 
CPU architecture related. I have implemented CPU and memory schedulers for VMs 
to use, as well as synchronization and barrier algorithms using OpenMP, MPI, 
and POSIX threads. I have seen and developed several different caching schemes 
for use at many levels of cache, on both single- and multicore CPUs (where 
coherence is important).
I have experience with distributed systems as well, though this likely won't 
be particularly valuable for the tasks I will be completing related to GCC. 
Regardless, libraries like gRPC and Apache Thrift are within my toolkit.
I did my undergrad in math and physics and am currently pursuing my master's 
in CS as a sort of career shift. I did plenty of programming in undergrad, but 
mostly in Python (physics research), so it is not entirely relevant here.
