I've been kicking around the idea of starting a Clojure->CUDA compiler for some time. I would like to start a discussion about this to figure out what some possible solutions are. First of all, let me start with a simple list of facts:
CUDA (for those who don't know) is NVIDIA's technology for writing general-purpose code for modern GPUs. The current system uses a subset of C++ as its input. The code looks like small functions/classes that are executed for each thread of the GPU. These threads can number in the thousands, and the GPU commonly executes hundreds of them at one time. So, basically, we're talking about running pmap on a system with 512+ cores.

CUDA 4.0 supports some very advanced C++ features. As of 4.0, CUDA supports virtual functions and new/delete. Yes, your GPU code can allocate memory on the fly (if you have a GeForce 4xx or greater).

My idea is to make a subset of Clojure translatable to CUDA. You would create input data in native memory, then the Clojure functions would be translated to CUDA C++, then to CUDA binaries, which would be executed on the GPU.

A very simple approach would be to take the view that many Clojure->SQL frameworks do, and simply do a translation. In this method, all CUDA Clojure functions would take only arrays and scalar values as inputs; the functions would read data from arrays and write their output to arrays. No sequences, on-the-fly allocation, or any such thing would be allowed. On top of that, all input and output data must be of the same type, so no mixing doubles and floats, or ints and longs. All data must be resolved to statically defined types, and changing a variable's type on the fly is not allowed.

The more complex approach would be to use something like ClojureScript to compile core.clj to CUDA, and actually run a subset of Clojure on the GPU. In this case we would have to come up with a simple type system, and then rewrite the ClojureScript compiler to output C++ code instead of JS. In addition, some sort of simple GC (reference counting?) would have to be developed. The result would be slower than my first approach, but would be much more flexible.
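As a rough sketch of what the first, translation-only approach could produce, here is the kind of C++ a translator might emit for a hypothetical Clojure function such as (fn [a x y i] (+ (* a (aget x i)) (aget y i))). The names saxpy_element and run_on_host are made up for illustration, and nothing here is an existing tool's output. It is written as plain C++ so it compiles without a GPU; under CUDA the per-element function would be a __global__ kernel, the index would be computed from blockIdx.x * blockDim.x + threadIdx.x, and the host loop would be replaced by a kernel launch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// What a translator might emit for the body of the Clojure function:
// one call handles one element, the way one GPU thread would.
// Everything is float, matching the "one static type" restriction.
inline void saxpy_element(std::size_t i, std::size_t n, float a,
                          const float* x, const float* y, float* out) {
    if (i < n) {  // guard: a GPU grid may contain more threads than elements
        out[i] = a * x[i] + y[i];
    }
}

// Host-side stand-in for a kernel launch (hypothetical helper): calls the
// element function once per index. On a GPU these calls would run in
// parallel, one per thread, instead of in a sequential loop.
inline std::vector<float> run_on_host(float a, const std::vector<float>& x,
                                      const std::vector<float>& y) {
    assert(x.size() == y.size());  // inputs are arrays of equal length
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        saxpy_element(i, x.size(), a, x.data(), y.data(), out.data());
    }
    return out;
}
```

Note that the function only reads from arrays and writes to an array, with no allocation inside the per-element code, which is what makes a direct translation plausible.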
----

So in the first version we have a system that is simple to create, but on the GPU we can't use many of the Clojure functions we are familiar with. In the second, we have a slower but much more powerful system that would integrate much more tightly with existing code.

----

Any thoughts? Besides that I'm crazy...

Timothy