Re: [Pharo-users] Using VirtualGPU

Ronie Salgado Tue, 19 May 2015 02:31:47 -0700

Hi all,

Sorry for answering so late. I am busy at ICSE, where tomorrow I have to
defend my poster for the Student Research Competition. The topic is
profiling over the OpenCL bindings.


> We are looking at the code of OpenCL and VirtualGPU done by Ronnie.
> What we have understand until now :
> - OpenCL package : low-level stuff to be able to interface OpenCL
> kernels with Pharo
> - VirtualGPU: high-level API on top of OpenCL in order to ease the
> task of people who wants to use OpenCL. VirtualGPU provide high-level
> operations on matrix and image at the moment.

This is correct.

> @Ronie: What is not clear at the moment in our mind : when you build a
> VirtualGPU program with the DSL, do you have the overhead of
> communications every time you execute a VirtualGPU instruction or all
> the instructions are sent at the same time and run on the GPU ?
>
The DSL is actually an abstraction over the OpenCL API. Each operation is
stored in a simple intermediate representation that is used to call a
single OpenCL kernel.

There is no overhead in terms of memory transfers between intermediate
operations, because the data is kept on the GPU the whole time. There is an
overhead in terms of kernel dispatching. For example, the expression
a + b * 0.5 in the VGPU DSL (Smalltalk evaluates binary messages left to
right, so this means (a + b) * 0.5) is interpreted as the following pseudo
code:

 temp := opencl invokeKernel: 'add' with: a with: b.
 temp2 := opencl invokeKernel: 'mulScalar' with: temp with: 0.5.

A custom crafted code would do something like this:

temp := opencl invokeKernel: 'addAndMulScalar' with: a with: b with: 0.5

The VGPU does not do the latter for simplicity. Currently, it does not
generate any kind of OpenCL C code. It works by composing simple functions.
Perhaps, in the future I will add a code generation step for optimization.
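
To make the dispatch trade-off concrete, here is a minimal sketch in plain C, not actual VirtualGPU or OpenCL code. Each function is a hypothetical CPU stand-in for one kernel, with one loop iteration playing the role of one work-item; the function names are assumptions chosen to mirror the pseudo code above.

```c
#include <stddef.h>

/* Stand-in for the 'add' kernel: out[i] = a[i] + b[i]. */
static void add_kernel(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Stand-in for the 'mulScalar' kernel: out[i] = in[i] * s. */
static void mul_scalar_kernel(const float *in, float s, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * s;
}

/* Stand-in for the hand-written 'addAndMulScalar' kernel:
   one pass over the data and no intermediate buffer. */
static void add_and_mul_scalar_kernel(const float *a, const float *b,
                                      float s, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (a[i] + b[i]) * s;
}
```

Composing add_kernel and mul_scalar_kernel takes two dispatches and a temporary buffer, while add_and_mul_scalar_kernel produces the same values in a single dispatch. A code generation step would presumably emit something like the fused form.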

Another problem is the proliferation of intermediate buffers. There are
some samples that avoid using intermediate buffers by using add:into:
instead of +. The into buffer is just where the result is going to be
placed. Look into VirtualGPUSamples >> #imageChangedForGradient,
VirtualGPUSamples >> #imageChangedForGradientOptimized and
VirtualGPUSamples >> #imageChangedForGradientOptimizedMore.
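
The difference between the two styles can be sketched in plain C. This is only an illustration of the idea, not the VirtualGPU implementation: vec_add_new and vec_add_into are hypothetical names, and ordinary heap memory stands in for GPU buffers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocating style, analogous to `+`: every call creates a fresh result
   buffer. Returns NULL if the allocation fails; the caller frees it. */
static float *vec_add_new(const float *a, const float *b, size_t n)
{
    float *out = malloc(n * sizeof *out);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
    return out;
}

/* "Into" style, analogous to add:into: the caller supplies the
   destination, so a buffer can be reused across many operations.
   out may be the same array as a or b, because element i is read
   before it is written. */
static void vec_add_into(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

In a loop that runs every frame, the allocating style creates one new buffer per operation per iteration, while the into style can keep writing to the same preallocated buffers.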



> In our context, for building a GSSA algorithm, I guess we just have to
> combine same VGPU instructions (matrix computations) but for doing SPH
> simulations, we will have to provide our own instructions. Is there
> any documentation in order to add own kernel and instructions ?
>

There is no documentation. We should have some tool like doxygen so that I
can write the documentation while I am writing the methods.

Anyway, I will document here for now.

First of all, look at the existing kernels. For that, put this in a
playground and do it (you need the GTInspector):

EmbObjectBrowser openBrowser.

That will open a browser that I use to edit the OpenCL C code. It does have
some bugs, but it is better than editing a huge string in a Smalltalk
method. Let's look in VGPULinearAlgebraSources. There you will see the
'kernels' category and inside of it two methods: #matrixKernels and
#vectorKernels. If you look at #vectorKernels, you will see just the OpenCL
C code.

If you now go to Nautilus and look at the VGPULinearAlgebraSources class, you
will see that it is a subclass of EmbObjectContainer. vectorKernels and
matrixKernels are Smalltalk methods. vectorKernels looks like this:

vectorKernels
    <embeddedObject>
    ^ '
// Vector binary operations
__kernel void floatVector_add(__global float *left, __global float *right,
__global float *result)
...
'

For an example of actually invoking a kernel, you should look at the
following methods:
VGPUFloatMatrix >> #abs
VGPUFloatMatrix >> #absInto
VGPUFloatMatrix >> #discreteGradient
VGPUFloatMatrix >> #discreteGradientInto

As for the OpenCL package, it just provides bindings for the C OpenCL API.
You can also use it if you want, but you are on your own :) .

> I made some experiments with GPU computing with JavaScript. I understood
> that the computings are made forever inside the GPU, so you just throw data
> and programs once and you let the GPU compute for you. Yet the problem is
> to read the data once they are computed. With webGL it seems impossible,
> with webCL it is possible and not easy. Hence the choice of openCL I guess.
>
> I relate here how I could compute the powers of a Markov matrix here:
> http://revue.sesamath.net/spip.php?article651 (especially click on "webGL
> sans three.js"). I also made some experiments here:
> http://irem.univ-reunion.fr/spip.php?article797 (but they use three.js if
> I remember well)


I have been taking a look at this so-called HTML5 WebGL technology. It
does not convince me, since I am more interested in desktop applications. I
was thinking of making a 3D level editor in HTML5, JS and WebGL to test the
technology, but it seems to have many problems. Last week, with Milton, we
found an easy way to draw a Morph into a Woden texture, so I am going to be
using Pharo for the editor.

The main problem that I have with the web technologies is the
one-size-fits-all mentality that surrounds them: JavaScript as IR, the ugly
asm.js hack, and the lack of support for UDP sockets, which any reasonable
real-time online game requires.

Best regards,
Ronie

2015-05-12 18:36 GMT+02:00 Alain Busser <alain.bus...@gmail.com>:

> Hi Serge,
>
> I made some experiments with GPU computing with JavaScript. I understood
> that the computings are made forever inside the GPU, so you just throw data
> and programs once and you let the GPU compute for you. Yet the problem is
> to read the data once they are computed. With webGL it seems impossible,
> with webCL it is possible and not easy. Hence the choice of openCL I guess.
>
> I relate here how I could compute the powers of a Markov matrix here:
> http://revue.sesamath.net/spip.php?article651 (especially click on "webGL
> sans three.js"). I also made some experiments here:
> http://irem.univ-reunion.fr/spip.php?article797 (but they use three.js if
> I remember well)
>
> Happy readings, and, yes, I feel interested in these subjects
>
> Alain
>
> On Tue, May 12, 2015 at 7:48 PM, Serge Stinckwich <
> serge.stinckw...@gmail.com> wrote:
>
>> Dear all,
>>
>> just to let you know, Cheikhou (in CC) is starting a student
>> internship in my lab.
>> He will work on Epidemiology Modelling with KENDRICK:
>> http://ummisco.github.io/kendrick/
>> the platform that we are developing in order to analyse and visualise
>> diseases models behaviours.
>>
>> We would like first to implement a GPU version of the Gillespie
>> Stochastic Simulation Algorithm (GSSA):
>> http://en.wikipedia.org/wiki/Gillespie_algorithm and after that also
>> implement SPH simulations:
>> https://en.wikipedia.org/wiki/Smoothed-particle_hydrodynamics
>>
>> We are looking at the code of OpenCL and VirtualGPU done by Ronnie.
>> What we have understand until now :
>> - OpenCL package : low-level stuff to be able to interface OpenCL
>> kernels with Pharo
>> - VirtualGPU: high-level API on top of OpenCL in order to ease the
>> task of people who wants to use OpenCL. VirtualGPU provide high-level
>> operations on matrix and image at the moment.
>>
>> @Ronie: What is not clear at the moment in our mind : when you build a
>> VirtualGPU program with the DSL, do you have the overhead of
>> communications every time you execute a VirtualGPU instruction or all
>> the instructions are sent at the same time and run on the GPU ?
>>
>> In our context, for building a GSSA algorithm, I guess we just have to
>> combine same VGPU instructions (matrix computations) but for doing SPH
>> simulations, we will have to provide our own instructions. Is there
>> any documentation in order to add own kernel and instructions ?
>>
>> I know that others guys at INRIA (Stéphane ?) are interested by GPU.
>> Is it possible to join our effort to share what we are doing ?
>>
>> Regards,
>> --
>> Serge Stinckwich
>> UCBN & UMI UMMISCO 209 (IRD/UPMC)
>> Every DSL ends up being Smalltalk
>> http://www.doesnotunderstand.org/
>>
>>
>
