JEP Vector API (Incubator). A funny use case and a question.

Davide Perini Fri, 09 Aug 2024 08:56:45 -0700

Hi there,
thanks for giving us the opportunity to write to this mailing list.

I'm playing with the Vector API bundled in Java 22 and wow, it's amazing. I'm getting serious benefits from it even for simple tasks on my AMD Ryzen 9 7950X3D CPU, which uses the Zen 4 architecture.

Can't wait to see how much bigger the benefits will be on upcoming processors with seriously optimized AVX-512 support (AMD's Zen 5 architecture and Intel's AVX10 instructions).


I'll try to give you some context.

I am writing open source software that is basically a free clone of the Philips Ambilight effect.


What is it?

Basically, you put an LED strip behind your monitor/TV. The software captures the screen, calculates the average colors of the screen, and sends those average values to a microcontroller (Arduino) that drives the strip and lights up the LEDs accordingly.
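The "average colors" step can be sketched in plain Java. This is only an illustration, not the project's code: the AverageColor class is hypothetical, pixels are assumed to be packed 0xRRGGBB ints (the layout the snippets below unpack), and the color correction applied before the values go to the microcontroller is left out.

```java
// Hypothetical helper, for illustration only: averages packed 0xRRGGBB pixels.
public final class AverageColor {

    /** Returns the average of the given packed 0xRRGGBB pixels as a packed 0xRRGGBB int. */
    static int average(int[] pixels) {
        if (pixels.length == 0) {
            return 0; // no pixels: treat the zone as black
        }
        long r = 0, g = 0, b = 0;
        for (int rgb : pixels) {
            r += (rgb >> 16) & 0xFF; // red channel
            g += (rgb >> 8) & 0xFF;  // green channel
            b += rgb & 0xFF;         // blue channel
        }
        int n = pixels.length;
        // Integer division per channel, then repack into 0xRRGGBB
        return (int) ((r / n) << 16 | (g / n) << 8 | (b / n));
    }

    public static void main(String[] args) {
        int[] zone = {0xFF0000, 0x0000FF}; // one red pixel, one blue pixel
        System.out.printf("%06X%n", average(zone)); // prints 7F007F
    }
}
```

Each LED would get the average of the screen zone behind it; the integer division truncates, so 255 / 2 becomes 0x7F.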

This effect is also known as dynamic bias light.
More info here if you are curious:
https://github.com/sblantipodi/firefly_luciferin

Most of the computation happens on the GPU, but some intensive parts run on the CPU.


Let's go deeper into the Vector API.

The GPU captures the screen 60 times per second (or even more). Every frame is a buffer that contains the color information for each pixel of the frame. This buffer is a Java direct IntBuffer that, for performance reasons, has no corresponding array on the heap.

Once I have this IntBuffer, I need to calculate the average colors of the screen, and this can be done on the fly on the IntBuffer without copying it into an array. That kind of copy is really heavy and degrades performance.
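Reading the frame on the fly comes down to indexing the direct buffer with absolute get(int) calls, which never touch the Java heap. A minimal sketch, assuming a row-major frame where a hypothetical rowStride gives the number of ints per row (playing the role of widthPlusStride) and leaving out the edge clamping the real loop does:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public final class DirectBufferSum {

    /**
     * Sums the R, G and B channels of a width x height rectangle starting at (x0, y0),
     * reading the buffer in place with absolute get(int); no int[] copy is made.
     * rowStride is the number of ints between the starts of two consecutive rows.
     */
    static long[] sumRect(IntBuffer buf, int rowStride, int x0, int y0, int width, int height) {
        long r = 0, g = 0, b = 0;
        for (int y = y0; y < y0 + height; y++) {
            int rowStart = y * rowStride; // index of the first pixel of this row
            for (int x = x0; x < x0 + width; x++) {
                int rgb = buf.get(rowStart + x); // absolute read: position is unchanged
                r += (rgb >> 16) & 0xFF;
                g += (rgb >> 8) & 0xFF;
                b += rgb & 0xFF;
            }
        }
        return new long[] {r, g, b};
    }

    public static void main(String[] args) {
        int stride = 4, rows = 2;
        // Off-heap buffer: isDirect() is true and hasArray() is false
        IntBuffer buf = ByteBuffer.allocateDirect(stride * rows * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
        for (int i = 0; i < stride * rows; i++) {
            buf.put(i, 0x010203); // every pixel: r=1, g=2, b=3
        }
        long[] sums = sumRect(buf, stride, 1, 0, 2, 2); // a 2x2 block, 4 pixels
        System.out.println(sums[0] + " " + sums[1] + " " + sums[2]); // prints 4 8 12
        System.out.println(buf.isDirect() + " " + buf.hasArray());   // prints true false
    }
}
```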


Here is a snippet that does it without the Vector API:

// r, g, b and pickNumber are accumulators declared before the loop
for (int y = 0; y < pixelInUseY; y++) {
    for (int x = 0; x < pixelInUseX; x++) {
        int offsetX = (xCoordinate + x);
        int offsetY = (yCoordinate + y);
        // Clamp the coordinates so the index stays inside the frame
        int bufferOffset = Math.min(offsetX, widthPlusStride)
                + ((offsetY < height) ? (offsetY * widthPlusStride) : (height * widthPlusStride));
        int rgb = rgbBuffer.get(Math.min(rgbBuffer.capacity() - 1, bufferOffset));
        // Unpack the packed RGB pixel and accumulate each channel
        r += rgb >> 16 & 0xFF;
        g += rgb >> 8 & 0xFF;
        b += rgb & 0xFF;
        pickNumber++;
    }
}
leds[key - 1] = ImageProcessor.correctColors(r, g, b, pickNumber);

Now I'm trying to use the Vector API to accelerate these computations even more and hey, it works great. Using AVX-512 (IntVector.SPECIES_512), the computation is 40%-80% faster than without the Vector API.


int firstLimit;
int secondLimit;

// Processing the buffer in the correct order is crucial for SIMD performance
if (pixelInUseX < pixelInUseY) {
    firstLimit = pixelInUseX;
    secondLimit = pixelInUseY;
} else {
    firstLimit = pixelInUseY;
    secondLimit = pixelInUseX;
}
// SIMD iteration
for (int x = 0; x < firstLimit; x++) {
    for (int y = 0; y < secondLimit; y += MainSingleton.getInstance().SPECIES.length()) {
        int offsetX;
        int offsetY;
        if (pixelInUseX < pixelInUseY) {
            offsetX = (xCoordinate + x);
            offsetY = (yCoordinate + y);
        } else {
            offsetX = (xCoordinate + y);
            offsetY = (yCoordinate + x);
        }
        int bufferOffset = Math.min(offsetX, widthPlusStride)
                + ((offsetY < height) ? (offsetY * widthPlusStride) : (height * widthPlusStride));
        // Load RGB values using SIMD: first copy SPECIES.length() pixels into a heap array
        int[] rgbArray = new int[MainSingleton.getInstance().SPECIES.length()];
        rgbBuffer.position(bufferOffset);
        rgbBuffer.get(rgbArray, 0,
                Math.min(MainSingleton.getInstance().SPECIES.length(), rgbBuffer.remaining()));
        IntVector rgbVector = IntVector.fromArray(MainSingleton.getInstance().SPECIES, rgbArray, 0);
        // Red from lane 0, green from lane 1, blue from lane 2
        r += rgbVector.lane(0) >> 16 & 0xFF;
        g += rgbVector.lane(1) >> 8 & 0xFF;
        b += rgbVector.lane(2) & 0xFF;
        pickNumber++;
    }
}
leds[key - 1] = ImageProcessor.correctColors(r, g, b, pickNumber);
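The snippet above reads only lanes 0, 1 and 2 of each vector, so each iteration takes red from one pixel, green from the next and blue from the third. For comparison, here is a sketch that unpacks every lane with lanewise shifts and masks and reduces the sums once at the end. It assumes IntVector.SPECIES_PREFERRED instead of MainSingleton.getInstance().SPECIES and a contiguous int[] of pixels; LaneSums is a hypothetical name, and the incubator module must be enabled with --add-modules jdk.incubator.vector.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public final class LaneSums {

    // Widest shape the CPU supports (512 bits on an AVX-512 machine)
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    /** Sums the R, G and B channels of all packed 0xRRGGBB pixels, using every lane. */
    static long[] sumChannels(int[] pixels) {
        IntVector rAcc = IntVector.zero(SPECIES);
        IntVector gAcc = IntVector.zero(SPECIES);
        IntVector bAcc = IntVector.zero(SPECIES);
        int i = 0;
        int upper = SPECIES.loopBound(pixels.length); // largest multiple of the lane count
        for (; i < upper; i += SPECIES.length()) {
            IntVector v = IntVector.fromArray(SPECIES, pixels, i);
            // Shift and mask all lanes at once, keeping per-lane partial sums
            rAcc = rAcc.add(v.lanewise(VectorOperators.LSHR, 16).and(0xFF));
            gAcc = gAcc.add(v.lanewise(VectorOperators.LSHR, 8).and(0xFF));
            bAcc = bAcc.add(v.and(0xFF));
        }
        // One horizontal add per channel, after the loop
        long r = rAcc.reduceLanes(VectorOperators.ADD);
        long g = gAcc.reduceLanes(VectorOperators.ADD);
        long b = bAcc.reduceLanes(VectorOperators.ADD);
        for (; i < pixels.length; i++) { // scalar tail for the leftover pixels
            r += (pixels[i] >> 16) & 0xFF;
            g += (pixels[i] >> 8) & 0xFF;
            b += pixels[i] & 0xFF;
        }
        return new long[] {r, g, b};
    }

    public static void main(String[] args) {
        int[] pixels = new int[37]; // deliberately not a multiple of the lane count
        java.util.Arrays.fill(pixels, 0x0A1400); // r=10, g=20, b=0
        long[] s = sumChannels(pixels);
        System.out.println(s[0] + " " + s[1] + " " + s[2]); // prints 370 740 0
    }
}
```

Keeping the partial sums in vectors and calling reduceLanes only after the loop avoids a horizontal add on every iteration; the int lanes are safe from overflow as long as each lane accumulates fewer than about 8 million pixels.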

The computation itself is at least ten times faster, but overall it's only 40%-80% faster because I'm not able to process the IntBuffer on the fly with the Vector API. As you can see in the previous snippet, I need to copy part of the IntBuffer into an int[] array and then process that array with the Vector API.

This copy alone is what takes most of the time.

Is it possible to process a direct IntBuffer with the Vector API without losing time on an array copy?
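One possible way to avoid the copy on JDK 22, offered as a sketch rather than a confirmed answer: wrap the direct buffer in a java.lang.foreign.MemorySegment with MemorySegment.ofBuffer and load vectors from it with IntVector.fromMemorySegment, which reads the off-heap memory in place. SegmentLoad and load are hypothetical names. Segment offsets are in bytes and are relative to the buffer's position at the moment it is wrapped, the byte order passed in should match the buffer's (buf.order()), and the last partial vector of a row would still need a scalar tail or a mask. It needs --add-modules jdk.incubator.vector.

```java
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

public final class SegmentLoad {

    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    /**
     * Loads SPECIES.length() pixels starting at pixelIndex directly from the
     * memory behind the segment, without an intermediate int[] copy.
     */
    static IntVector load(MemorySegment frame, int pixelIndex, ByteOrder order) {
        long byteOffset = (long) pixelIndex * Integer.BYTES; // segment offsets are in bytes
        return IntVector.fromMemorySegment(SPECIES, frame, byteOffset, order);
    }

    public static void main(String[] args) {
        int pixels = SPECIES.length() * 2;
        IntBuffer buf = ByteBuffer.allocateDirect(pixels * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
        for (int i = 0; i < pixels; i++) {
            buf.put(i, i); // absolute put: position stays at 0
        }
        // Wrap once per frame: the segment views the same off-heap memory as buf,
        // starting at buf's current position
        MemorySegment frame = MemorySegment.ofBuffer(buf);
        IntVector v = load(frame, SPECIES.length(), buf.order()); // second group of pixels
        System.out.println(v.lane(0) == SPECIES.length()); // prints true
    }
}
```

Since the segment is created once per frame and never moves the buffer's position, the rgbBuffer.position(...) and get(int[], ...) calls from the loop above would no longer be needed in this approach.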


Thank you for this wonderful API.

Kind regards
Davide
