Henrik

you amaze me :)

Stef

On 9/6/15 14:59, Henrik Johansen wrote:
There are many ways to Rome :)
If you just need some externally allocated objects in the formats you specified, you can do the cache extraction using nothing but normal Smalltalk:

intArray := (NBExternalArray ofType: 'int').

data := intArray new: 1000.
1 to:data size do:[:i |data at:i put: i].
cache := intArray new: 50.
0 to: 4 do: [:j |
1 to: 10 do: [ :k |
cache at: (j* 10) + k put: (data at: 199 + (30 * j ) + k)] ].
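
For readers more at home in C, the loop above can be sketched as follows. This is only an illustration, not part of NativeBoost: the names extract_cache and fill_one_based are hypothetical. Since Smalltalk indices are 1-based and the data is filled with "data at: i put: i", the 0-based slot i holds i + 1, and the first copied value is 200 in both versions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Layout taken from the Smalltalk loop: 5 chunks of 10 ints, 30 elements apart.
   FIRST is the 0-based index of the first copied element (1-based slot 200). */
enum { CHUNKS = 5, CHUNK_LEN = 10, STRIDE = 30, FIRST = 199 };

/* Hypothetical helper: copy the strided chunks of data into a dense cache. */
static void extract_cache(const int *data, int *cache)
{
    assert(data != NULL && cache != NULL);
    for (int j = 0; j < CHUNKS; j++) {
        memcpy(cache + j * CHUNK_LEN,
               data + FIRST + j * STRIDE,
               CHUNK_LEN * sizeof *cache);
    }
}

/* Hypothetical helper: mirror "data at: i put: i" (1-based slot i holds i). */
static void fill_one_based(int *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        data[i] = (int)(i + 1);
    }
}
```

Calling fill_one_based on a 1000-element array and then extract_cache into a 50-element array leaves the values 200..209, 230..239, and so on in the cache.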

But if you want to take full advantage of the performance boost NB offers, you'd write a NativeBoost function to do the cache extraction*, as I outlined last time:
MyClass class >> #createCacheOf: aSource in: aDestination
createCacheOf: aSource in: aDestination
<primitive: #primitiveNativeCall module: #NativeBoostPlugin>
"Should work on both x86 and x64, as long as sizeOf: lookups work correctly"
^ self nbCallout
function: #(void (int * aSource, int * aDestination) )
emit: [:gen :proxy :asm | |destReg srcReg tmpReg intSize ptrSize|
intSize := NBExternalType sizeOf: 'int'.
ptrSize := NBExternalType sizeOf: 'void *'.
"Only use caller-saved regs, no preservation needed"
destReg := asm EAX as: ptrSize.
srcReg := asm ECX as: ptrSize.
tmpReg := asm EDX as: intSize.
asm pop: srcReg.
asm pop: destReg.
0 to: 4 do: [ :j | 0 to: 9 do: [ :offset |
asm
"Displacement in bytes, not ptr element size :S, so we have to multiply offset by that manually :S"
mov: tmpReg with: srcReg ptr + (199 + (j * 30) + offset * intSize);
mov: destReg ptr  + ((j* 10) + offset * intSize) with: tmpReg]]]
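
The effect of the 50 unrolled load/store pairs can be sketched in C as below. This is a rough model under stated assumptions, not the code NativeBoost generates: the names src_disp, dst_disp and copy_by_byte_displacement are hypothetical. It shows the two points made in the comments above: displacements are in bytes, so the element index is multiplied by the int size, and each element goes through a temporary (the role of tmpReg), because an x86 mov cannot have two memory operands.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte displacement of source element (199 + j*30 + offset); hypothetical helper. */
static size_t src_disp(int j, int offset)
{
    return (size_t)(199 + j * 30 + offset) * sizeof(int);
}

/* Byte displacement of destination element (j*10 + offset); hypothetical helper. */
static size_t dst_disp(int j, int offset)
{
    return (size_t)(j * 10 + offset) * sizeof(int);
}

/* Model of the unrolled code: one load into a temporary, one store, per element. */
static void copy_by_byte_displacement(const int *src, int *dst)
{
    assert(src != NULL && dst != NULL);
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    for (int j = 0; j < 5; j++) {
        for (int offset = 0; offset < 10; offset++) {
            int tmp;                                            /* stands in for tmpReg */
            memcpy(&tmp, s + src_disp(j, offset), sizeof tmp);  /* mov tmp, [src + disp] */
            memcpy(d + dst_disp(j, offset), &tmp, sizeof tmp);  /* mov [dst + disp], tmp */
        }
    }
}
```

In the Smalltalk version both loops run at code-generation time, so only the 100 mov instructions with constant displacements remain in the emitted function.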

and use that:
intArray := (NBExternalArray ofType: 'int').
data := intArray new: 1000.
1 to:data size do:[:i |data at:i put: i].
cache := intArray new: 50.
MyClass createCacheOf: data in: cache.

The difference using a simple [] bench is about two orders of magnitude: 11 million cache extractions per second for the inline assembly version, while the naive loop achieves around 110k.

Cheers,
Henry

*as: is not yet defined; it could be something like:
AJx86GPRegister >> #as: aSize
^ self isHighByte
ifTrue: [ self asLowByte as: aSize ]
ifFalse: [
AJx86Registers
generalPurposeWithIndex: self index
size: aSize
requiresRex: self index > (aSize > 1 ifTrue: [7] ifFalse: [ 3])
prohibitsRex: false ]
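
The requiresRex: expression above can be read as the following C sketch. It only mirrors that one expression, and the function name requires_rex is hypothetical. The underlying x86-64 rule is that registers numbered 8 to 15 always need a REX prefix, and so do the low bytes of registers 4 to 7 (SPL, BPL, SIL, DIL), because without REX those byte encodings select AH, CH, DH and BH. The sketch ignores that 64-bit operand size needs REX.W in any case.

```c
#include <assert.h>
#include <stdbool.h>

/* index: hardware register number 0..15; size: operand size in bytes.
   Mirrors "self index > (aSize > 1 ifTrue: [7] ifFalse: [3])". */
static bool requires_rex(int index, int size)
{
    assert(index >= 0 && index <= 15);
    int highest_without_rex = (size > 1) ? 7 : 3;
    return index > highest_without_rex;
}
```

For example, the 4-byte form of register 2 (EDX) needs no REX prefix, while the 1-byte form of register 6 (SIL) does.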


On 09 Jun 2015, at 9:46, Matthieu Lacaton <matthieu.laca...@gmail.com> wrote:

Hello Henrik,

Thank you very much for your answer. However, the code you provided is some sort of assembly, right? Does that mean I need to learn assembly to do what I want?

I'm asking because I don't know anything about assembly, so it will take me some time to learn.

Cheers,

Matthieu

2015-06-08 19:56 GMT+02:00 Henrik Johansen <henrik.s.johan...@veloxit.no>:


    > On 08 Jun 2015, at 4:41, Matthieu Lacaton <matthieu.laca...@gmail.com> wrote:
    >
    > Hello everyone,
    >
    > I have a small question about NativeBoost: how does the "+"
    > operator, when applied to a pointer, translate into NativeBoost code?
    >
    > To give a bit of context, what I want to do is copy some
    > non-contiguous bytes in memory into a buffer. Basically, I
    > have an array of integers in a buffer and I want to copy some
    > chunks of it into another buffer. The chunks are always the same
    > size and the offset between each chunk is always the same too.
    >
    > Because a bit of actual code is easier to understand, here is
    > what I'd like to do in Pharo:
    >
    > ...
    >
    > int i, j = 0;
    > int *data = malloc(1000*sizeof(int));
    > int *newData = malloc(50*sizeof(int));
    >
    > // Allocate initial data
    > for (i = 0; i < 1000; i++) {
    >   data[i] = i;
    > }
    >
    > //Copy desired chunks into new buffer
    > for (i = 0; i < 5; i++ ) {
    >   memcpy( newData + j*10, data + 200 + j*30, 10*sizeof(int));
    >   j++;
    > }
    >
    > free(data);



    You can do relative addressing like this:
    (destReg ptr: dataSize) + offsetReg + constant

    So with offset registers containing j * 10 and j * 30, you might
    end up with an unrolled inner loop (barring using any fancier
    longer-than-int moves) like:

    0 to: 9 do: [:constantOffset |
            asm mov: (destReg ptr: currentPlatform sizeOfInt) + dstOffsetReg + constantOffset
                with: (srcReg ptr: currentPlatform sizeOfInt) + 200 + srcOffsetReg + constantOffset]

    If the range of j is constant, you can just as easily unroll the
    whole thing in a similarly compact fashion, space and
    sensibilities permitting:

    0 to: 4 do: [ :j | 0 to: 9 do: [ :constOffset |
            asm mov: (destReg ptr: currentPlatform sizeOfInt) + (j * 10) + constOffset
                with: (srcReg ptr: currentPlatform sizeOfInt) + 200 + (j * 30) + constOffset]]

    Cheers,
    Henry



