There are many ways to Rome :)
If you just need some externally allocated objects in the formats you specified, 
you can do the cache extraction using nothing but normal Smalltalk:

intArray := NBExternalArray ofType: 'int'.

data := intArray new: 1000.
1 to: data size do: [:i | data at: i put: i].
cache := intArray new: 50.
0 to: 4 do: [:j |
        1 to: 10 do: [:k |
                cache at: (j * 10) + k put: (data at: 199 + (30 * j) + k)]].
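
A quick sanity check on the indexing (the values follow from the data at: i put: i fill above):

(cache at: 1) = (data at: 200).   "true"
(cache at: 11) = (data at: 230).  "true"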

But if you want to take full advantage of the performance boost NB offers, 
you'd write a NativeBoost function to do the cache extraction*, as I outlined 
last time:
MyClass class >> createCacheOf: aSource in: aDestination
        <primitive: #primitiveNativeCall module: #NativeBoostPlugin>
        "Should work on both x86 and x64, as long as sizeOf: lookups work correctly"
        ^ self nbCallout
                function: #(void (int * aSource, int * aDestination))
                emit: [:gen :proxy :asm | | destReg srcReg tmpReg intSize ptrSize |
                        intSize := NBExternalType sizeOf: 'int'.
                        ptrSize := NBExternalType sizeOf: 'void *'.
                        "Only use caller-saved regs, no preservation needed"
                        destReg := asm EAX as: ptrSize.
                        srcReg := asm ECX as: ptrSize.
                        tmpReg := asm EDX as: intSize.
                        asm pop: srcReg.
                        asm pop: destReg.
                        0 to: 4 do: [:j | 0 to: 9 do: [:offset |
                                asm
                                        "Displacements are in bytes, not ptr element size, so we have to multiply the element index by intSize manually :S"
                                        mov: tmpReg with: srcReg ptr + ((199 + (j * 30) + offset) * intSize);
                                        mov: destReg ptr + (((j * 10) + offset) * intSize) with: tmpReg]]]

and use that:
intArray := NBExternalArray ofType: 'int'.
data := intArray new: 1000.
1 to: data size do: [:i | data at: i put: i].
cache := intArray new: 50.
MyClass createCacheOf: data in: cache.

The difference using a simple [] bench is about two orders of magnitude: roughly 
11 million cache extractions per second for the inline assembly version, while 
the naive loop achieves around 110k.
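
That is, something along the lines of

[ 0 to: 4 do: [:j |
        1 to: 10 do: [:k |
                cache at: (j * 10) + k put: (data at: 199 + (30 * j) + k)]] ] bench.
[ MyClass createCacheOf: data in: cache ] bench.

with data and cache set up as above; the exact figures will of course vary per machine and image.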

Cheers,
Henry

* as: is not yet defined; it could be something like:

AJx86GPRegister >> as: aSize
        ^ self isHighByte
                ifTrue: [ self asLowByte as: aSize ]
                ifFalse: [
                        AJx86Registers
                                generalPurposeWithIndex: self index
                                size: aSize
                                requiresRex: self index > (aSize > 1 ifTrue: [7] ifFalse: [3])
                                prohibitsRex: false ]
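
(The intent being that, for example, asm EAX as: 8 would then answer the 64-bit RAX variant when generating x64 code, while asm EAX as: 4 answers plain EAX.)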


> On 09 Jun 2015, at 9:46 , Matthieu Lacaton <matthieu.laca...@gmail.com> wrote:
> 
> Hello Henrik,
> 
> Thank you very much for your answer. However, the code you provided is some 
> sort of assembly right ? So does it mean that I need to learn assembly to do 
> what I want ?
> 
> I'm asking that because I don't know anything about assembly so it will take 
> me some time to learn.
> 
> Cheers, 
> 
> Matthieu
> 
> 2015-06-08 19:56 GMT+02:00 Henrik Johansen <henrik.s.johan...@veloxit.no>:
> 
> > On 08 Jun 2015, at 4:41 , Matthieu Lacaton <matthieu.laca...@gmail.com> wrote:
> >
> > Hello everyone,
> >
> > I have a small question about NativeBoost: how does the "+" operator, when 
> > applied to a pointer, translate into NativeBoost code?
> >
> > To give a bit of context, what I want to do is to reallocate some 
> > non-contiguous bytes in memory to a buffer. Basically, I have an array of 
> > integers in a buffer and I want to copy some chunks of it in another 
> > buffer. The chunks are always the same size and the offset between each 
> > chunk is always the same too.
> >
> > Because a bit of actual code is easier to understand, here is what I'd like 
> > to do in Pharo:
> >
> > ...
> >
> > int i, j = 0;
> > int *data = malloc(1000*sizeof(int));
> > int *newData = malloc(50*sizeof(int));
> >
> > // Allocate initial data
> > for (i = 0; i < 1000; i++) {
> >   data[i] = i;
> > }
> >
> > // Copy desired chunks into new buffer
> > for (i = 0; i < 5; i++) {
> >   memcpy(newData + j*10, data + 200 + j*30, 10*sizeof(int));
> >   j++;
> > }
> >
> > free(data);
> 
> 
> 
> You can do relative addressing like this:
> (destReg ptr: dataSize) + offsetReg + constant
> 
> So with offset registers dstOffsetReg and srcOffsetReg containing j*10 and j*30, 
> you might end up with an unrolled inner loop (barring using any fancier 
> longer-than-int moves) like:
> 
> 0 to: 9 do: [:constantOffset |
>         asm mov: (destReg ptr: currentPlatform sizeOfInt) + dstOffsetReg + constantOffset
>                 with: (srcReg ptr: currentPlatform sizeOfInt) + 200 + srcOffsetReg + constantOffset]
> 
> If the range of j is constant, you can just as easily unroll the whole thing 
> in a similarly compact fashion, space and sensibilities permitting:
> 
> 0 to: 4 do: [ :j | 0 to: 9 do: [ :constOffset |
>         asm mov: (destReg ptr: currentPlatform sizeOfInt) + (j * 10) + constOffset
>                 with: (srcReg ptr: currentPlatform sizeOfInt) + 200 + (j * 30) + constOffset]]
> 
> Cheers,
> Henry
> 
