Henrik
you amaze me :)
Stef
On 9/6/15 14:59, Henrik Johansen wrote:
There are many ways to Rome :)
If you just need some externally allocated objects in the formats you
specified, you can do the cache extraction using nothing but plain
Smalltalk:
intArray := NBExternalArray ofType: 'int'.
data := intArray new: 1000.
1 to: data size do: [ :i | data at: i put: i ].
cache := intArray new: 50.
0 to: 4 do: [ :j |
    1 to: 10 do: [ :k |
        cache at: (j * 10) + k put: (data at: 199 + (30 * j) + k) ] ].
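Because Smalltalk indices are 1-based and the fill stores i at index i, the 199 in this loop still yields the values 200-209, 230-239, ..., 320-329, the same values the C program quoted further down copies. The following C sketch (both function names are hypothetical, chosen only for this illustration) translates the two loops to 0-based indexing to make that mapping explicit:

```c
#include <stddef.h>

/* "1 to: data size do: [ :i | data at: i put: i ]":
   1-based index i is 0-based slot i - 1, and it holds the value i. */
void fill_like_smalltalk(int *data, size_t n) {
    for (size_t i = 0; i < n; i++)
        data[i] = (int)(i + 1);
}

/* "cache at: (j * 10) + k put: (data at: 199 + (30 * j) + k)":
   subtract 1 from each 1-based index to get the 0-based slot. */
void extract_like_smalltalk(const int *data, int *cache) {
    for (int j = 0; j <= 4; j++)
        for (int k = 1; k <= 10; k++)
            cache[j * 10 + k - 1] = data[199 + 30 * j + k - 1];
}
```

For example, cache[0] ends up as 200 and cache[49] as 329, matching newData[0] and newData[49] in the C version.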
But if you want to take full advantage of the performance boost NB
offers, you'd write a NativeBoost function to do the cache
extraction*, as I outlined last time:
MyClass class >> createCacheOf: aSource in: aDestination
    <primitive: #primitiveNativeCall module: #NativeBoostPlugin>
    "Should work on both x86 and x64, as long as sizeOf: lookups work correctly"
    ^ self nbCallout
        function: #(void (int * aSource, int * aDestination))
        emit: [ :gen :proxy :asm |
            | destReg srcReg tmpReg intSize ptrSize |
            intSize := NBExternalType sizeOf: 'int'.
            ptrSize := NBExternalType sizeOf: 'void *'.
            "Only use caller-saved registers, so no preservation is needed"
            destReg := asm EAX as: ptrSize.
            srcReg := asm ECX as: ptrSize.
            tmpReg := asm EDX as: intSize.
            asm pop: srcReg.
            asm pop: destReg.
            0 to: 4 do: [ :j |
                0 to: 9 do: [ :offset |
                    "Displacement is in bytes, not in pointer-element size :S,
                     so the element offset has to be multiplied by intSize manually"
                    asm
                        mov: tmpReg with: srcReg ptr + ((199 + (j * 30) + offset) * intSize);
                        mov: destReg ptr + (((j * 10) + offset) * intSize) with: tmpReg ] ] ]
and use it like this:
intArray := NBExternalArray ofType: 'int'.
data := intArray new: 1000.
1 to: data size do: [ :i | data at: i put: i ].
cache := intArray new: 50.
MyClass createCacheOf: data in: cache.
Measured with a simple [] bench, the difference is about two orders of
magnitude: 11 million cache extractions per second for the inline
assembly version, versus around 110k for the naive loop.
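As a rough mental model, the code generated by the emit: block corresponds to the C sketch below (create_cache_bytes is a hypothetical name). It assumes the source was filled the way the Smalltalk snippet fills it, so 0-based slot s holds s + 1. Each int travels through a temporary, as through tmpReg, and both addresses are a base plus a displacement counted in bytes, hence the multiplication by sizeof(int). The NativeBoost version goes further: its Smalltalk loops run at code-generation time, so the emitted machine code is 50 straight load/store pairs with constant displacements and no loop at all.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C rendering of the emitted code. Addresses are formed as
   base + byte displacement, like "srcReg ptr + (... * intSize)". */
void create_cache_bytes(const int *source, int *destination) {
    const unsigned char *src = (const unsigned char *)source;
    unsigned char *dst = (unsigned char *)destination;
    for (size_t j = 0; j <= 4; j++) {
        for (size_t offset = 0; offset <= 9; offset++) {
            int tmp; /* plays the role of tmpReg */
            /* load: mov tmpReg, [srcReg + (199 + j*30 + offset) * sizeof(int)] */
            memcpy(&tmp, src + (199 + j * 30 + offset) * sizeof(int), sizeof tmp);
            /* store: mov [destReg + (j*10 + offset) * sizeof(int)], tmpReg */
            memcpy(dst + (j * 10 + offset) * sizeof(int), &tmp, sizeof tmp);
        }
    }
}
```

The memcpy calls are just a portable way to spell "move sizeof(int) bytes at this byte address"; they are not meant as a performance comparison with the NativeBoost version.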
Cheers,
Henry
* as: is not yet defined; it could be something like:
AJx86GPRegister >> as: aSize
    ^ self isHighByte
        ifTrue: [ self asLowByte as: aSize ]
        ifFalse: [
            AJx86Registers
                generalPurposeWithIndex: self index
                size: aSize
                requiresRex: self index > (aSize > 1 ifTrue: [ 7 ] ifFalse: [ 3 ])
                prohibitsRex: false ]
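The requiresRex: condition in that sketch encodes an x86-64 rule: registers 8 to 15 always need a REX prefix, and at byte size, register indices 4 to 7 need one as well, because without REX those encodings select AH, CH, DH and BH rather than SPL, BPL, SIL and DIL. A tiny C model of just that expression (requires_rex is a hypothetical helper; it deliberately ignores REX.W, which a 64-bit operand size needs separately):

```c
#include <stdbool.h>

/* Models "self index > (aSize > 1 ifTrue: [ 7 ] ifFalse: [ 3 ])":
   true when encoding this register requires a REX prefix. */
bool requires_rex(int index, int size_in_bytes) {
    return index > (size_in_bytes > 1 ? 7 : 3);
}
```

So requires_rex(8, 4) is true (R8D), requires_rex(4, 1) is true (SPL), and requires_rex(3, 1) is false (BL).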
On 09 Jun 2015, at 9:46, Matthieu Lacaton <matthieu.laca...@gmail.com> wrote:
Hello Henrik,
Thank you very much for your answer. However, the code you provided
is some sort of assembly, right? Does that mean I need to learn
assembly to do what I want?
I'm asking because I don't know anything about assembly, so it
would take me some time to learn.
Cheers,
Matthieu
2015-06-08 19:56 GMT+02:00 Henrik Johansen <henrik.s.johan...@veloxit.no>:
> On 08 Jun 2015, at 4:41, Matthieu Lacaton <matthieu.laca...@gmail.com> wrote:
>
> Hello everyone,
>
> I have a small question about NativeBoost: how does the "+"
operator, when applied to a pointer, translate into NativeBoost code?
>
> To give a bit of context, what I want to do is copy some
non-contiguous bytes in memory into a buffer. Basically, I
have an array of integers in a buffer and I want to copy some
chunks of it into another buffer. The chunks are always the same
size, and the offset between consecutive chunks is always the same too.
>
> Because a bit of actual code is easier to understand, here is, in C,
what I'd like to do in Pharo:
>
> ...
>
> int i, j;
> int *data = malloc(1000 * sizeof(int));
> int *newData = malloc(50 * sizeof(int));
>
> // Initialize the source data
> for (i = 0; i < 1000; i++) {
>     data[i] = i;
> }
>
> // Copy the desired chunks (10 ints every 30 ints, starting at 200) into the new buffer
> for (j = 0; j < 5; j++) {
>     memcpy(newData + j*10, data + 200 + j*30, 10*sizeof(int));
> }
>
> free(data);
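On the "+" question itself: in C, adding n to an int * advances the address by n * sizeof(int) bytes, whereas the displacement of an assembler memory operand is a raw byte count, so that scaling has to be done by hand. A small C sketch of the relationship, using the question's layout (both helper names are hypothetical):

```c
#include <stddef.h>

/* C pointer arithmetic scales by the element size:
   for an int *base, base + n lies n * sizeof(int) bytes past base. */
size_t byte_distance(const int *base, size_t n) {
    return (size_t)((const char *)(base + n) - (const char *)base);
}

/* Byte displacement of chunk j's source in the question's layout
   (data + 200 + j*30), i.e. what an assembler displacement would need. */
size_t chunk_source_displacement(size_t j) {
    return (200 + j * 30) * sizeof(int);
}
```

With 4-byte ints, data + 200 is 800 bytes past data, so the displacement for the first chunk is 800, not 200.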
You can do relative addressing like this:
(destReg ptr: dataSize) + offsetReg + constant
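So with offset registers holding the byte offsets of j*10 and j*30 ints,
you might end up with an unrolled inner loop (barring any fancier
longer-than-int moves) like:
0 to: 9 do: [ :constantOffset |
    "x86 has no memory-to-memory mov, so go through a scratch register (tmpReg);
     displacements are in bytes, so scale the element offset by the int size"
    asm
        mov: tmpReg
            with: (srcReg ptr: currentPlatform sizeOfInt) + srcOffsetReg
                + ((200 + constantOffset) * currentPlatform sizeOfInt);
        mov: (destReg ptr: currentPlatform sizeOfInt) + dstOffsetReg
                + (constantOffset * currentPlatform sizeOfInt)
            with: tmpReg ]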
If the range of j is constant, you can just as easily unroll the
whole thing in a similarly compact fashion, space and
sensibilities permitting:
0 to: 4 do: [ :j |
    0 to: 9 do: [ :constOffset |
        "Every address is now base + constant byte displacement"
        asm
            mov: tmpReg
                with: (srcReg ptr: currentPlatform sizeOfInt)
                    + ((200 + (j * 30) + constOffset) * currentPlatform sizeOfInt);
            mov: (destReg ptr: currentPlatform sizeOfInt)
                    + (((j * 10) + constOffset) * currentPlatform sizeOfInt)
                with: tmpReg ] ]
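In C terms, the inner-loop unroll suggested above looks roughly like the following sketch (copy_chunks_inner_unrolled is a hypothetical name, and it assumes the data[i] = i fill from the question). The per-chunk pointers s and d play the role of base register plus offset register, and the literal indices 0 to 9 play the role of the constant displacements; unrolling the outer loop too would turn every address into base plus constant.

```c
/* Copies 5 chunks of 10 ints, taken every 30 ints starting at index 200,
   with the inner 10-element copy written out by hand. */
void copy_chunks_inner_unrolled(const int *data, int *newData) {
    for (int j = 0; j < 5; j++) {
        const int *s = data + 200 + j * 30; /* source base + per-chunk offset */
        int *d = newData + j * 10;          /* destination base + per-chunk offset */
        /* constant "displacements" 0..9 */
        d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3]; d[4] = s[4];
        d[5] = s[5]; d[6] = s[6]; d[7] = s[7]; d[8] = s[8]; d[9] = s[9];
    }
}
```

This produces the same newData contents as the memcpy loop in the question.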
Cheers,
Henry