There are many ways to Rome :) If you just need some externally allocated objects in the formats you specified you can do the cache extraction using nothing but normal Smalltalk:

    intArray := (NBExternalArray ofType: 'int').
    data := intArray new: 1000.
    1 to: data size do: [:i | data at: i put: i].
    cache := intArray new: 50.
    0 to: 4 do: [:j |
        1 to: 10 do: [:k |
            cache at: (j * 10) + k put: (data at: 199 + (30 * j) + k)]].

But if you want to take full advantage of the performance boost NB offers, you'd write a NativeBoost function to do the cache extraction*, as I outlined last time:

MyClass class >> #createCacheOf: aSource in: aDestination

    createCacheOf: aSource in: aDestination
        <primitive: #primitiveNativeCall module: #NativeBoostPlugin>
        "Should work on both x86 and x64, as long as sizeOf: lookups work correctly"
        ^ self nbCallout
            function: #(void (int * aSource, int * aDestination))
            emit: [:gen :proxy :asm |
                | destReg srcReg tmpReg intSize ptrSize |
                intSize := NBExternalType sizeOf: 'int'.
                ptrSize := NBExternalType sizeOf: 'void *'.
                "Only use caller-saved regs, no preservation needed"
                destReg := asm EAX as: ptrSize.
                srcReg := asm ECX as: ptrSize.
                tmpReg := asm EDX as: intSize.
                asm pop: srcReg.
                asm pop: destReg.
                0 to: 4 do: [:j |
                    0 to: 9 do: [:offset |
                        asm
                            "Displacement in bytes, not ptr element size :S, so we have to multiply offset by that manually :S"
                            mov: tmpReg with: srcReg ptr + (199 + (j * 30) + offset * intSize);
                            mov: destReg ptr + ((j * 10) + offset * intSize) with: tmpReg]]]

and use that:

    intArray := (NBExternalArray ofType: 'int').
    data := intArray new: 1000.
    1 to: data size do: [:i | data at: i put: i].
    cache := intArray new: 50.
    MyClass createCacheOf: data in: cache.

The difference using a simple [] bench is about two orders of magnitude: 11 million cache extractions per second for the inline assembly version, while the naive loop achieves around 110k.
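
For readers more at home in C than in assembly, the index arithmetic behind the unrolled moves above can be sketched roughly as follows. This is only an illustration, not NativeBoost code: the names create_cache, src_disp and dst_disp are made up for this sketch, and it assumes the source holds at least 329 ints and the destination at least 50. The point it tries to show is that each displacement is an element index multiplied by sizeof(int), i.e. a byte offset.

```c
#include <stddef.h>
#include <string.h>

/* Layout constants taken from the Smalltalk snippets above. */
enum { CHUNKS = 5, CHUNK_LEN = 10, SRC_STRIDE = 30, SRC_START = 199 };

/* Byte displacement of element `offset` of chunk `j` in the source buffer
   (hypothetical helper, mirrors 199 + (j * 30) + offset * intSize). */
static size_t src_disp(int j, int offset) {
    return (size_t)(SRC_START + j * SRC_STRIDE + offset) * sizeof(int);
}

/* Byte displacement of the same element in the destination buffer
   (hypothetical helper, mirrors (j * 10) + offset * intSize). */
static size_t dst_disp(int j, int offset) {
    return (size_t)(j * CHUNK_LEN + offset) * sizeof(int);
}

/* Copies 5 chunks of 10 ints each, one int per step, addressing both
   buffers by byte displacement the way the generated moves do. */
static void create_cache(const int *src, int *dst) {
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    for (int j = 0; j < CHUNKS; j++) {
        for (int offset = 0; offset < CHUNK_LEN; offset++) {
            memcpy(d + dst_disp(j, offset), s + src_disp(j, offset), sizeof(int));
        }
    }
}
```

With a source filled with 1..1000 as in the Smalltalk snippets, calling create_cache(data, cache) should leave cache[0] as 200 and cache[49] as 329.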

Cheers,
Henry

*as: is not yet defined, could be something like:

AJx86GPRegister >> #as: aSize
    ^ self isHighByte
        ifTrue: [ self asLowByte as: aSize ]
        ifFalse: [
            AJx86Registers
                generalPurposeWithIndex: self index
                size: aSize
                requiresRex: self index > (aSize > 1 ifTrue: [7] ifFalse: [3])
                prohibitsRex: false ]

> On 09 Jun 2015, at 9:46 , Matthieu Lacaton <matthieu.laca...@gmail.com> wrote:
>
> Hello Henrik,
>
> Thank you very much for your answer. However, the code you provided is some
> sort of assembly right ? So does it mean that I need to learn assembly to do
> what I want ?
>
> I'm asking that because I don't know anything about assembly so it will take
> me some time to learn.
>
> Cheers,
>
> Matthieu
>
> 2015-06-08 19:56 GMT+02:00 Henrik Johansen <henrik.s.johan...@veloxit.no
> <mailto:henrik.s.johan...@veloxit.no>>:
>
> > On 08 Jun 2015, at 4:41 , Matthieu Lacaton <matthieu.laca...@gmail.com
> > <mailto:matthieu.laca...@gmail.com>> wrote:
> >
> > Hello everyone,
> >
> > I have a small question about NativeBoost : How does the "+" operator, when
> > applied to a pointer, translate into NativeBoost code ?
> >
> > To give a bit of context, what I want to do is to reallocate some
> > non-contiguous bytes in memory to a buffer. Basically, I have an array of
> > integers in a buffer and I want to copy some chunks of it in another
> > buffer. The chunks are always the same size and the offset between each
> > chunk is always the same too.
> >
> > Because a bit of actual code is easier to understand here is what I'd like
> > to do in Pharo :
> >
> > ...
> >
> > int i, j = 0;
> > int *data = malloc(1000*sizeof(int));
> > int *newData = malloc(50*sizeof(int));
> >
> > // Allocate initial data
> > for (i = 0; i < 1000; i++) {
> >     data[i] = i;
> > }
> >
> > // Copy desired chunks into new buffer
> > for (i = 0; i < 5; i++) {
> >     memcpy(newData + j*10, data + 200 + j*30, 10*sizeof(int));
> >     j++;
> > }
> >
> > free(data);
> >
>
> You can do relative addressing like this:
> (destReg ptr: dataSize) + offsetReg + constant
>
> So with the offset registers containing j * 10 and j * 30, you might end up
> with an unrolled inner loop (barring using any fancier longer-than-int moves)
> like:
>
> 0 to: 9 do: [:constantOffset |
>     asm mov: (destReg ptr: currentPlatform sizeOfInt) + dstOffsetReg + constantOffset
>         with: (srcReg ptr: currentPlatform sizeOfInt) + 200 + srcOffsetReg + constantOffset]
>
> If the range of j is constant, you can just as easily unroll the whole thing
> in a similarly compact fashion, space and sensibilities permitting:
>
> 0 to: 4 do: [:j | 0 to: 9 do: [:constOffset |
>     asm mov: (destReg ptr: currentPlatform sizeOfInt) + (j * 10) + constOffset
>         with: (srcReg ptr: currentPlatform sizeOfInt) + 200 + (j * 30) + constOffset]]
>
> Cheers,
> Henry