Heinz:
> void myFunct()
> {
>     uint* p = myarray.ptr;
>     asm
>     {
>         mov EBX, p;
>
>         mov EAX, [EBX + 4];
>         rol EAX, 8;
>         mov [EBX + 4], EAX;
>
>         mov EAX, [EBX + 8];
>         rol EAX, 16;
>         mov [EBX + 8], EAX;
>
>         mov EAX, [EBX + 12];
>         rol EAX, 24;
>         mov [EBX + 12], EAX;
>     }
> }
I see you have removed the asm guard I showed you.
I suggest you benchmark it against a normal D function that does the same
work. Keep in mind that asm blocks kill inlining.
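
For the comparison, a minimal sketch of a plain D version could look like the
code below. The names rotl and myFunctPlain, the contents of myarray, and the
iteration count are only my assumptions for illustration; benchmark comes from
std.datetime.stopwatch in Phobos. Time your asm version the same way and
compare the two numbers.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

// Hypothetical stand-in for the poster's `myarray` (contents are assumed).
uint[4] myarray = [0x11223344, 0x11223344, 0x11223344, 0x11223344];

// Rotate a 32-bit value left by n bits; only valid for 0 < n < 32.
uint rotl(uint x, uint n) pure nothrow @nogc @safe
{
    return (x << n) | (x >> (32 - n));
}

// Plain D equivalent of the quoted asm: elements 1, 2 and 3 are rotated
// left by 8, 16 and 24 bits. Element 0 is left untouched.
void myFunctPlain() nothrow @nogc @safe
{
    myarray[1] = rotl(myarray[1], 8);
    myarray[2] = rotl(myarray[2], 16);
    myarray[3] = rotl(myarray[3], 24);
}

void main()
{
    // Run the function many times and print the total elapsed time.
    auto times = benchmark!(myFunctPlain)(10_000_000);
    writeln("plain D: ", times[0]);
}
```

Small functions like rotl can be inlined by the compiler, which is exactly what
the asm version gives up.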
Also try a load-load-load, processing-processing-processing,
store-store-store ordering instead of repeating load-processing-store three
times, because this often helps the pipelining of the processor
(especially when you use SSE/AVX registers).
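
A sketch of that reordering is below. I am assuming a guard of the form
version (D_InlineAsm_X86) with a plain D fallback; the function name
myFunctGrouped, the slice parameter, and the choice of ECX and EDX as extra
registers are mine and only for illustration. Whether it is actually faster
is something to measure, not to assume.

```d
import std.stdio : writefln;

// Rotates arr[1], arr[2] and arr[3] left by 8, 16 and 24 bits,
// doing all loads first, then all rotations, then all stores.
void myFunctGrouped(uint[] arr)
{
    assert(arr.length >= 4);

    version (D_InlineAsm_X86)
    {
        uint* p = arr.ptr;
        asm
        {
            mov EBX, p;

            // load-load-load
            mov EAX, [EBX + 4];
            mov ECX, [EBX + 8];
            mov EDX, [EBX + 12];

            // processing-processing-processing
            rol EAX, 8;
            rol ECX, 16;
            rol EDX, 24;

            // store-store-store
            mov [EBX + 4], EAX;
            mov [EBX + 8], ECX;
            mov [EBX + 12], EDX;
        }
    }
    else
    {
        // Fallback for targets without 32-bit x86 inline asm,
        // using the same grouped ordering in plain D.
        const uint a = arr[1];
        const uint b = arr[2];
        const uint c = arr[3];

        const uint ra = (a << 8) | (a >> 24);
        const uint rb = (b << 16) | (b >> 16);
        const uint rc = (c << 24) | (c >> 8);

        arr[1] = ra;
        arr[2] = rb;
        arr[3] = rc;
    }
}

void main()
{
    uint[4] data = [0x11223344, 0x11223344, 0x11223344, 0x11223344];
    myFunctGrouped(data[]);
    writefln("%08x %08x %08x %08x", data[0], data[1], data[2], data[3]);
}
```

The three values live in separate registers, so the rotations do not depend on
each other and the processor is free to overlap them.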
Bye,
bearophile