Hi,
I am working on a compact library [1] for JIT compilation of array
operations. It only runs on AMD64 processors. Currently it supports array
operations on booleans, integers, integer RGB values, and integer complex
numbers.
Important things are still missing: floating point numbers, compiling
calls to C functions (e.g. sin, cos, ...), tensor operations,
convolutions, ... Eventually I would like to do numerical processing
similar to Python's NumPy (but more generic), Theano (but with a more
compact syntax facilitated by macros), and OpenCV.
Here is an example adding an integer to each element of a 2D array and
returning the result:
scheme@(guile-user)> (use-modules (oop goops) (aiscm jit) (aiscm int)
                                  (aiscm pointer) (aiscm sequence))
scheme@(guile-user)> (+ (arr (2 3 5) (7 11 13)) 3)
$1 = #<sequence<sequence<int<8,unsigned>>>>:
((5 6 8)
 (10 14 16))
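This works as follows: the fallback method of the GOOPS generic "+"
JIT-compiles a plus operation for the specific array types, adds it to
the generic as a new method, and then calls "+" again, so that the second
call dispatches to the newly compiled method.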
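The same compile-on-miss pattern can be sketched outside of GOOPS. The C
code below is only an analogy and not how the library is implemented: a
table of function pointers plays the role of the generic's method list,
and compile_add is a stand-in that returns a pre-written function where
the library would emit machine code. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element types; these are not names from the library. */
enum elem_type { ELEM_U8, ELEM_S32, ELEM_TYPE_COUNT };

typedef void (*add_kernel)(void *dst, const void *src, size_t n, long scalar);

static void add_u8(void *dst, const void *src, size_t n, long scalar)
{
  uint8_t *d = dst;
  const uint8_t *s = src;
  for (size_t i = 0; i < n; i++)
    d[i] = (uint8_t)(s[i] + scalar);
}

static void add_s32(void *dst, const void *src, size_t n, long scalar)
{
  int32_t *d = dst;
  const int32_t *s = src;
  for (size_t i = 0; i < n; i++)
    d[i] = (int32_t)(s[i] + scalar);
}

/* One slot per element type; a NULL slot means "no specialised method yet". */
static add_kernel kernel_table[ELEM_TYPE_COUNT];

/* Counts how often the stand-in compiler runs. */
static int compile_count;

/* Stand-in for the JIT compiler: picks a pre-written kernel for the type. */
static add_kernel compile_add(enum elem_type type)
{
  compile_count++;
  return type == ELEM_U8 ? add_u8 : add_s32;
}

static void generic_add(enum elem_type type, void *dst, const void *src,
                        size_t n, long scalar)
{
  if (kernel_table[type] == NULL) {
    /* Fallback path: register a specialised kernel, then dispatch again. */
    kernel_table[type] = compile_add(type);
    generic_add(type, dst, src, n, scalar);
    return;
  }
  kernel_table[type](dst, src, n, scalar);
}
```

Only the first call for a given element type goes through the fallback
path; later calls with the same type find the kernel in the table and
skip the compilation step.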
The corresponding machine code to produce the unsigned byte array is
shown below:
0: 4c 89 64 24 f0 mov QWORD PTR [rsp-0x10],r12
5: 48 89 6c 24 e8 mov QWORD PTR [rsp-0x18],rbp
a: 4c 89 7c 24 e0 mov QWORD PTR [rsp-0x20],r15
f: 4c 89 74 24 d8 mov QWORD PTR [rsp-0x28],r14
14: 4c 89 6c 24 d0 mov QWORD PTR [rsp-0x30],r13
19: 48 89 5c 24 c8 mov QWORD PTR [rsp-0x38],rbx
1e: 48 89 7c 24 f8 mov QWORD PTR [rsp-0x8],rdi
23: 4c 8b 64 24 08 mov r12,QWORD PTR [rsp+0x8]
28: 48 8b 7c 24 18 mov rdi,QWORD PTR [rsp+0x18]
2d: 48 8b 6c 24 20 mov rbp,QWORD PTR [rsp+0x20]
32: 8a 44 24 28 mov al,BYTE PTR [rsp+0x28]
36: 48 6b de 01 imul rbx,rsi,0x1
3a: 49 8b f0 mov rsi,r8
3d: 4d 6b cc 01 imul r9,r12,0x1
41: 4c 8b fd mov r15,rbp
44: 49 be 00 00 00 00 00 movabs r14,0x0
4b: 00 00 00
4e: 4c 8b 44 24 f8 mov r8,QWORD PTR [rsp-0x8]
53: 4d 3b f0 cmp r14,r8
56: 74 3e je 0x96
58: 49 ff c6 inc r14
5b: 4c 6b d9 01 imul r11,rcx,0x1
5f: 4c 8b ee mov r13,rsi
62: 4c 6b d7 01 imul r10,rdi,0x1
66: 4d 8b e7 mov r12,r15
69: 48 bd 00 00 00 00 00 movabs rbp,0x0
70: 00 00 00
73: 48 3b ea cmp rbp,rdx
76: 74 16 je 0x8e
78: 48 ff c5 inc rbp
7b: 45 8a 04 24 mov r8b,BYTE PTR [r12]
7f: 44 02 c0 add r8b,al
82: 45 88 45 00 mov BYTE PTR [r13+0x0],r8b
86: 4d 03 eb add r13,r11
89: 4d 03 e2 add r12,r10
8c: eb e5 jmp 0x73
8e: 48 03 f3 add rsi,rbx
91: 4d 03 f9 add r15,r9
94: eb b8 jmp 0x4e
96: 4c 8b 64 24 f0 mov r12,QWORD PTR [rsp-0x10]
9b: 48 8b 6c 24 e8 mov rbp,QWORD PTR [rsp-0x18]
a0: 4c 8b 7c 24 e0 mov r15,QWORD PTR [rsp-0x20]
a5: 4c 8b 74 24 d8 mov r14,QWORD PTR [rsp-0x28]
aa: 4c 8b 6c 24 d0 mov r13,QWORD PTR [rsp-0x30]
af: 48 8b 5c 24 c8 mov rbx,QWORD PTR [rsp-0x38]
b4: c3 ret
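Read as C, the listing above amounts to roughly the following nested
loop. This is a hand-written sketch based on my reading of the
disassembly, not output of the library; the function name, the parameter
names, and the parameter order are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C equivalent of the JIT-compiled kernel.
   Strides are given in elements; the element size is 1 byte, which
   corresponds to the "imul ..., 0x1" instructions in the listing. */
static void add_scalar_u8_2d(uint8_t *dst, const uint8_t *src,
                             size_t outer_count, size_t inner_count,
                             ptrdiff_t dst_outer_stride,
                             ptrdiff_t dst_inner_stride,
                             ptrdiff_t src_outer_stride,
                             ptrdiff_t src_inner_stride,
                             uint8_t scalar)
{
  uint8_t *dst_row = dst;
  const uint8_t *src_row = src;
  for (size_t i = 0; i < outer_count; i++) {
    uint8_t *d = dst_row;
    const uint8_t *s = src_row;
    for (size_t j = 0; j < inner_count; j++) {
      *d = (uint8_t)(*s + scalar); /* 8-bit add, wraps modulo 256 */
      d += dst_inner_stride;       /* advance both pointers by the inner strides */
      s += src_inner_stride;
    }
    dst_row += dst_outer_stride;   /* move both pointers to the next row */
    src_row += src_outer_stride;
  }
}
```

Calling it on a contiguous 2x3 array (outer stride 3, inner stride 1)
with the scalar 3 reproduces the result of the REPL example. Because the
strides are passed in separately, the same code should also handle
non-contiguous views of an array.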
Any comments, suggestions, and feedback are welcome!
Regards
Jan
[1] https://github.com/wedesoft/aiscm