> ... And with that limitation, I'd rather have a lower-overhead JIT with a win for the shorter programs than a high-overhead one with a win for long-running programs.
I see that limitation. But currently we have a high-overhead JIT. The problem is not so much program run time as load time.
One example: t/op/stacks_33.pasm (8242 lines) expands, because of macros, to 38955 lines, giving 4102 basic blocks and 6150 edges connecting them.
compile/run options and timings (first 4 include running)
  plain               1.07
  -P                  1.09
  -j                  2.4
  -Oj                 2.3
  -ox.pbc / -j        1.07 + 1.3
  -ox.pbc -Oj / -j    2.1 + 0.2
So writing out a minimal CFG (blocks and branch targets) plus register usage gives about 6 times the startup speed (0.2 instead of 1.3) for this -Oj compiled PBC file. Program run time is ~0.
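The arithmetic behind the "6 times" figure can be spelled out as follows. This is only a sketch: it assumes each "a + b" pair in the timings means compile time plus the time to load and run the resulting PBC with -j, and the names used are purely illustrative.

```python
# Timings quoted above, read as (compile time, load-and-run time).
# That reading of the "a + b" pairs is an assumption.
plain_pbc_then_jit = (1.07, 1.3)  # -ox.pbc, then run with -j
oj_pbc_then_jit = (2.1, 0.2)      # -ox.pbc -Oj, then run with -j


def startup_speedup(baseline, optimized):
    """Ratio of the load-and-run times of two (compile, run) pairs."""
    return baseline[1] / optimized[1]


def total_time(pair):
    """Compile time plus load-and-run time."""
    return pair[0] + pair[1]


if __name__ == "__main__":
    print(round(startup_speedup(plain_pbc_then_jit, oj_pbc_then_jit), 1))
    print(round(total_time(plain_pbc_then_jit), 2))
    print(round(total_time(oj_pbc_then_jit), 2))
```

Under that reading the load-and-run time drops by a factor of roughly 6.5, while the combined compile-plus-run times stay close to each other, so the gain shows up when a precompiled PBC is loaded repeatedly.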
BTW, running the -Oj compiled PBC with a normal core does succeed (including correct output), although there are a lot of out-of-bounds register accesses (which go to high integer regs):
  PC=12; OP=82 (set_n_ic); ARGS=(N-2=0, 0)
  PC=15; OP=82 (set_n_ic); ARGS=(N-4=0, 1024)
  PC=18; OP=79 (set_n_n); ARGS=(N3=0, N-4=1024)
  PC=21; OP=79 (set_n_n); ARGS=(N2=0, N-3=0)
  PC=24; OP=79 (set_n_n); ARGS=(N1=0, N-2=0)
  PC=27; OP=79 (set_n_n); ARGS=(N0=0, N-1=0)
  PC=30; OP=678 (pushn)
I think the -b option should have a check for this.
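As an illustration of what such a check would look for, here is a rough sketch that scans trace lines in the format shown above and reports register operands with an out-of-range index. It is not the -b implementation: the function name is hypothetical, and the limit of 32 registers per set is an assumption.

```python
import re

# Assumption: each register set (I, N, S, P) holds 32 registers.
NUM_REGISTERS = 32

# Matches register operands as printed in the trace, e.g. "N3=" or "N-4=".
REG_OPERAND = re.compile(r"\b([INSP])(-?\d+)=")


def out_of_bounds_registers(trace_line):
    """Return register operands whose index lies outside 0..NUM_REGISTERS-1."""
    bad = []
    for reg_set, index in REG_OPERAND.findall(trace_line):
        if not 0 <= int(index) < NUM_REGISTERS:
            bad.append(f"{reg_set}{index}")
    return bad


if __name__ == "__main__":
    line = "PC=18; OP=79 (set_n_n); ARGS=(N3=0, N-4=1024)"
    print(out_of_bounds_registers(line))
```

A real check would sit in the bounds-checking core and test the operand values directly; parsing the trace text is only a convenient way to show the condition.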
(timings from a PIII/600, imcc -O3 compiled)
leo