I see two ways to fix it:
1) Use frame-pointer-relative addressing:
   + prederefed code is usable by different threads too
   - ~4 times increase in code size of core_ops_*.{c,o} [1]
2) Re-prederef on function calls if the frame pointer differs:
   + no impact on code size
   - needs the precise code length of functions
   - threads need distinct prederefed code
   - possibly slower than 1)
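To make the trade-off between the two options concrete, here is a minimal C sketch. Everything in it (RegFrame, AddAbs, AddRel, SubBody, prederef, call_sub) is hypothetical and is not Parrot's actual API, and a "sub body" is reduced to a single add instruction. run_add_rel stands for option 1 and call_sub for option 2.

```c
#include <stddef.h>

/* Hypothetical register frame: integer registers only. */
typedef struct {
    long int_reg[32];
} RegFrame;

/* Hypothetical prederefed add_i_i_i with absolute addressing:
 * each operand is a direct pointer into one specific frame. */
typedef struct {
    long *dest, *lhs, *rhs;
} AddAbs;

/* The same instruction with frame-pointer-relative addressing:
 * operands are register numbers, resolved against fp at run time. */
typedef struct {
    size_t dest, lhs, rhs;
} AddRel;

/* Absolute: fast, but bound to the frame used at prederef time. */
void run_add_abs(const AddAbs *op) {
    *op->dest = *op->lhs + *op->rhs;
}

/* Option 1: an extra fp-based index per operand, but the same code
 * works for any frame (another call frame, or another thread's frame). */
void run_add_rel(const AddRel *op, RegFrame *fp) {
    fp->int_reg[op->dest] = fp->int_reg[op->lhs] + fp->int_reg[op->rhs];
}

/* Option 2: keep absolute addressing, but remember which frame the
 * body was prederefed for and redo the prederef when it differs. */
typedef struct {
    RegFrame *prederef_fp; /* frame the pointers in `code` refer to */
    AddRel    regs;        /* register numbers from the bytecode */
    AddAbs    code;        /* prederefed absolute operands */
} SubBody;

/* Recompute every absolute operand for frame fp. A real sub would need
 * to walk all of its instructions here. */
void prederef(SubBody *s, RegFrame *fp) {
    s->code.dest = &fp->int_reg[s->regs.dest];
    s->code.lhs  = &fp->int_reg[s->regs.lhs];
    s->code.rhs  = &fp->int_reg[s->regs.rhs];
    s->prederef_fp = fp;
}

/* On a call, re-prederef only if the frame pointer has changed. */
void call_sub(SubBody *s, RegFrame *fp) {
    if (s->prederef_fp != fp)
        prederef(s, fp);
    run_add_abs(&s->code);
}
```

Because SubBody keeps prederef_fp inside the shared code, two threads calling the same sub with their own frames would keep re-prederefing over each other, which is why option 2 needs distinct prederefed code per thread. The loop hidden inside prederef is also where the precise code length of each function would be needed.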
Comments welcome, leo
[1] With absolute addressing, a constant argument and a register argument produce the same code, so set_i_ic and set_i_i are identical.
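The footnote's point can be sketched in C as follows. The names (RegFrame, set_i_x_abs, set_i_i_rel, set_i_ic_rel) are hypothetical, and the sketch assumes that prederef turns a constant operand into a pointer to the constant, just as it turns a register operand into a pointer into the frame.

```c
#include <stddef.h>

/* Hypothetical register frame with integer registers only. */
typedef struct {
    long int_reg[32];
} RegFrame;

/* Absolute addressing: after prederef, both a register source and a
 * constant source are plain pointers, so set_i_i and set_i_ic can
 * share this single body. */
void set_i_x_abs(long *dest, const long *src) {
    *dest = *src;
}

/* Frame-pointer-relative addressing: a register source is an index
 * into the current frame, while a constant source is not, so the
 * two variants need separate bodies again. */
void set_i_i_rel(RegFrame *fp, size_t dest, size_t src) {
    fp->int_reg[dest] = fp->int_reg[src];
}

void set_i_ic_rel(RegFrame *fp, size_t dest, const long *konst) {
    fp->int_reg[dest] = *konst;
}
```

Every op variant that collapsed into one body under absolute addressing has to be split again under relative addressing, which is the source of the code-size growth mentioned in option 1.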