Hi,

I'm currently implementing support for hardware transactional memory in the S/390 backend and ran into a problem with saving and restoring the floating point registers.
On S/390 the tbegin instruction starts a transaction. If a subsequent memory access collides with another one, the transaction is aborted. Execution then continues *after* the tbegin instruction: all memory writes made after the tbegin are rolled back, the general purpose registers selected in the tbegin operand are restored, and the condition code is set to indicate that an abort occurred. The code is then supposed to check the condition code and either jump back to the transaction if it is a temporary failure or fall back to an alternate implementation, e.g. one using a lock.

Unfortunately our tbegin instruction does not save the floating point registers, leaving it to the compiler to make sure the old values get restored. This is necessary whenever the abort code relies on these values and the transaction body modifies them.

With my current approach I try to place FPR clobbers that trigger GCC into generating the right save/restore operations. This has some drawbacks:

- Bundling the clobbers with the tbegin causes the FPRs to be restored even on the good path (where the transaction never aborts).
- Placing the clobbers on the abort path kind of works. However, it is not really correct: GCC could decide to wrap the save/restore operations tightly around the clobbers on the abort path. By then the transaction body has already modified the FPRs, so the save would capture the wrong values.

A solution to that might be to (that's what I'm currently working on):

- Bundle the tbegin with the conditional jump to the abort code in order to prevent GCC from saving the FPRs right after the tbegin.
- Direct an abnormal edge to the abort code to tell GCC that the FPRs are actually clobbered from somewhere outside (as with EH).

Does this sound reasonable?

The point is that not all execution paths through tbegin actually clobber FPRs; only the paths that end up in the abort code do. So another solution might be to implement support for conditional clobbers, perhaps clobbers wrapped into a cond_exec.
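To make the FPR problem concrete, here is a minimal software model of the abort semantics described above. It is not S/390 code: tbegin and the abort are emulated with setjmp/longjmp, and all names (sim_tbegin_snapshot, sim_abort, run_aborting_tx, the gpr/fpr/mem arrays) are hypothetical. As with the real instruction, the simulated abort rolls back memory and the GPRs selected by the mask and resumes after the "tbegin" with a nonzero condition code, but leaves the FPRs untouched. The save_fprs flag mimics the code the compiler would have to emit: save the FPR before the tbegin and restore it only on the abort path.

```c
#include <setjmp.h>

/* Toy machine state (hypothetical).  File-scope objects are used so that
   their values stay well defined across longjmp. */
static long   gpr[4];
static double fpr[4];
static long   mem[4];

/* State captured when the simulated transaction begins. */
static jmp_buf  tx_abort_target;
static unsigned tx_gpr_mask;
static long     tx_gpr[4];
static long     tx_mem[4];

/* First half of the simulated tbegin (the setjmp in the caller is the
   second half): snapshot memory and the GPRs selected by gpr_mask.
   FPRs are deliberately NOT captured, like the real tbegin. */
static void sim_tbegin_snapshot(unsigned gpr_mask)
{
    tx_gpr_mask = gpr_mask;
    for (int i = 0; i < 4; i++) {
        tx_mem[i] = mem[i];
        if (gpr_mask & (1u << i))
            tx_gpr[i] = gpr[i];
    }
}

/* Simulated abort: roll back memory and the selected GPRs, then resume
   right after the tbegin with condition code 2 (transient failure). */
static void sim_abort(void)
{
    for (int i = 0; i < 4; i++) {
        mem[i] = tx_mem[i];
        if (tx_gpr_mask & (1u << i))
            gpr[i] = tx_gpr[i];
    }
    longjmp(tx_abort_target, 2);
}

/* Runs a transaction that modifies mem[0], gpr[0] and fpr[0] and is then
   aborted.  Returns the fpr[0] value seen by the abort path.  With
   save_fprs != 0 the FPR is saved before the tbegin and restored only on
   the abort path, i.e. the code the compiler would have to generate. */
static double run_aborting_tx(int save_fprs)
{
    mem[0] = 1;
    gpr[0] = 10;
    fpr[0] = 1.5;
    double saved_fpr0 = fpr[0];   /* not modified after the setjmp below */

    sim_tbegin_snapshot(1u << 0); /* GPR 0 is selected in the "operand" */
    switch (setjmp(tx_abort_target)) {
    case 0:                       /* cc 0: transaction body */
        mem[0] = 2;
        gpr[0] = 20;
        fpr[0] = 2.5;
        sim_abort();              /* e.g. a conflicting memory access */
        return -1.0;              /* not reached */
    default:                      /* nonzero cc: abort path */
        if (save_fprs)
            fpr[0] = saved_fpr0;  /* compiler-inserted FPR restore */
        return fpr[0];
    }
}
```

With save_fprs set to 0, the abort path sees the value 2.5 written inside the transaction even though memory (mem[0] == 1) and the masked GPR (gpr[0] == 10) were rolled back; with save_fprs set to 1 it sees the original 1.5. The restore sits only on the abort path, which is the property that bundling the clobbers with the tbegin does not give you.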
I'm not sure how difficult this would be to implement or whether it would be worth it.

This also has implications for the ABI and for prologue/epilogue generation. Consider a function containing just a tbegin:

  int foo ()
  {
    return __builtin_tbegin ();
  }

foo needs to save and restore *all* the call-saved FPRs, since the transaction body continuing in the caller of foo might modify a call-saved FPR and then trigger an abort. After the abort, execution resumes inside foo with the modified FPR values, and foo returns to its caller a second time. If foo did not save and restore the FPRs, it could end up clobbering call-saved FPRs, violating the ABI.

(Note: since transactions roll back all memory operations, this also applies to stack manipulations. So with a function like foo above, an abort makes you return into a callee that has already returned. The stack frame of foo is restored by the transaction rollback. So, unlike with setjmp/longjmp, jumping back into a callee is supposed to work reliably even if the callee's stack content has been clobbered in the meantime.)

The additional prologue/epilogue FPR backups for TXs can only be avoided if the transaction is fully contained in the function body (and does not use the FPRs). I call these non-escaping transactions. I've implemented a check that handles the most common situations using the post-dominance tree: if all the tbegin BBs are post-dominated by a tend BB, I redo the df_regs_ever_live computation from scratch after reload has removed the clobbers. Unfortunately this doesn't help when the TX instructions are used as part of a library, as with libitm. So I still see lots of superfluous FPR save/restore operations in transactional code, which eat up much of the benefit.

Any ideas on improving the situation are welcome!

Bye,

-Andreas-
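The non-escaping check mentioned above can be sketched as a textbook iterative post-dominator computation followed by the tbegin/tend test. This is a standalone illustration using a hypothetical bitset CFG type (at most 32 blocks, every block assumed to reach the exit block), not the actual backend code; inside GCC one would presumably rely on calculate_dominance_info (CDI_POST_DOMINATORS) and dominated_by_p instead.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_BLOCKS 32

/* Hypothetical CFG: blocks are numbered 0..n-1, successors are stored as
   bitsets, and every block is assumed to reach the exit block. */
typedef struct {
    int      n;                  /* number of blocks (<= MAX_BLOCKS) */
    int      exit;               /* index of the exit block */
    uint32_t succ[MAX_BLOCKS];   /* successor bitset per block */
    bool     is_tbegin[MAX_BLOCKS];
    bool     is_tend[MAX_BLOCKS];
} cfg_t;

/* Iterative post-dominator sets:
     pdom(exit) = {exit}
     pdom(b)    = {b} | intersection of pdom(s) over all successors s  */
static void compute_postdom(const cfg_t *g, uint32_t pdom[])
{
    uint32_t all = (g->n == 32) ? UINT32_MAX : ((1u << g->n) - 1);
    for (int b = 0; b < g->n; b++)
        pdom[b] = (b == g->exit) ? (1u << b) : all;

    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = 0; b < g->n; b++) {
            if (b == g->exit)
                continue;
            uint32_t meet = all;
            for (int s = 0; s < g->n; s++)
                if (g->succ[b] & (1u << s))
                    meet &= pdom[s];
            uint32_t next = meet | (1u << b);
            if (next != pdom[b]) {
                pdom[b] = next;
                changed = true;
            }
        }
    }
}

/* True if every tbegin block is post-dominated by some tend block, i.e.
   all transactions in the function are treated as non-escaping. */
static bool all_tx_non_escaping(const cfg_t *g)
{
    uint32_t pdom[MAX_BLOCKS];
    uint32_t tend_set = 0;

    compute_postdom(g, pdom);
    for (int b = 0; b < g->n; b++)
        if (g->is_tend[b])
            tend_set |= 1u << b;
    for (int b = 0; b < g->n; b++)
        if (g->is_tbegin[b] && (pdom[b] & tend_set) == 0)
            return false;
    return true;
}
```

For a CFG whose tbegin block branches either to a body ending in a tend or to an abort block that loops back to the tbegin, all_tx_non_escaping returns true. For foo above, whose tbegin block falls straight through to the exit without a tend, it returns false, so foo would keep the full FPR save/restore in its prologue and epilogue.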