On 05/10/2011 01:54 PM, Blue Swirl wrote:
> TCG the generator backend
> -AREG0 is used for qemu_ld/st ops for TLB access. It should be
> possible for the translators to pass instead a pointer to either
> CPUState or directly to the TLB.

I believe that AREG0 should continue to be present in the generated code. There are simply too many references to it throughout the translated code for allocating it dynamically to be a win.

What should change, however, is the removal of AREG0 outside the generated code. The cpu-state pointer should be passed as a regular parameter wherever it is required. This includes tcg_qemu_tb_exec, which means that the generated prologue would change, setting up AREG0 in the process.

> New qemu_ld/st ops are needed for all TCG targets.

Yes, qemu_ld/st would have to change to accommodate the new parameter being passed. While we're at it, let us change things a bit further to allow guest byte-swap load/store insns to be implemented more efficiently. For instance, currently a sparc load_asr (little-endian), as emulated on an x86 host, does the byte swap twice.

There is, currently, a const int parameter to qemu_ld/st that encodes the size of the load. Behind the scenes, almost all TCG backends extend this parameter with a bit indicating that a byte swap is needed. Let us formalize this, and allow the bit to be set in the original TCG op, with appropriate new inlines in tcg-op.h to access it from the translators.

We can also make things easier for the backends by allowing them to declare whether or not they have byte-swap load/store insns. If such insns are not available, a separate bswap opcode is emitted right from tcg_gen_qemu_st32 et al. This would allow a nice cleanup for i386, which currently has a small register allocation problem in the store path, since it must avoid clobbering the input register while byte swapping. (This problem is currently worked around by restricting the set of input registers for qemu_ld/st.)

All this does require the slow path to be changed to accommodate this. In particular, if byte-swap memory ops are available, we need slow path functions that also byte swap. Indeed, I'd expect them to use the byte-swap memory ops themselves.

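A minimal C sketch of the proposed encoding and its lowering, for illustration only. Every name in it (MEMOP_BSWAP, memop_bswap(), has_bswap_memops, gen_qemu_ld32(), slow_ld32()) is hypothetical and not existing TCG API, and it models only 32-bit loads with a toy op list rather than real TCG ops. The size argument carries a byte-swap bit; the front end either passes the bit through when the backend declares byte-swap memory ops, or clears it and emits a separate bswap; the slow-path helper honors the bit only if it is still set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memop encoding: bits 0-1 hold log2 of the access size
 * (0 = 8, 1 = 16, 2 = 32, 3 = 64 bits); bit 2 requests a byte swap
 * relative to host byte order. */
enum {
    MEMOP_SIZE_MASK = 3,
    MEMOP_BSWAP     = 1 << 2,
};

/* Accessors of the kind proposed for tcg-op.h (names are made up). */
static inline unsigned memop_size(int memop) { return memop & MEMOP_SIZE_MASK; }
static inline bool memop_bswap(int memop) { return (memop & MEMOP_BSWAP) != 0; }

static inline uint32_t swap_bytes32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* A toy op stream standing in for the TCG op buffer. */
typedef enum { OP_QEMU_LD32, OP_BSWAP32 } OpKind;

typedef struct {
    OpKind kind;
    int memop;              /* only used by OP_QEMU_LD32 */
} Op;

typedef struct {
    bool has_bswap_memops;  /* backend declares byte-swap load/store insns */
    Op ops[8];
    int nb_ops;
} Ctx;

static void emit(Ctx *s, OpKind kind, int memop)
{
    assert(s->nb_ops < 8);
    s->ops[s->nb_ops].kind = kind;
    s->ops[s->nb_ops].memop = memop;
    s->nb_ops++;
}

/* Front end: if the backend cannot byte swap in the memory op itself,
 * clear the flag and emit a separate bswap after the load. */
static void gen_qemu_ld32(Ctx *s, int memop)
{
    assert(memop_size(memop) == 2);     /* this sketch handles 32 bits only */
    if (memop_bswap(memop) && !s->has_bswap_memops) {
        emit(s, OP_QEMU_LD32, memop & ~MEMOP_BSWAP);
        emit(s, OP_BSWAP32, 0);
    } else {
        emit(s, OP_QEMU_LD32, memop);
    }
}

/* Slow-path helper: reads in host byte order and swaps only when the
 * memop still carries the flag. */
static uint32_t slow_ld32(const uint8_t *haddr, int memop)
{
    uint32_t val;
    memcpy(&val, haddr, sizeof(val));
    return memop_bswap(memop) ? swap_bytes32(val) : val;
}

/* Execute the op stream as if every load took the slow path. */
static uint32_t run(const Ctx *s, const uint8_t *haddr)
{
    uint32_t reg = 0;
    for (int i = 0; i < s->nb_ops; i++) {
        switch (s->ops[i].kind) {
        case OP_QEMU_LD32:
            reg = slow_ld32(haddr, s->ops[i].memop);
            break;
        case OP_BSWAP32:
            reg = swap_bytes32(reg);
            break;
        }
    }
    return reg;
}
```

With a byte-swapping backend the translator's single load carries the flag down to the helper; without one, the helper sees a plain host-order load and the explicit bswap fixes up the result, so both op streams yield the same value.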
Further, if byte-swap memory ops are not available, the slow path should always return data in host byte order, because the separate bswap opcode emitted after the load is applied on both the fast and the slow path.

> -TCG temps are stored in CPUState field temp_buf[], accessed via
> AREG0. Maybe a regular stack frame should be allocated instead?

Probably. Most of the backends already manage a stack frame to hold the registers saved in the prologue. All that would be needed is a define from TCG telling the backends how much memory is required, and a value passed from the backends telling TCG the offset of that area from the stack pointer.

r~