This is a request for comments as well as a request for help :-) I've been experimenting with making TCGContext per-thread, so that we can run most of tcg_gen_code in parallel. I've made some progress, but haven't yet got it to work.
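
To make the goal concrete, here is a rough sketch of the idea. It is not code from the patches. struct gen_ctx, cur_ctx, gen_code() and run_parallel_gen() are made-up names standing in for TCGContext and tcg_gen_code, and the fixed split of the buffer into per-thread regions is an assumption of the sketch. __thread is the GCC/Clang thread-local storage extension.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_THREADS 8
#define REGION_SIZE 4096

/* Hypothetical stand-in for TCGContext: just enough state to emit bytes. */
struct gen_ctx {
    uint8_t *buf;   /* start of this thread's region of the code buffer */
    size_t size;    /* capacity of the region in bytes */
    size_t used;    /* bytes emitted so far */
};

/* One shared buffer, statically split into per-thread regions (assumption). */
static uint8_t code_buffer[MAX_THREADS * REGION_SIZE];
static struct gen_ctx contexts[MAX_THREADS];

/* Each thread sees its own pointer, so code generation takes no lock. */
static __thread struct gen_ctx *cur_ctx;

/* Stand-in for tcg_gen_code: append bytes to the calling thread's region. */
static void gen_code(const uint8_t *insns, size_t len)
{
    struct gen_ctx *s = cur_ctx;

    assert(s->used + len <= s->size);
    memcpy(s->buf + s->used, insns, len);
    s->used += len;
}

static void *vcpu_thread(void *arg)
{
    size_t idx = (size_t)(uintptr_t)arg;
    uint8_t insn[4] = { (uint8_t)idx, 0, 0, 0 };

    /* Bind this thread to its private context and region. */
    cur_ctx = &contexts[idx];
    cur_ctx->buf = code_buffer + idx * REGION_SIZE;
    cur_ctx->size = REGION_SIZE;
    cur_ctx->used = 0;

    for (int i = 0; i < 100; i++) {
        gen_code(insn, sizeof(insn));
    }
    return NULL;
}

/* Run n_threads generators in parallel; return the total bytes emitted. */
size_t run_parallel_gen(size_t n_threads)
{
    pthread_t tids[MAX_THREADS];
    size_t total = 0;

    assert(n_threads <= MAX_THREADS);
    for (size_t i = 0; i < n_threads; i++) {
        int rc = pthread_create(&tids[i], NULL, vcpu_thread,
                                (void *)(uintptr_t)i);
        assert(rc == 0);
        (void)rc;
    }
    for (size_t i = 0; i < n_threads; i++) {
        pthread_join(tids[i], NULL);
    }
    for (size_t i = 0; i < n_threads; i++) {
        total += contexts[i].used;
    }
    return total;
}
```

In this toy version nothing besides the context is shared between threads, which is exactly the property the real code does not have yet.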

My guess is that the TCG stack is still global instead of per-vCPU (it's been global since tmp_buf was removed from CPUState, right?), but I'm having trouble following that code, so most likely I'm wrong.

Any help would be appreciated. Please disregard minor nits: I want to see whether I can make this work, and then take measurements to decide whether this is worth the trouble.

- Patch 1 is a trivial doc fixup, feel free to pick it up.

- Patches 2-3 remove *tbs[] and use a binary search tree instead. This removes the assumption in tb_find_pc that *tbs[] are ordered by tc_ptr, thereby allowing us to generate code regardless of its location on the host (as we do after patch 6).

- Patch 4 addresses a reporting issue: ever since we embedded the struct TB's in code_gen_buffer (6e3b2bfd6), we have been misreporting the size of the generated code. Not a huge deal, but I noticed it while working on this.

- Patches 5-7 make TCGContext per-thread in softmmu. I have left some XXX's in there to note the issues I'm already aware of, so don't worry too much about those, except of course if you have any input on what the cause of the race(s) might be.

Thanks,

Emilio