v2: https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg04749.html
v3 applies on top of the current master (d4e59218a).

To ease review/testing, you can pull this series from:
  https://github.com/cota/qemu/tree/multi-tcg-v3

Note: I cannot even compile-test the _WIN32 bits; help appreciated!
See patches 39/40.

Changes from v2:

- Rebase on top of current master (therefore dropping the first 2
  patches, which are already on master)

- Add sh4 bits, touching:
  - Removal of argument to tb_lookup_ptr (merged into the otherwise
    unchanged v2 patch)
  - tb_cflags() inline (new patch in v3 for sh4 and all other arches)
  - CF_PARALLEL instead of parallel_cpus (sh4-only patch in v3)

- Add R-b tags

- Drop the patch removing the tb->invalid check.

- Introduce the patch implementing tb_lookup__cpu_state before the
  patches that fiddle with tb->cflags, so that there is a single place
  where that fiddling happens

- Update the commit log of the tb_lookup__cpu_state patch, explaining
  why tb->invalid must be checked when obtaining the *tb from
  tb_jmp_cache

- Improve the comment next to CF_INVALID

- CF_PARALLEL:
  - Introduce the tb_cflags inline to hide the atomic_read
  - Add an extra patch to convert tb->cflags readers to tb_cflags
  - Rename curr_cf_mask() to curr_cflags()
  - Remove many superfluous if (parallel_cpus) checks; just call
    curr_cflags()
  - Drop tb_cf_mask(); use CF_HASH_MASK instead
  - m68k: use gen_helper_exit_atomic instead of implementing
    cas2w_parallel
  - s390x: Richard: I dropped your R-b tag because v3 also includes
    csst.
  - sh4: add sh4 patch, as mentioned above
  - tcg_ctx.cf_parallel: use a bool instead of a u8
  - Do if (foo && (tb_cflags(tb) & BAR)) instead of
    (foo && tb_cflags() & BAR)

- Use a size_t for struct tb_tc.size, plugging the 4-byte hole

- Dynamically allocate TCG optimizer globals
  - Use a bitmap directly instead of a TCGTempSet for temps_used,
    which saves some space
  - Add perf numbers for the change: ~2% slowdown

- tcg_ctxs: get rid of tcg_ctxs_init

- TCGProfile: s/PROF_ADD_MAX/PROF_MAX/

- real_host_page_size: move to its own file with an init constructor,
  as suggested by Richard (Richard: I kept your R-b tag).

- qemu_mprotect helpers: g_assert on page-aligned address and size
  - Adapt callers in translate-all.c to pass page-aligned address and
    size

- TCG regions:
  - Hide the computation of n_regions from tcg_region_init's callers.
    The function now takes no arguments. Add a comment about
    qemu_tcg_mttcg_enabled().
  - if (!inited) { inited = true; do_init(); } in cpus.c
  - Use assert instead of if (err) tcg_abort();
  - Use QEMU_ALIGN_DOWN instead of &= mask
  - Inline set_guard_pages() into tcg_region_init
  - Merge the patch that removes code_gen_buffer's guard page into the
    TCG regions patch

- TCG __thread:
  - Inline tcg_ctxs_init into tcg_context_init
  - Move the code that determines the number of regions from the
    previous patch to this patch.

To be done after this series:

- Get rid of tb_lock, or at least push it down so that we take
  advantage of multiple TCG contexts in MTTCG. (I'm doing this in my
  testing, but doing it well will require another patch series.)

Improvements that were suggested during this series' development:

- Order tb->[*] comparisons by likelihood of mismatch.
- Get rid of parallel_cpus from cpu_exec_step_atomic. I'm not sure
  whether just removing it is safe, since we call curr_cflags from
  several places.
- Perhaps parse -accel=tcg command-line arguments before TCG is
  initialized, so that those arguments can be used during TCG
  initialization.
Thanks, Emilio