On Thu, May 07, 2015 at 03:05:45PM +0530, Bharata B Rao wrote: > On Wed, Mar 18, 2015 at 11:50:04AM +0530, Bharata B Rao wrote: > > On Tue, Mar 17, 2015 at 11:56:04AM +0100, Andreas Färber wrote: > > > Am 17.03.2015 um 09:39 schrieb Bharata B Rao: > > > > On Tue, Mar 17, 2015 at 07:56:41AM +0100, Alexander Graf wrote: > > > >> > > > >> > > > >> On 13.03.15 12:56, Bharata B Rao wrote: > > > >>> From: Bharata B Rao <bharata....@gmail.com> > > > >>> > > > >>> Currently CPUState.cpu_index is monotonically increasing and a newly > > > >>> created CPU always gets the next higher index. The next available > > > >>> index is calculated by counting the existing number of CPUs. This is > > > >>> fine as long as we only add CPUs, but there are architectures which > > > >>> are starting to support CPU removal too. For an architecture like > > > >>> PowerPC > > > >>> which derives its CPU identifier (device tree ID) from cpu_index, the > > > >>> existing logic of generating cpu_index values causes problems. > > > >>> > > > >>> With the currently proposed method of handling vCPU removal by parking > > > >>> the vCPU fd in QEMU > > > >>> (Ref: > > > >>> http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg02604.html), > > > >>> generating cpu_index this way will not work for PowerPC. > > > >>> > > > >>> This patch changes the way cpu_index is handed out by maintaining > > > >>> a bit map of the CPUs that tracks both addition and removal of CPUs. > > > >>> > > > >>> I am not sure if this is the right and an acceptable approach. The > > > >>> alternative is to do something similar for PowerPC alone and not > > > >>> depend on cpu_index. > > > >>> > > > >>> I have tested this with out-of-the-tree patches for CPU hot plug and > > > >>> removal on x86 and sPAPR PowerPC. > > > >>> > > > >>> Signed-off-by: Bharata B Rao <bhar...@linux.vnet.ibm.com> > > > >>> --- > > > >>> exec.c | 39 > > > >>> +++++++++++++++++++++++++++++---------- > > > >>> include/exec/exec-all.h | 1 + > > > >>> target-alpha/cpu.c | 6 ++++++ > > > >>> target-arm/cpu.c | 1 + > > > >>> target-cris/cpu.c | 6 ++++++ > > > >>> target-i386/cpu.c | 6 ++++++ > > > >>> target-lm32/cpu.c | 6 ++++++ > > > >>> target-m68k/cpu.c | 6 ++++++ > > > >>> target-microblaze/cpu.c | 6 ++++++ > > > >>> target-mips/cpu.c | 6 ++++++ > > > >>> target-moxie/cpu.c | 6 ++++++ > > > >>> target-openrisc/cpu.c | 6 ++++++ > > > >>> target-ppc/translate_init.c | 6 ++++++ > > > >>> target-s390x/cpu.c | 1 + > > > >>> target-sh4/cpu.c | 6 ++++++ > > > >>> target-sparc/cpu.c | 1 + > > > >>> target-tricore/cpu.c | 5 +++++ > > > >>> target-unicore32/cpu.c | 6 ++++++ > > > >>> target-xtensa/cpu.c | 6 ++++++ > > > >>> 19 files changed, 116 insertions(+), 10 deletions(-) > > > >>> > > > >>> diff --git a/exec.c b/exec.c > > > >>> index e97071a..7760f2d 100644 > > > >>> --- a/exec.c > > > >>> +++ b/exec.c > > > >>> @@ -530,21 +530,40 @@ void tcg_cpu_address_space_init(CPUState *cpu, > > > >>> AddressSpace *as) > > > >>> } > > > >>> #endif > > > >>> > > > >>> +static DECLARE_BITMAP(cpu_index_map, MAX_CPUMASK_BITS); > > > >>> + > > > >>> +#ifdef CONFIG_USER_ONLY > > > >>> +int max_cpus = 1; /* TODO: Check if this is correct ? */ > > > >>> +#endif > > > >>> + > > > >>> +static int cpu_get_free_index(void) > > > >>> +{ > > > >>> + int cpu = find_first_zero_bit(cpu_index_map, max_cpus); > > > >>> + > > > >>> + if (cpu == max_cpus) { > > > >>> + fprintf(stderr, "WARNING: qemu: Trying to use more " > > > >>> + "CPUs than allowed max of %d\n", max_cpus); > > > >>> + return max_cpus; > > > >>> + } else { > > > >>> + bitmap_set(cpu_index_map, cpu, 1); > > > >>> + return cpu; > > > >>> + } > > > >>> +} > > > >>> + > > > >>> +void cpu_exec_exit(CPUState *cpu) > > > >>> +{ > > > >>> + bitmap_clear(cpu_index_map, cpu->cpu_index, 1); > > > >>> +} > > > >>> + > > > >>> void cpu_exec_init(CPUArchState *env) > > > >>> { > > > >>> CPUState *cpu = ENV_GET_CPU(env); > > > >>> CPUClass *cc = CPU_GET_CLASS(cpu); > > > >>> - CPUState *some_cpu; > > > >>> - int cpu_index; > > > >>> > > > >>> #if defined(CONFIG_USER_ONLY) > > > >>> cpu_list_lock(); > > > >>> #endif > > > >>> - cpu_index = 0; > > > >>> - CPU_FOREACH(some_cpu) { > > > >>> - cpu_index++; > > > >>> - } > > > >>> - cpu->cpu_index = cpu_index; > > > >>> + cpu->cpu_index = cpu_get_free_index(); > > > >>> cpu->numa_node = 0; > > > >>> QTAILQ_INIT(&cpu->breakpoints); > > > >>> QTAILQ_INIT(&cpu->watchpoints); > > > >>> @@ -558,16 +577,16 @@ void cpu_exec_init(CPUArchState *env) > > > >>> cpu_list_unlock(); > > > >>> #endif > > > >>> if (qdev_get_vmsd(DEVICE(cpu)) == NULL) { > > > >>> - vmstate_register(NULL, cpu_index, &vmstate_cpu_common, cpu); > > > >>> + vmstate_register(NULL, cpu->cpu_index, &vmstate_cpu_common, > > > >>> cpu); > > > >>> } > > > >>> #if defined(CPU_SAVE_VERSION) && !defined(CONFIG_USER_ONLY) > > > >>> - register_savevm(NULL, "cpu", cpu_index, CPU_SAVE_VERSION, > > > >>> + register_savevm(NULL, "cpu", cpu->cpu_index, CPU_SAVE_VERSION, > > > >>> cpu_save, cpu_load, env); > > > >>> assert(cc->vmsd == NULL); > > > >>> assert(qdev_get_vmsd(DEVICE(cpu)) == NULL); > > > >>> #endif > > > >>> if (cc->vmsd != NULL) { > > > >>> - vmstate_register(NULL, cpu_index, cc->vmsd, cpu); > > > >>> + vmstate_register(NULL, cpu->cpu_index, cc->vmsd, cpu); > > > >>> } > > > >>> } > > > >>> > > > >>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h > > > >>> index 8eb0db3..95fbba0 100644 > > > >>> --- a/include/exec/exec-all.h > > > >>> +++ b/include/exec/exec-all.h > > > >>> @@ -89,6 +89,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu, > > > >>> target_ulong pc, target_ulong cs_base, > > > >>> int flags, > > > >>> int cflags); > > > >>> void cpu_exec_init(CPUArchState *env); > > > >>> +void cpu_exec_exit(CPUState *cpu); > > > >>> void QEMU_NORETURN cpu_loop_exit(CPUState *cpu); > > > >>> int page_unprotect(target_ulong address, uintptr_t pc, void *puc); > > > >>> void tb_invalidate_phys_page_range(tb_page_addr_t start, > > > >>> tb_page_addr_t end, > > > >>> diff --git a/target-alpha/cpu.c b/target-alpha/cpu.c > > > >>> index a98b7d8..7c57165 100644 > > > >>> --- a/target-alpha/cpu.c > > > >>> +++ b/target-alpha/cpu.c > > > >>> @@ -250,6 +250,11 @@ static const TypeInfo ev68_cpu_type_info = { > > > >>> .parent = TYPE("ev67"), > > > >>> }; > > > >>> > > > >>> +static void alpha_cpu_finalize(Object *obj) > > > >>> +{ > > > >>> + cpu_exec_exit(CPU(obj)); > > > >>> +} > > > >>> + > > > >>> static void alpha_cpu_initfn(Object *obj) > > > >>> { > > > >>> CPUState *cs = CPU(obj); > > > >>> @@ -305,6 +310,7 @@ static const TypeInfo alpha_cpu_type_info = { > > > >>> .parent = TYPE_CPU, > > > >>> .instance_size = sizeof(AlphaCPU), > > > >>> .instance_init = alpha_cpu_initfn, > > > >>> + .instance_finalize = alpha_cpu_finalize, > > > >> > > > >> Would it be possible to put this into TYPE_CPU->instance_finalize > > > >> instead? > > > > > > > > Yes possible and that would be much cleaner since I wouldn't have to > > > > touch > > > > all archs. But it will be asymmetric in some sense as cpu_exec_init() is > > > > called from all individual cpus' instance_init but cpu_exec_exit() will > > > > be > > > > called from parent's (TYPE_CPU) instance_finalize. If that is fine, I > > > > shall > > > > post v2 with this change. > > > > > > Could you check: Wasn't there a patch from Fujitsu to move > > > cpu_exec_init() to generic code? If both were generic, that would be > > > fine. If that is problematic, I would accept the mismatch as long as it > > > is "safe". That is, instance_finalize needs to handle any state of the > > > object, > > > > There is a patch from Zhu to move only vmstate_register related bits > > from cpu_exec_init to cpu_common_realizefn. > > > > http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg01550.html > > > > > and I think these two are better suited for realize/unrealize > > > than instance_init/instance_finalize. > > > > And Eduardo has a patch to move cpu_exec_init call from instance_init > > to realize for target-i386. > > > > https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg01056.html > > > > So if the preferred way is to call cpu_exec_init from realize, then > > Eduardo - Can you do this for all archs ? My limited testing shows > > that it (moving cpu_exec_init from instance_init to realize) works for > > sPAPR PowerPC too, but not sure about other targets. > > > > After there is consensus on the above two patches, I can do the > > cpu_index to bitmap changes. > > I was planning to re-post this patch now, but wanted to know the plan > regarding the related two patchsets that I have mentioned above. > > 1. Patch by Eduardo to move cpu_exec_init from instance_init to realizefn > 2. Patch by Zhu to move vmstate_register related bits from cpu_exec_init > to cpu_common_realizefn. > > I don't see any more updates on this, are these obsolete or still under > consideration ?
#1 is still planned for x86. Other architectures probably need it, but I think it should be done by somebody familiar with each architecture. There may be subtle initialization ordering requirements in arch-specific CPU init code (that happened on x86, when we had cpu_index-based apic_id initialization on initfn). I don't know if this was suggested before: if you make cpu_exec_exit() idempotent (and safe to call even if cpu_exec_init() wasn't called), we can simply call it unconditionally in CPUClass::instance_finalize, because if the CPU object is already being destroyed, we know it must be removed from the global CPU list. Later, arch-specific code may decide to call cpu_exec_exit() earlier, if appropriate (on unrealizefn, for example). But at least we don't need to force all architectures to provide a instance_finalize function just because of cpu_exec_exit(). -- Eduardo