On Tue, Nov 24, 2015 at 12:45:48PM +1100, Benjamin Herrenschmidt wrote: > On Fri, 2015-11-20 at 19:21 +1100, David Gibson wrote: > > On Wed, Nov 11, 2015 at 11:27:39AM +1100, Benjamin Herrenschmidt > > wrote: > > > No devices yet, not even an interrupt controller, just to get > > > started. > > > > > > Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org> > > > --- > > > default-configs/ppc64-softmmu.mak | 1 + > > > hw/ppc/Makefile.objs | 2 + > > > hw/ppc/pnv.c | 600 > > > ++++++++++++++++++++++++++++++++++++++ > > > include/hw/ppc/pnv.h | 36 +++ > > > 4 files changed, 639 insertions(+) > > > create mode 100644 hw/ppc/pnv.c > > > create mode 100644 include/hw/ppc/pnv.h > > > > Many of my comments below may be made irrelevant by later patches in > > the series. > > Heh, well there is where the "meat" of the new platform starts showing > up :-) > > .../... > > > > +#define _FDT(exp) \ > > > + do { \ > > > + int ret = (exp); \ > > > + if (ret < 0) { \ > > > + fprintf(stderr, "qemu: error creating device tree: %s: > > > %s\n", \ > > > + #exp, fdt_strerror(ret)); \ > > > + exit(1); \ > > > + } \ > > > + } while (0) > > > > We should probably make a file where helper routines used by both > > spapr.c and pnv.c can live. > > Probably but I'd see that as a later cleanup rather than doing it > in this series...
Ok. > > .../... > > > > + if (pcc->l1_dcache_size) { > > > + _FDT((fdt_property_cell(fdt, "d-cache-size", > > > pcc->l1_dcache_size))); > > > + } else { > > > + fprintf(stderr, "Warning: Unknown L1 dcache size for cpu\n"); > > > > Hmm (note to self) should probably change a bunch of these both in > > spapr.c and pnv.c from explicit fprintfs() to modern error_report() > > and similar. > > That's a train I completely missed, but yes. > > .../... > > > > + } > > > + _FDT((fdt_property(fdt, "ibm,ppc-interrupt-server#s", > > > + servers_prop, sizeof(servers_prop)))); > > > + _FDT((fdt_property(fdt, "ibm,ppc-interrupt-gserver#s", > > > + gservers_prop, sizeof(gservers_prop)))); > > > + > > > + _FDT((fdt_end_node(fdt))); > > > +} > > > + > > > +static void *powernv_create_fdt(PnvSystem *sys, uint32_t initrd_base, > > > uint32_t initrd_size) > > > +{ > > > > So.. does it make sense for qemu to create a device tree for powernv, > > rather than leaving it to Opal? > > Well, OPAL only creates a device-tree if you are on an FSP machine in > which case it expects a complex data structure (HDAT) coming from the > FSP to use as a source of info. > > On OpenPower machines, which is closer to what we simulate here, we > do get a device-tree as an input in OPAL, it's generated by HostBoot. > > Now, I am not running HostBoot in qemu because most of what it does > is completely irrelevant to an emulated system (training the various > links, initializing the memory buffer chips, etc...). > > However we do need to pass a number of platform information to OPAL > which HB does via the device-tree, such as which cores are enabled, > the memory map configured for PCI, which PHBs are enabled, etc... so > creating a DT in qemu makes sense, it mimmics HB in essence. > > OPAL will enrich that device-tree before starting Linux. Ok. Some comments mentioning that you're simulating the exit state from HostBoot would be good then. > > > + /* > > > + * Add info to guest to indentify which host is it being run on > > > + * and what is the uuid of the guest > > > + */ > > > + if (kvmppc_get_host_model(&buf)) { > > > + _FDT((fdt_property_string(fdt, "host-model", buf))); > > > + g_free(buf); > > > + } > > > + if (kvmppc_get_host_serial(&buf)) { > > > + _FDT((fdt_property_string(fdt, "host-serial", buf))); > > > + g_free(buf); > > > + } > > > > Since you're emulating a "bare metal" machine, surely the host > > properties aren't relevant here. > > They may or may not. But yes, I can take that out. > > > > + buf = g_strdup_printf(UUID_FMT, qemu_uuid[0], qemu_uuid[1], > > > + qemu_uuid[2], qemu_uuid[3], qemu_uuid[4], > > > + qemu_uuid[5], qemu_uuid[6], qemu_uuid[7], > > > + qemu_uuid[8], qemu_uuid[9], qemu_uuid[10], > > > + qemu_uuid[11], qemu_uuid[12], qemu_uuid[13], > > > + qemu_uuid[14], qemu_uuid[15]); > > > + > > > + _FDT((fdt_property_string(fdt, "vm,uuid", buf))); > > > + g_free(buf); > > > + > > > + _FDT((fdt_begin_node(fdt, "chosen"))); > > > + _FDT((fdt_property(fdt, "linux,initrd-start", > > > + &start_prop, sizeof(start_prop)))); > > > + _FDT((fdt_property(fdt, "linux,initrd-end", > > > + &end_prop, sizeof(end_prop)))); > > > + _FDT((fdt_end_node(fdt))); > > > + > > > + _FDT((fdt_property_cell(fdt, "#address-cells", 0x2))); > > > + _FDT((fdt_property_cell(fdt, "#size-cells", 0x2))); > > > + > > > + /* cpus */ > > > + _FDT((fdt_begin_node(fdt, "cpus"))); > > > + _FDT((fdt_property_cell(fdt, "#address-cells", 0x1))); > > > + _FDT((fdt_property_cell(fdt, "#size-cells", 0x0))); > > > + > > > + CPU_FOREACH(cs) { > > > + powernv_create_cpu_node(fdt, cs, smt); > > > + } > > > + > > > + _FDT((fdt_end_node(fdt))); > > > + > > > + /* Memory */ > > > + _FDT((powernv_populate_memory(fdt))); > > > + > > > + /* /hypervisor node */ > > > + if (kvm_enabled()) { > > > + uint8_t hypercall[16]; > > > + > > > + /* indicate KVM hypercall interface */ > > > + _FDT((fdt_begin_node(fdt, "hypervisor"))); > > > + _FDT((fdt_property_string(fdt, "compatible", "linux,kvm"))); > > > + if (kvmppc_has_cap_fixup_hcalls()) { > > > + /* > > > + * Older KVM versions with older guest kernels were broken > > > with the > > > + * magic page, don't allow the guest to map it. > > > + */ > > > + kvmppc_get_hypercall(first_cpu->env_ptr, hypercall, > > > + sizeof(hypercall)); > > > + _FDT((fdt_property(fdt, "hcall-instructions", hypercall, > > > + sizeof(hypercall)))); > > > + } > > > + _FDT((fdt_end_node(fdt))); > > > + } > > > > And a hypercall interface surely doesn't make sense for powernv. > > It's qemu paravirt, it exist on G5 too :-) It's for PR KVM, it allows > to speed up some bits and pieces. But yeah we don't yet really > "support" it at this point. However we might. Ah, yes, I forgot about that. > > > + > > > + _FDT((fdt_end_node(fdt))); /* close root node */ > > > + _FDT((fdt_finish(fdt))); > > > + > > > + return fdt; > > > +} > > > + > > > +static void powernv_cpu_reset(void *opaque) > > > +{ > > > + PowerPCCPU *cpu = opaque; > > > + CPUState *cs = CPU(cpu); > > > + CPUPPCState *env = &cpu->env; > > > + > > > + cpu_reset(cs); > > > + > > > + env->spr[SPR_PIR] = ppc_get_vcpu_dt_id(cpu); > > > + env->spr[SPR_HIOR] = 0; > > > + env->gpr[3] = FDT_ADDR; > > > + env->nip = 0x10; > > > + env->msr |= MSR_HVB; > > > +} > > > > So, I believe the qemu-ishly correct way of doing this is to have the > > cpu initialization in the cpu code, rather than the platform code, as > > much as possible. On PAPR we kind of get away with initialization in > > the platform code on the grounds that it's a paravirt platform, but > > powernv doesn't have that excuse. > > > > But this may well be stuff that changes in later patches, so.. > > Well no, not really. But here too, we mimmic the state as coming out of > HostBoot, which isn't quite the same thing. We need to provide the FDT > entry, etc... > > The "real" reset state of a P8 isn't something we can easily > simulate... > > It runs some microcode from a SEEPROM with a small microcontroller > which initializes a core, which then runs some HB code off it's L3 > cache etc... really not something we want to do in qemu at least > for now. > > So the initial state here is somewhat in between full virt and > paravirt, we simulate a platform that has been partially initialized by > HostBoot, to the state it has when it enters OPAL. Ok, that makes sense, but I think it needs a bit more explanation in the code to that effect. > > > +static const VMStateDescription vmstate_powernv = { > > > + .name = "powernv", > > > + .version_id = 1, > > > + .minimum_version_id = 1, > > > +}; > > > > It might be best to leave out the vmstate entirely until you're ready > > to implement migration, rather than having a partial, probably not > > working migration implementation. > > Ok. > > > > + > > > +static void pnv_create_chip(PnvSystem *sys, unsigned int chip_no) > > > +{ > > > + PnvChip *chip = &sys->chips[chip_no]; > > > + > > > + if (chip_no >= PNV_MAX_CHIPS) { > > > + return; > > > + } > > > + > > > + /* XXX Improve chip numbering to better match HW */ > > > + chip->chip_id = chip_no; > > > > I think modern qemu conventions would suggest creating the chips as > > QOM objects rather than having a fixed array. > > Yeah, more code & much larger memory footprint for the same result :-) > > I can look into it but it's low priority. I still want to rework some > of that chip stuff in future patches anyway. > > > > +} > > > + > > > +static void ppc_powernv_init(MachineState *machine) > > > +{ > > > + ram_addr_t ram_size = machine->ram_size; > > > + const char *cpu_model = machine->cpu_model; > > > + const char *kernel_filename = machine->kernel_filename; > > > + const char *initrd_filename = machine->initrd_filename; > > > + uint32_t initrd_base = 0; > > > + long initrd_size = 0; > > > + PowerPCCPU *cpu; > > > + CPUPPCState *env; > > > + MemoryRegion *sysmem = get_system_memory(); > > > + MemoryRegion *ram = g_new(MemoryRegion, 1); > > > + sPowerNVMachineState *pnv_machine = POWERNV_MACHINE(machine); > > > + PnvSystem *sys = &pnv_machine->sys; > > > + long fw_size; > > > + char *filename; > > > + void *fdt; > > > + int i; > > > + > > > + /* init CPUs */ > > > + if (cpu_model == NULL) { > > > + cpu_model = kvm_enabled() ? "host" : "POWER8"; > > > + } > > > + > > > + for (i = 0; i < smp_cpus; i++) { > > > + cpu = cpu_ppc_init(cpu_model); > > > + if (cpu == NULL) { > > > + fprintf(stderr, "Unable to find PowerPC CPU definition\n"); > > > + exit(1); > > > + } > > > + env = &cpu->env; > > > + > > > + /* Set time-base frequency to 512 MHz */ > > > + cpu_ppc_tb_init(env, TIMEBASE_FREQ); > > > + > > > + /* MSR[IP] doesn't exist nowadays */ > > > + env->msr_mask &= ~(1 << 6); > > > + > > > + qemu_register_reset(powernv_cpu_reset, cpu); > > > + } > > > + > > > + /* allocate RAM */ > > > + memory_region_allocate_system_memory(ram, NULL, "ppc_powernv.ram", > > > ram_size); > > > + memory_region_add_subregion(sysmem, 0, ram); > > > + > > > + /* XXX We should decide how many chips to create based on #cores and > > > + * Venice vs. Murano vs. Naples chip type etc..., for now, just > > > create > > > + * one chip. Also creation of the CPUs should be done per-chip > > > + */ > > > + sys->num_chips = 1; > > > + > > > + /* Create only one PHB for now until I figure out what's wrong > > > + * when I create more (resource assignment failures in Linux) > > > + */ > > > + pnv_create_chip(sys, 0); > > > + > > > + if (bios_name == NULL) { > > > + bios_name = FW_FILE_NAME; > > > + } > > > + filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name); > > > + fw_size = load_image_targphys(filename, 0, FW_MAX_SIZE); > > > + if (fw_size < 0) { > > > + hw_error("qemu: could not load OPAL '%s'\n", filename); > > > + exit(1); > > > + } > > > + g_free(filename); > > > + > > > + > > > + if (kernel_filename == NULL) { > > > + kernel_filename = KERNEL_FILE_NAME; > > > + } > > > + filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, > > > kernel_filename); > > > > The commit withe Opal image should go in before this, no? > > Now this is a bit of an open discussion at the moment :-) > > The way OPAL is built on OPP machines today is by essentially building > a complete flash image which contains HostBoot, OPAL and the petitboot- > based bootloader which contains a Linux kernel etc... > > We could create a target without HB and with a slimmed down Linux but > it would still probably be about 12MB I reckon, if not more. It feels a > bit "big" to ship as a binary as part of qemu... > > We would also have to add code to qemu to "find" OPAL in that image, > and then add a model for the flash controller. > > The other option is to bundle just OPAL itself. However that means > you can't go anywhere without a -kernel argument, which would then > be either a petitboot-based bootloader or your actual target kernel. Hm, ok. But in order for this to be usable, we need some way to get a suitable image. So medium term, I think it makes sense to include both opal and PetitBoot, so you can boot the guest like a real machine. However, including only Opal and requiring -kernel would be a reasonable interim step. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature