On Tue, Jan 21, 2020 at 06:25:36PM +1100, Alexey Kardashevskiy wrote: > > > On 21/01/2020 16:11, David Gibson wrote: > > On Fri, Jan 10, 2020 at 01:09:25PM +1100, Alexey Kardashevskiy wrote: > >> The Petitboot bootloader is way more advanced than SLOF is ever going to > >> be as Petitboot comes with the full-featured Linux kernel with all > >> the drivers, and initramdisk with quite user friendly interface. > >> The problem with ditching SLOF is that an unmodified pseries kernel can > >> either start via: > >> 1. kexec, this requires presence of RTAS and skips > >> ibm,client-architecture-support entirely; > >> 2. normal boot, this heavily relies on the OF1275 client interface to > >> fetch the device tree and do early setup (claim memory). > >> > >> This adds a new bios-less mode to the pseries machine: "bios=on|off". > >> When enabled, QEMU does not load SLOF and jumps to the kernel from > >> "-kernel". > > > > I don't love the name "bios" for this flag, since BIOS tends to refer > > to old-school x86 firmware. Given the various plans we're considering > > the future, I'd suggest "firmware=slof" for the current in-guest SLOF > > mode, and say "firmware=vof" (Virtual Open Firmware) for the new > > model. We can consider firmware=petitboot or firmware=none (for > > direct kexec-style boot into -kernel) or whatever in the future > > Ok. We could also enforce default loading addresses for SLOF/kernel/grub > and drop "kernel-addr", although it is going to be confusing if it > changes in not so obvious way...
Yes, I think that would be confusing, so I think adding the kernel-addr override is a good idea, I'd just like it split out for clarity. > In fact, I will ideally need 3 flags: > -bios: on|off to stop loading SLOF; > -kernel-addr: 0x0 for slof/kernel; 0x20000 for grub; I'm happy for that one to be separate from the "firmware style" option. > -kernel-translate-hack: on|off - as grub is linked to run from 0x20000 > and it only works when placed there, the hack breaks it. Hrm. I don't really understand what this one is about. That doesn't really seem like something the user would ever want to fiddle with directly. > Or we can pass grub via -bios and not via -kernel but strictly speaking > there is still a firmware - that new 20 bytes blob so it would not be > accurate. > > We can put this all into a single > -firmware slof|vof|grub|linux. Not sure. I'm not thinking of "grub" as a separate option - that would be the same as "vof". Using vof + no -kernel we'd need to scan the disks in the same way SLOF does, and look for a boot partition, which will probably contain a GRUB image. Then we'd need enough faked OF client calls to let GRUB operate. > >> The client interface is implemented exactly as RTAS - a 20 bytes blob, > >> right next after the RTAS blob. The entry point is passed to the kernel > >> via GPR5. > >> > >> This implements a handful of client interface methods just to get going. > >> In particular, this implements the device tree fetching, > >> ibm,client-architecture-support and instantiate-rtas. > >> > >> This implements changing FDT properties for RTAS (for vmlinux and zImage) > >> and initramdisk location (for zImage). To make this work, this skips > >> fdt_pack() when bios=off as not packing the blob leaves some room for > >> appending. > >> > >> This assigns "phandles" to device tree nodes as there is no more SLOF > >> and OF nodes addresses of which served as phandle values. > >> This keeps predefined nodes (such as XICS/NVLINK/...) unchanged. > >> phandles are regenerated at every FDT rebuild. > >> > >> When bios=off, this adds "/chosen" every time QEMU builds a tree. > >> > >> This implements "claim" which the client (Linux) uses for memory > >> allocation; this is also used by QEMU for claiming kernel/initrd images, > >> client interface entry point, RTAS and the initial stack. > >> > >> While at this, add a "kernel-addr" machine parameter to allow moving > >> the kernel in memory. This is useful for debugging if the kernel is > >> loaded at @0, although not necessary. > >> > >> This adds basic instances support which are managed by a hashmap > >> ihandle->[phandle, DeviceState, Chardev]. > >> > >> Note that a 64bit PCI fix is required for Linux: > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735e > >> > >> The test command line: > >> > >> qemu-system-ppc64 \ > >> -nodefaults \ > >> -chardev stdio,id=STDIO0,signal=off,mux=on \ > >> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \ > >> -mon id=MON0,chardev=STDIO0,mode=readline \ > >> -nographic \ > >> -vga none \ > >> -kernel pbuild/kernel-le-guest/arch/powerpc/boot/zImage.pseries \ > >> -machine pseries,bios=off,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken \ > >> -m 4G \ > >> -enable-kvm \ > >> -initrd pb/rootfs.cpio.xz \ > >> -device nec-usb-xhci,id=nec-usb-xhci0 \ > >> -netdev tap,id=TAP0,helper=/home/aik/qemu-bridge-helper --br=br0 \ > >> -device virtio-net-pci,id=vnet0,netdev=TAP0 img/f30le.qcow2 \ > >> -snapshot \ > >> -smp 8,threads=8 \ > >> -trace events=qemu_trace_events \ > >> -d guest_errors \ > >> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.ssh54088 \ > >> -mon chardev=SOCKET0,mode=control > >> > >> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> > > > > It'd be nice to split this patch up a bit, though I'll admit it's not > > very obvious where to do so. > > > v6 is a patchset. > > >> --- > >> Changes: > >> v5: > >> * made instances keep device and chardev pointers > >> * removed VIO dependencies > >> * print error if RTAS memory is not claimed as it should have been > >> * pack FDT as "quiesce" > >> > >> v4: > >> * fixed open > >> * validate ihandles in "call-method" > >> > >> v3: > >> * fixed phandles allocation > >> * s/__be32/uint32_t/ as we do not normally have __be32 type in qemu > >> * fixed size of /chosen/stdout > >> * bunch of renames > >> * do not create rtas properties at all, let the client deal with it; > >> instead setprop allows changing these in the FDT > >> * no more packing FDT when bios=off - nobody needs it and getprop does not > >> work otherwise > >> * allow updating initramdisk device tree properties (for zImage) > >> * added instances > >> * fixed stdout on OF's "write" > >> * removed special handling for stdout in OF client, spapr-vty handles it > >> instead > >> > >> v2: > >> * fixed claim() > >> * added "setprop" > >> * cleaner client interface and RTAS blobs management > >> * boots to petitboot and further to the target system > >> * more trace points > >> --- > >> hw/ppc/Makefile.objs | 1 + > >> include/hw/ppc/spapr.h | 28 +- > >> hw/ppc/spapr.c | 266 ++++++++++++++-- > >> hw/ppc/spapr_hcall.c | 74 +++-- > >> hw/ppc/spapr_of_client.c | 633 +++++++++++++++++++++++++++++++++++++++ > >> hw/ppc/trace-events | 12 + > >> 6 files changed, 959 insertions(+), 55 deletions(-) > >> create mode 100644 hw/ppc/spapr_of_client.c > >> > >> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs > >> index 101e9fc59185..20efc0aa6f9b 100644 > >> --- a/hw/ppc/Makefile.objs > >> +++ b/hw/ppc/Makefile.objs > >> @@ -6,6 +6,7 @@ obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o > >> spapr_rtas.o > >> obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o > >> obj-$(CONFIG_PSERIES) += spapr_cpu_core.o spapr_ovec.o spapr_irq.o > >> obj-$(CONFIG_PSERIES) += spapr_tpm_proxy.o > >> +obj-$(CONFIG_PSERIES) += spapr_of_client.o > >> obj-$(CONFIG_SPAPR_RNG) += spapr_rng.o > >> # IBM PowerNV > >> obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o pnv_core.o pnv_lpc.o pnv_psi.o > >> pnv_occ.o pnv_bmc.o > >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h > >> index 61f005c6f686..efc2c70abf99 100644 > >> --- a/include/hw/ppc/spapr.h > >> +++ b/include/hw/ppc/spapr.h > >> @@ -105,6 +105,11 @@ struct SpaprCapabilities { > >> uint8_t caps[SPAPR_CAP_NUM]; > >> }; > >> > >> +typedef struct { > >> + uint64_t start; > >> + uint64_t size; > >> +} SpaprOfClaimed; > >> + > > > > Can we split more of the fake-OF code into a new file? > > > Done in v6, I quite reworked it, this is why I told you to ping me > before you review this one :) Oops, I forgot. Sorry. > >> /** > >> * SpaprMachineClass: > >> */ > >> @@ -160,6 +165,13 @@ struct SpaprMachineState { > >> void *fdt_blob; > >> long kernel_size; > >> bool kernel_le; > >> + uint64_t kernel_addr; > >> + bool bios_enabled; > >> + uint32_t rtas_base; > >> + GArray *claimed; /* array of SpaprOfClaimed */ > >> + uint64_t claimed_base; > >> + GHashTable *of_instances; /* ihandle -> SpaprOfInstance */ > >> + uint32_t of_instance_last; > >> uint32_t initrd_base; > >> long initrd_size; > >> uint64_t rtc_offset; /* Now used only during incoming migration */ > >> @@ -510,7 +522,8 @@ struct SpaprMachineState { > >> /* Client Architecture support */ > >> #define KVMPPC_H_CAS (KVMPPC_HCALL_BASE + 0x2) > >> #define KVMPPC_H_UPDATE_DT (KVMPPC_HCALL_BASE + 0x3) > >> -#define KVMPPC_HCALL_MAX KVMPPC_H_UPDATE_DT > >> +#define KVMPPC_H_CLIENT (KVMPPC_HCALL_BASE + 0x5) > >> +#define KVMPPC_HCALL_MAX KVMPPC_H_CLIENT > >> > >> /* > >> * The hcall range 0xEF00 to 0xEF80 is reserved for use in facilitating > >> @@ -538,6 +551,11 @@ void spapr_register_hypercall(target_ulong opcode, > >> spapr_hcall_fn fn); > >> target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode, > >> target_ulong *args); > >> > >> +target_ulong do_client_architecture_support(PowerPCCPU *cpu, > >> + SpaprMachineState *spapr, > >> + target_ulong addr, > >> + target_ulong fdt_bufsize); > >> + > >> /* Virtual Processor Area structure constants */ > >> #define VPA_MIN_SIZE 640 > >> #define VPA_SIZE_OFFSET 0x4 > >> @@ -769,6 +787,11 @@ struct SpaprEventLogEntry { > >> void *spapr_build_fdt(SpaprMachineState *spapr, bool reset, size_t space); > >> void spapr_events_init(SpaprMachineState *sm); > >> void spapr_dt_events(SpaprMachineState *sm, void *fdt); > >> +uint64_t spapr_do_of_client_claim(SpaprMachineState *spapr, uint64_t virt, > >> + uint64_t size, uint64_t align); > >> + > >> +uint32_t spapr_of_client_open(SpaprMachineState *spapr, const char *path); > >> +int spapr_h_client(SpaprMachineState *spapr, target_ulong client_args); > >> void close_htab_fd(SpaprMachineState *spapr); > >> void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr); > >> void spapr_free_hpt(SpaprMachineState *spapr); > >> @@ -891,4 +914,7 @@ void spapr_check_pagesize(SpaprMachineState *spapr, > >> hwaddr pagesize, > >> #define SPAPR_OV5_XIVE_BOTH 0x80 /* Only to advertise on the platform > >> */ > >> > >> void spapr_set_all_lpcrs(target_ulong value, target_ulong mask); > >> + > >> +void spapr_instantiate_rtas(SpaprMachineState *spapr, uint32_t base); > >> + > >> #endif /* HW_SPAPR_H */ > >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > >> index e62c89b3dd40..76ce8b973082 100644 > >> --- a/hw/ppc/spapr.c > >> +++ b/hw/ppc/spapr.c > >> @@ -896,6 +896,55 @@ out: > >> return ret; > >> } > >> > >> +/* > >> + * Below is a compiled version of RTAS blob and OF client interface entry > >> point. > >> + * > >> + * gcc -nostdlib -mbig -o spapr-rtas.img spapr-rtas.S > >> + * objcopy -O binary -j .text spapr-rtas.img spapr-rtas.bin > >> + * > >> + * .globl _start > >> + * _start: > >> + * mr 4,3 > >> + * lis 3,KVMPPC_H_RTAS@h > >> + * ori 3,3,KVMPPC_H_RTAS@l > >> + * sc 1 > >> + * blr > >> + * mr 4,3 > >> + * lis 3,KVMPPC_H_CLIENT@h > >> + * ori 3,3,KVMPPC_H_CLIENT@l > >> + * sc 1 > >> + * blr > >> + */ > >> +static struct { > > > > Should be able to add a 'const' here. > > > >> + uint8_t rtas[20], client[20]; > >> +} QEMU_PACKED rtas_client_blob = { > >> + .rtas = { > >> + 0x7c, 0x64, 0x1b, 0x78, > >> + 0x3c, 0x60, 0x00, 0x00, > >> + 0x60, 0x63, 0xf0, 0x00, > >> + 0x44, 0x00, 0x00, 0x22, > >> + 0x4e, 0x80, 0x00, 0x20 > >> + }, > >> + .client = { > >> + 0x7c, 0x64, 0x1b, 0x78, > >> + 0x3c, 0x60, 0x00, 0x00, > >> + 0x60, 0x63, 0xf0, 0x05, > >> + 0x44, 0x00, 0x00, 0x22, > >> + 0x4e, 0x80, 0x00, 0x20 > >> + } > >> +}; > > > > I'd split this into two variables - there's not really any connection > > between the two, AFAICT. > > > > Note that I'm getting closer to merging the fwnmi stuff at which point > > you'll need to pad the RTAS blob with a bunch of extra space for > > taking the fwnmi dumps. > > > >> + > >> +void spapr_instantiate_rtas(SpaprMachineState *spapr, uint32_t base) > >> +{ > >> + if (spapr_do_of_client_claim(spapr, base, > >> sizeof(rtas_client_blob.rtas), > >> + 0) != -1) { > > > > Wait.. == -1 is the success case? That's a very surprising interface. > > > This is a sort of an assert. spapr_do_of_client_claim() returns an > address and the client is expected to claim the memory which it wants > RTAS to be copied to, this makes sure it either happened or we claimed > it here. Ah! Ok, I understand. > >> + error_report("The OF client did not claim RTAS memory at 0x%x", > >> base); > > > > Error message is hard to follow. Maybe "Could not claim memory > > for RTAS" Which makes my suggestion here a bad one too. > > > >> + } > >> + spapr->rtas_base = base; > >> + cpu_physical_memory_write(base, rtas_client_blob.rtas, > >> + sizeof(rtas_client_blob.rtas)); > >> +} > >> + > >> static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt) > >> { > >> MachineState *ms = MACHINE(spapr); > >> @@ -980,6 +1029,11 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, > >> void *fdt) > >> _FDT(fdt_setprop(fdt, rtas, "ibm,lrdr-capacity", > >> lrdr_capacity, sizeof(lrdr_capacity))); > >> > >> + if (!spapr->bios_enabled) { > >> + _FDT(fdt_setprop_cell(fdt, rtas, "rtas-size", > >> + sizeof(rtas_client_blob.rtas))); > >> + } > >> + > >> spapr_dt_rtas_tokens(fdt, rtas); > >> } > >> > >> @@ -1057,7 +1111,7 @@ static void spapr_dt_chosen(SpaprMachineState > >> *spapr, void *fdt) > >> } > >> > >> if (spapr->kernel_size) { > >> - uint64_t kprop[2] = { cpu_to_be64(KERNEL_LOAD_ADDR), > >> + uint64_t kprop[2] = { cpu_to_be64(spapr->kernel_addr), > > > > Hrm, I really think I would like to see the change to adjustable > > kernel_addr split out - it puts a bunch of noise into the main kill > > slof patch. > > Sure, I'll do that if we decide to proceed with this. > > > > > >> cpu_to_be64(spapr->kernel_size) }; > >> > >> _FDT(fdt_setprop(fdt, chosen, "qemu,boot-kernel", > >> @@ -1245,7 +1299,8 @@ void *spapr_build_fdt(SpaprMachineState *spapr, bool > >> reset, size_t space) > >> /* Build memory reserve map */ > >> if (reset) { > >> if (spapr->kernel_size) { > >> - _FDT((fdt_add_mem_rsv(fdt, KERNEL_LOAD_ADDR, > >> spapr->kernel_size))); > >> + _FDT((fdt_add_mem_rsv(fdt, spapr->kernel_addr, > >> + spapr->kernel_size))); > >> } > >> if (spapr->initrd_size) { > >> _FDT((fdt_add_mem_rsv(fdt, spapr->initrd_base, > >> @@ -1268,12 +1323,56 @@ void *spapr_build_fdt(SpaprMachineState *spapr, > >> bool reset, size_t space) > >> } > >> } > >> > >> + if (!spapr->bios_enabled) { > >> + uint32_t phandle; > >> + int i, offset, proplen = 0; > >> + const void *prop; > >> + bool found = false; > >> + GArray *phandles = g_array_new(false, false, sizeof(uint32_t)); > >> + > >> + /* Find all predefined phandles */ > >> + for (offset = fdt_next_node(fdt, -1, NULL); > >> + offset >= 0; > >> + offset = fdt_next_node(fdt, offset, NULL)) { > >> + prop = fdt_getprop_namelen(fdt, offset, "phandle", 7, > >> &proplen); > > > > You can just use fdt_getprop() rather than the namelen variant (that's > > only really useful when you don't have a \0-terminated string with the > > name). > > Ok, will fix. There are just too many similar functions in libfdt.h and > fdt_getprop() could be inlined, probably. It won't be inlined, but I think it will be tail-call optimized so it might as well be. That is, I think the .o will look something like: fdt_getprop: jiggle some registers bl strlen jiggle some regs fdt_getprop_namelen: ... blr > >> + if (prop && proplen == sizeof(uint32_t)) { > >> + phandle = fdt32_ld(prop); > >> + g_array_append_val(phandles, phandle); > >> + } > >> + } > >> + > >> + /* Assign phandles skipping the predefined ones */ > >> + for (offset = fdt_next_node(fdt, -1, NULL), phandle = 1; > >> + offset >= 0; > >> + offset = fdt_next_node(fdt, offset, NULL), ++phandle) { > >> + prop = fdt_getprop_namelen(fdt, offset, "phandle", 7, > >> &proplen); > >> + if (prop) { > >> + continue; > >> + } > >> + /* Check if the current phandle is not allocated already */ > >> + for ( ; ; ++phandle) { > >> + for (i = 0, found = false; i < phandles->len; ++i) { > >> + if (phandle == g_array_index(phandles, uint32_t, i)) { > >> + found = true; > >> + break; > >> + } > >> + } > >> + if (!found) { > >> + break; > >> + } > >> + } > >> + _FDT(fdt_setprop_cell(fdt, offset, "phandle", phandle)); > >> + } > >> + g_array_unref(phandles); > >> + } > >> + > >> return fdt; > >> } > >> > >> static uint64_t translate_kernel_address(void *opaque, uint64_t addr) > >> { > >> - return (addr & 0x0fffffff) + KERNEL_LOAD_ADDR; > >> + SpaprMachineState *spapr = opaque; > >> + return (addr & 0x0fffffff) + spapr->kernel_addr; > >> } > >> > >> static void emulate_spapr_hypercall(PPCVirtualHypervisor *vhyp, > >> @@ -1660,24 +1759,89 @@ static void spapr_machine_reset(MachineState > >> *machine) > >> */ > >> fdt_addr = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FDT_MAX_SIZE; > >> > >> + /* Set up the entry state */ > >> + if (!spapr->bios_enabled) { > >> + if (spapr->claimed) { > >> + g_array_unref(spapr->claimed); > >> + } > >> + if (spapr->of_instances) { > >> + g_hash_table_unref(spapr->of_instances); > >> + } > >> + > >> + spapr->claimed = g_array_new(false, false, > >> sizeof(SpaprOfClaimed)); > >> + spapr->of_instances = g_hash_table_new(g_direct_hash, > >> g_direct_equal); > >> + > >> + spapr->claimed_base = 0x10000; /* Avoid using the first system > >> page */ > >> + > >> + spapr_cpu_set_entry_state(first_ppc_cpu, spapr->kernel_addr, > >> + spapr->initrd_base); > >> + first_ppc_cpu->env.gpr[4] = spapr->initrd_size; > >> + > >> + if (spapr_do_of_client_claim(spapr, spapr->kernel_addr, > >> + spapr->kernel_size, 0) == -1) { > >> + error_report("Memory for kernel is in use"); > >> + exit(1); > >> + } > >> + if (spapr_do_of_client_claim(spapr, spapr->initrd_base, > >> + spapr->initrd_size, 0) == -1) { > >> + error_report("Memory for initramdisk is in use"); > >> + exit(1); > >> + } > >> + first_ppc_cpu->env.gpr[1] = spapr_do_of_client_claim(spapr, 0, > >> 0x40000, > >> + 0x10000); > >> + if (first_ppc_cpu->env.gpr[1] == -1) { > >> + error_report("Memory allocation for stack failed"); > >> + exit(1); > >> + } > >> + > >> + first_ppc_cpu->env.gpr[5] = > >> + spapr_do_of_client_claim(spapr, 0, > >> sizeof(rtas_client_blob.client), > >> + sizeof(rtas_client_blob.client)); > >> + if (first_ppc_cpu->env.gpr[5] == -1) { > >> + error_report("Memory allocation for OF client failed"); > >> + exit(1); > >> + } > >> + cpu_physical_memory_write(first_ppc_cpu->env.gpr[5], > >> + rtas_client_blob.client, > >> + sizeof(rtas_client_blob.client)); > >> + } else { > >> + spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT, > >> fdt_addr); > >> + first_ppc_cpu->env.gpr[5] = 0; /* 0 = kexec !0 = prom_init */ > >> + } > >> + > >> fdt = spapr_build_fdt(spapr, true, FDT_MAX_SIZE); > >> > >> - rc = fdt_pack(fdt); > >> - > >> - /* Should only fail if we've built a corrupted tree */ > >> - assert(rc == 0); > >> - > >> - /* Load the fdt */ > >> - qemu_fdt_dumpdtb(fdt, fdt_totalsize(fdt)); > >> - cpu_physical_memory_write(fdt_addr, fdt, fdt_totalsize(fdt)); > >> g_free(spapr->fdt_blob); > >> spapr->fdt_size = fdt_totalsize(fdt); > >> spapr->fdt_initial_size = spapr->fdt_size; > >> spapr->fdt_blob = fdt; > >> > >> - /* Set up the entry state */ > >> - spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT, fdt_addr); > >> - first_ppc_cpu->env.gpr[5] = 0; > >> + if (spapr->bios_enabled) { > >> + /* Load the fdt */ > >> + rc = fdt_pack(spapr->fdt_blob); > >> + /* Should only fail if we've built a corrupted tree */ > >> + assert(rc == 0); > >> + > >> + spapr->fdt_size = fdt_totalsize(spapr->fdt_blob); > >> + spapr->fdt_initial_size = spapr->fdt_size; > >> + qemu_fdt_dumpdtb(spapr->fdt_blob, spapr->fdt_size); > > > > I think we should still have a dumpdtb call on the !bios path. > > > >> + cpu_physical_memory_write(fdt_addr, spapr->fdt_blob, > >> spapr->fdt_size); > >> + } else { > >> + char *stdout_path = spapr_vio_stdout_path(spapr->vio_bus); > >> + int offset = fdt_path_offset(fdt, "/chosen"); > >> + > >> + /* > >> + * SLOF-less setup requires an open instance of stdout for early > >> + * kernel printk. By now all phandles are settled so we can open > >> + * the default serial console. > >> + * We skip writing FDT as nothing expects it; OF client interface > >> is > >> + * going to be used for reading the device tree. > >> + */ > >> + if (stdout_path) { > >> + _FDT(fdt_setprop_cell(fdt, offset, "stdout", > >> + spapr_of_client_open(spapr, > >> stdout_path))); > >> + } > >> + } > >> > >> spapr->cas_reboot = false; > >> } > >> @@ -2897,12 +3061,12 @@ static void spapr_machine_init(MachineState > >> *machine) > >> uint64_t lowaddr = 0; > >> > >> spapr->kernel_size = load_elf(kernel_filename, NULL, > >> - translate_kernel_address, NULL, > >> + translate_kernel_address, spapr, > >> NULL, &lowaddr, NULL, 1, > >> PPC_ELF_MACHINE, 0, 0); > >> if (spapr->kernel_size == ELF_LOAD_WRONG_ENDIAN) { > >> spapr->kernel_size = load_elf(kernel_filename, NULL, > >> - translate_kernel_address, NULL, > >> NULL, > >> + translate_kernel_address, > >> spapr, NULL, > >> &lowaddr, NULL, 0, > >> PPC_ELF_MACHINE, > >> 0, 0); > >> spapr->kernel_le = spapr->kernel_size > 0; > >> @@ -2918,7 +3082,7 @@ static void spapr_machine_init(MachineState *machine) > >> /* Try to locate the initrd in the gap between the kernel > >> * and the firmware. Add a bit of space just in case > >> */ > >> - spapr->initrd_base = (KERNEL_LOAD_ADDR + spapr->kernel_size > >> + spapr->initrd_base = (spapr->kernel_addr + spapr->kernel_size > >> + 0x1ffff) & ~0xffff; > >> spapr->initrd_size = load_image_targphys(initrd_filename, > >> spapr->initrd_base, > >> @@ -2932,20 +3096,22 @@ static void spapr_machine_init(MachineState > >> *machine) > >> } > >> } > >> > >> - if (bios_name == NULL) { > >> - bios_name = FW_FILE_NAME; > >> + if (spapr->bios_enabled) { > >> + if (bios_name == NULL) { > >> + bios_name = FW_FILE_NAME; > >> + } > >> + filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name); > >> + if (!filename) { > >> + error_report("Could not find LPAR firmware '%s'", bios_name); > >> + exit(1); > >> + } > >> + fw_size = load_image_targphys(filename, 0, FW_MAX_SIZE); > >> + if (fw_size <= 0) { > >> + error_report("Could not load LPAR firmware '%s'", filename); > >> + exit(1); > >> + } > >> + g_free(filename); > >> } > >> - filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name); > >> - if (!filename) { > >> - error_report("Could not find LPAR firmware '%s'", bios_name); > >> - exit(1); > >> - } > >> - fw_size = load_image_targphys(filename, 0, FW_MAX_SIZE); > >> - if (fw_size <= 0) { > >> - error_report("Could not load LPAR firmware '%s'", filename); > >> - exit(1); > >> - } > >> - g_free(filename); > >> > >> /* FIXME: Should register things through the MachineState's qdev > >> * interface, this is a legacy from the sPAPREnvironment structure > >> @@ -3162,6 +3328,32 @@ static void spapr_set_vsmt(Object *obj, Visitor *v, > >> const char *name, > >> visit_type_uint32(v, name, (uint32_t *)opaque, errp); > >> } > >> > >> +static void spapr_get_kernel_addr(Object *obj, Visitor *v, const char > >> *name, > >> + void *opaque, Error **errp) > >> +{ > >> + visit_type_uint64(v, name, (uint64_t *)opaque, errp); > >> +} > >> + > >> +static void spapr_set_kernel_addr(Object *obj, Visitor *v, const char > >> *name, > >> + void *opaque, Error **errp) > >> +{ > >> + visit_type_uint64(v, name, (uint64_t *)opaque, errp); > >> +} > >> + > >> +static bool spapr_get_bios_enabled(Object *obj, Error **errp) > >> +{ > >> + SpaprMachineState *spapr = SPAPR_MACHINE(obj); > >> + > >> + return spapr->bios_enabled; > >> +} > >> + > >> +static void spapr_set_bios_enabled(Object *obj, bool value, Error **errp) > >> +{ > >> + SpaprMachineState *spapr = SPAPR_MACHINE(obj); > >> + > >> + spapr->bios_enabled = value; > >> +} > >> + > >> static char *spapr_get_ic_mode(Object *obj, Error **errp) > >> { > >> SpaprMachineState *spapr = SPAPR_MACHINE(obj); > >> @@ -3267,6 +3459,20 @@ static void spapr_instance_init(Object *obj) > >> object_property_add_bool(obj, "vfio-no-msix-emulation", > >> spapr_get_msix_emulation, NULL, NULL); > >> > >> + object_property_add(obj, "kernel-addr", "uint64", > >> spapr_get_kernel_addr, > >> + spapr_set_kernel_addr, NULL, &spapr->kernel_addr, > >> + &error_abort); > >> + object_property_set_description(obj, "kernel-addr", > >> + stringify(KERNEL_LOAD_ADDR) > >> + " for -kernel is the default", > >> + NULL); > >> + spapr->kernel_addr = KERNEL_LOAD_ADDR; > >> + object_property_add_bool(obj, "bios", spapr_get_bios_enabled, > >> + spapr_set_bios_enabled, NULL); > >> + object_property_set_description(obj, "bios", "Conrols whether to load > >> bios", > >> + NULL); > >> + spapr->bios_enabled = true; > >> + > >> /* The machine class defines the default interrupt controller mode */ > >> spapr->irq = smc->irq; > >> object_property_add_str(obj, "ic-mode", spapr_get_ic_mode, > >> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c > >> index f1799b1b707d..f2d8823d2c3a 100644 > >> --- a/hw/ppc/spapr_hcall.c > >> +++ b/hw/ppc/spapr_hcall.c > >> @@ -1660,15 +1660,11 @@ static bool spapr_hotplugged_dev_before_cas(void) > >> return false; > >> } > >> > >> -static target_ulong h_client_architecture_support(PowerPCCPU *cpu, > >> - SpaprMachineState > >> *spapr, > >> - target_ulong opcode, > >> - target_ulong *args) > >> +target_ulong do_client_architecture_support(PowerPCCPU *cpu, > >> + SpaprMachineState *spapr, > >> + target_ulong addr, > >> + target_ulong fdt_bufsize) > >> { > >> - /* Working address in data buffer */ > >> - target_ulong addr = ppc64_phys_to_real(args[0]); > >> - target_ulong fdt_buf = args[1]; > >> - target_ulong fdt_bufsize = args[2]; > >> target_ulong ov_table; > >> uint32_t cas_pvr; > >> SpaprOptionVector *ov1_guest, *ov5_guest, *ov5_cas_old; > >> @@ -1816,7 +1812,6 @@ static target_ulong > >> h_client_architecture_support(PowerPCCPU *cpu, > >> > >> if (!spapr->cas_reboot) { > >> void *fdt; > >> - SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 }; > >> > >> /* If spapr_machine_reset() did not set up a HPT but one is > >> necessary > >> * (because the guest isn't going to use radix) then set it up > >> here. */ > >> @@ -1825,21 +1820,7 @@ static target_ulong > >> h_client_architecture_support(PowerPCCPU *cpu, > >> spapr_setup_hpt_and_vrma(spapr); > >> } > >> > >> - if (fdt_bufsize < sizeof(hdr)) { > >> - error_report("SLOF provided insufficient CAS buffer " > >> - TARGET_FMT_lu " (min: %zu)", fdt_bufsize, > >> sizeof(hdr)); > >> - exit(EXIT_FAILURE); > >> - } > >> - > >> - fdt_bufsize -= sizeof(hdr); > >> - > >> - fdt = spapr_build_fdt(spapr, false, fdt_bufsize); > >> - _FDT((fdt_pack(fdt))); > >> - > >> - cpu_physical_memory_write(fdt_buf, &hdr, sizeof(hdr)); > >> - cpu_physical_memory_write(fdt_buf + sizeof(hdr), fdt, > >> - fdt_totalsize(fdt)); > >> - trace_spapr_cas_continue(fdt_totalsize(fdt) + sizeof(hdr)); > >> + fdt = spapr_build_fdt(spapr, !spapr->bios_enabled, fdt_bufsize); > >> > >> g_free(spapr->fdt_blob); > >> spapr->fdt_size = fdt_totalsize(fdt); > >> @@ -1854,6 +1835,41 @@ static target_ulong > >> h_client_architecture_support(PowerPCCPU *cpu, > >> return H_SUCCESS; > >> } > >> > >> +static target_ulong h_client_architecture_support(PowerPCCPU *cpu, > >> + SpaprMachineState > >> *spapr, > >> + target_ulong opcode, > >> + target_ulong *args) > >> +{ > >> + /* Working address in data buffer */ > >> + target_ulong addr = ppc64_phys_to_real(args[0]); > >> + target_ulong fdt_buf = args[1]; > >> + target_ulong fdt_bufsize = args[2]; > >> + target_ulong ret; > >> + SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 }; > >> + > >> + if (fdt_bufsize < sizeof(hdr)) { > >> + error_report("SLOF provided insufficient CAS buffer " > >> + TARGET_FMT_lu " (min: %zu)", fdt_bufsize, > >> sizeof(hdr)); > >> + exit(EXIT_FAILURE); > >> + } > >> + > >> + fdt_bufsize -= sizeof(hdr); > >> + > >> + ret = do_client_architecture_support(cpu, spapr, addr, fdt_bufsize); > >> + if (ret == H_SUCCESS) { > >> + _FDT((fdt_pack(spapr->fdt_blob))); > >> + spapr->fdt_size = fdt_totalsize(spapr->fdt_blob); > >> + spapr->fdt_initial_size = spapr->fdt_size; > >> + > >> + cpu_physical_memory_write(fdt_buf, &hdr, sizeof(hdr)); > >> + cpu_physical_memory_write(fdt_buf + sizeof(hdr), spapr->fdt_blob, > >> + spapr->fdt_size); > >> + trace_spapr_cas_continue(spapr->fdt_size + sizeof(hdr)); > >> + } > >> + > >> + return ret; > >> +} > >> + > >> static target_ulong h_home_node_associativity(PowerPCCPU *cpu, > >> SpaprMachineState *spapr, > >> target_ulong opcode, > >> @@ -1998,6 +2014,14 @@ static target_ulong h_update_dt(PowerPCCPU *cpu, > >> SpaprMachineState *spapr, > >> return H_SUCCESS; > >> } > >> > >> +static target_ulong h_client(PowerPCCPU *cpu, SpaprMachineState *spapr, > >> + target_ulong opcode, target_ulong *args) > > > > As I said in an earlier revision, please explan these names from just > > "client", for readability by people who aren't already thinking about > > open firmware. > > Yeah, I missed this one. > > > > > >> +{ > >> + target_ulong client_args = ppc64_phys_to_real(args[0]); > >> + > >> + return spapr_h_client(spapr, client_args); > >> +} > >> + > >> static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1]; > >> static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - > >> KVMPPC_HCALL_BASE + 1]; > >> static spapr_hcall_fn svm_hypercall_table[(SVM_HCALL_MAX - > >> SVM_HCALL_BASE) / 4 + 1]; > >> @@ -2121,6 +2145,8 @@ static void hypercall_register_types(void) > >> > >> spapr_register_hypercall(KVMPPC_H_UPDATE_DT, h_update_dt); > >> > >> + spapr_register_hypercall(KVMPPC_H_CLIENT, h_client); > >> + > >> /* Virtual Processor Home Node */ > >> spapr_register_hypercall(H_HOME_NODE_ASSOCIATIVITY, > >> h_home_node_associativity); > >> diff --git a/hw/ppc/spapr_of_client.c b/hw/ppc/spapr_of_client.c > >> new file mode 100644 > >> index 000000000000..24d854b76e51 > >> --- /dev/null > >> +++ b/hw/ppc/spapr_of_client.c > > > > I'd suggest expanding this file to cover as much as you can of the > > virtual OF stuff, not just the client interface. > > This is done in v6. > > > > > >> @@ -0,0 +1,633 @@ > >> +#include "qemu/osdep.h" > >> +#include "qemu-common.h" > >> +#include "qapi/error.h" > >> +#include "exec/memory.h" > >> +#include "hw/ppc/spapr.h" > >> +#include "hw/ppc/spapr_vio.h" > >> +#include "chardev/char.h" > >> +#include "qom/qom-qobject.h" > >> +#include "trace.h" > >> + > >> +typedef struct { > >> + DeviceState *dev; > >> + Chardev *cdev; > >> + uint32_t phandle; > >> +} SpaprOfInstance; > >> + > >> +/* > >> + * OF 1275 "nextprop" description suggests is it 32 bytes max but > >> + * LoPAPR defines "ibm,query-interrupt-source-number" which is 33 chars > >> long. > >> + */ > >> +#define OF_PROPNAME_LEN_MAX 64 > >> + > >> +/* Defined as Big Endian */ > >> +struct prom_args { > >> + uint32_t service; > >> + uint32_t nargs; > >> + uint32_t nret; > >> + uint32_t args[10]; > >> +}; > >> + > >> +static void readstr(hwaddr pa, char *buf, int size) > >> +{ > >> + cpu_physical_memory_read(pa, buf, size - 1); > >> + buf[size - 1] = 0; > >> +} > > > > I'd still like to see this return some kind of error if it had to > > truncate what was passed by the client. > > > But truncating will produce error anyway - libfdt won't find stuff, > etc. Probably, but I think the error will be much more comprehensible if we catch it here. > >> + > >> +static bool _cmpservice(const char *s, size_t len, > > > > Don't use leading _ please - in userland those are reserved for the > > system libraries. > > > >> + unsigned nargs, unsigned nret, > >> + const char *s1, size_t len1, > >> + unsigned nargscheck, unsigned nretcheck) > >> +{ > >> + if (strcmp(s, s1)) { > >> + return false; > >> + } > >> + if (nargscheck == 0 && nretcheck == 0) { > >> + return true; > >> + } > >> + if (nargs != nargscheck || nret != nretcheck) { > >> + trace_spapr_of_client_error_param(s, nargscheck, nretcheck, nargs, > >> + nret); > >> + return false; > >> + } > >> + > >> + return true; > >> +} > >> + > >> +static uint32_t of_client_finddevice(const void *fdt, uint32_t nodeaddr) > >> +{ > >> + char node[256]; > > > > Is 256 enough? OF paths can get pretty long... > > > Hard to imagine that 255 is not enough though. Long parts of the path > would be scsi drive id, PHB but in between we can only have a bunch of > PCI bridges and these are not so long. Hm, ok. I had a look on a Boston and the longest path I see there is 75 characters, I thought it might be a lot more. > What do you think is an appropriate limit? > > > > > >> + int ret; > >> + > >> + readstr(nodeaddr, node, sizeof(node)); > >> + ret = fdt_path_offset(fdt, node); > >> + if (ret >= 0) { > >> + ret = fdt_get_phandle(fdt, ret); > >> + } > >> + > >> + return (uint32_t) ret; > >> +} > >> + > >> +static uint32_t of_client_getprop(const void *fdt, uint32_t nodeph, > >> + uint32_t pname, uint32_t valaddr, > >> + uint32_t vallen) > >> +{ > >> + char propname[OF_PROPNAME_LEN_MAX + 1]; > >> + uint32_t ret = 0; > >> + int proplen = 0; > >> + const void *prop; > >> + > >> + readstr(pname, propname, sizeof(propname)); > >> + prop = fdt_getprop_namelen(fdt, fdt_node_offset_by_phandle(fdt, > >> nodeph), > >> + propname, strlen(propname), &proplen); > > > > Again, you don't need _namelen. > > > >> + if (prop) { > >> + int cb = MIN(proplen, vallen); > >> + > >> + cpu_physical_memory_write(valaddr, prop, cb); > >> + ret = cb; > > > > If I'm reading 1275 correctly, the return value should be the > > untruncated length of the property. > > > "Size is either the actual size of the property". I'd think the actual > size is what we actually copied to the buffer but @proplen is probably > what they meant, I'll change to that and see what breaks. > > > > >> + } else { > >> + ret = -1; > >> + } > >> + trace_spapr_of_client_getprop(nodeph, propname, ret); > >> + > >> + return ret; > >> +} > >> + > >> +static uint32_t of_client_getproplen(const void *fdt, uint32_t nodeph, > >> + uint32_t pname) > >> +{ > >> + char propname[OF_PROPNAME_LEN_MAX + 1]; > >> + uint32_t ret = 0; > >> + int proplen = 0; > >> + const void *prop; > >> + > >> + readstr(pname, propname, sizeof(propname)); > >> + > >> + prop = fdt_getprop_namelen(fdt, fdt_node_offset_by_phandle(fdt, > >> nodeph), > >> + propname, strlen(propname), &proplen); > > > > No _namelen. > > > >> + if (prop) { > >> + ret = proplen; > >> + } else { > >> + ret = -1; > >> + } > >> + > >> + return ret; > >> +} > >> + > >> +static uint32_t of_client_setprop(SpaprMachineState *spapr, > >> + uint32_t nodeph, uint32_t pname, > >> + uint32_t valaddr, uint32_t vallen) > >> +{ > >> + char propname[OF_PROPNAME_LEN_MAX + 1]; > >> + uint32_t ret = -1; > >> + int offset; > > > > A comment noting that you're only allowing a very restricted set of > > setprops would be good. > > > Is not that quite clear from the code itself? Okay... Well, kinda. The rationale for it would be useful here though. > >> + readstr(pname, propname, sizeof(propname)); > >> + if (vallen == sizeof(uint32_t)) { > >> + uint32_t val32 = ldl_be_phys(first_cpu->as, valaddr); > >> + > >> + if ((strcmp(propname, "linux,rtas-base") == 0) || > >> + (strcmp(propname, "linux,rtas-entry") == 0)) { > >> + spapr->rtas_base = val32; > >> + } else if (strcmp(propname, "linux,initrd-start") == 0) { > >> + spapr->initrd_base = val32; > >> + } else if (strcmp(propname, "linux,initrd-end") == 0) { > >> + spapr->initrd_size = val32 - spapr->initrd_base; > >> + } else { > >> + goto trace_exit; > >> + } > >> + } else if (vallen == sizeof(uint64_t)) { > >> + uint64_t val64 = ldq_be_phys(first_cpu->as, valaddr); > >> + > >> + if (strcmp(propname, "linux,initrd-start") == 0) { > >> + spapr->initrd_base = val64; > >> + } else if (strcmp(propname, "linux,initrd-end") == 0) { > >> + spapr->initrd_size = val64 - spapr->initrd_base; > >> + } else { > >> + goto trace_exit; > >> + } > >> + } else { > >> + goto trace_exit; > >> + } > >> + > >> + offset = fdt_node_offset_by_phandle(spapr->fdt_blob, nodeph); > >> + if (offset >= 0) { > >> + uint8_t data[vallen]; > >> + > >> + cpu_physical_memory_read(valaddr, data, vallen); > >> + if (!fdt_setprop(spapr->fdt_blob, offset, propname, data, > >> vallen)) { > >> + ret = vallen; > >> + } > >> + } > >> + > >> +trace_exit: > >> + trace_spapr_of_client_setprop(nodeph, propname, ret); > >> + > >> + return ret; > >> +} > >> + > >> +static uint32_t of_client_nextprop(const void *fdt, uint32_t phandle, > >> + uint32_t prevaddr, uint32_t nameaddr) > >> +{ > >> + int offset = fdt_node_offset_by_phandle(fdt, phandle); > >> + char prev[OF_PROPNAME_LEN_MAX + 1]; > >> + const char *tmp; > >> + > >> + readstr(prevaddr, prev, sizeof(prev)); > >> + for (offset = fdt_first_property_offset(fdt, offset); > >> + offset >= 0; > >> + offset = fdt_next_property_offset(fdt, offset)) { > >> + > >> + if (!fdt_getprop_by_offset(fdt, offset, &tmp, NULL)) { > >> + return 0; > >> + } > >> + if (prev[0] == '\0' || strcmp(prev, tmp) == 0) { > >> + if (prev[0] != '\0') { > >> + offset = fdt_next_property_offset(fdt, offset); > >> + if (offset < 0) { > >> + return 0; > >> + } > >> + } > >> + if (!fdt_getprop_by_offset(fdt, offset, &tmp, NULL)) { > >> + return 0; > >> + } > >> + cpu_physical_memory_write(nameaddr, tmp, strlen(tmp) + 1); > >> + return 1; > >> + } > >> + } > >> + > >> + return 0; > >> +} > >> + > >> +static uint32_t of_client_peer(const void *fdt, uint32_t phandle) > >> +{ > >> + int ret; > >> + > >> + if (phandle == 0) { > >> + ret = fdt_path_offset(fdt, "/"); > >> + } else { > >> + ret = fdt_next_subnode(fdt, fdt_node_offset_by_phandle(fdt, > >> phandle)); > >> + } > >> + > >> + if (ret < 0) { > >> + ret = 0; > >> + } else { > >> + ret = fdt_get_phandle(fdt, ret); > >> + } > >> + > >> + return ret; > >> +} > >> + > >> +static uint32_t of_client_child(const void *fdt, uint32_t phandle) > >> +{ > >> + int ret = fdt_first_subnode(fdt, fdt_node_offset_by_phandle(fdt, > >> phandle)); > >> + > >> + if (ret < 0) { > >> + ret = 0; > >> + } else { > >> + ret = fdt_get_phandle(fdt, ret); > >> + } > >> + > >> + return ret; > >> +} > >> + > >> +static uint32_t of_client_parent(const void *fdt, uint32_t phandle) > >> +{ > >> + int ret = fdt_parent_offset(fdt, fdt_node_offset_by_phandle(fdt, > >> phandle)); > >> + > >> + if (ret < 0) { > >> + ret = 0; > >> + } else { > >> + ret = fdt_get_phandle(fdt, ret); > >> + } > >> + > >> + return ret; > >> +} > >> + > >> +static DeviceState *of_client_find_qom_dev(BusState *bus, const char > >> *path) > >> +{ > >> + BusChild *kid; > >> + > >> + QTAILQ_FOREACH(kid, &bus->children, sibling) { > >> + const char *p = qdev_get_fw_dev_path(kid->child); > >> + BusState *child; > >> + > >> + if (p && strcmp(path, p) == 0) { > >> + return kid->child; > >> + } > >> + QLIST_FOREACH(child, &kid->child->child_bus, sibling) { > >> + DeviceState *d = of_client_find_qom_dev(child, path); > >> + > >> + if (d) { > >> + return d; > >> + } > >> + } > >> + } > >> + return NULL; > >> +} > >> + > >> +uint32_t spapr_of_client_open(SpaprMachineState *spapr, const char *path) > >> +{ > >> + int offset; > >> + uint32_t ret = 0; > >> + SpaprOfInstance *inst; > >> + > >> + if (spapr->of_instance_last == 0xFFFFFFFF) { > >> + /* We do not recycle ihandles yet */ > >> + goto trace_exit; > >> + } > >> + offset = fdt_path_offset(spapr->fdt_blob, path); > >> + if (offset < 0) { > >> + trace_spapr_of_client_error_unknown_path(path); > >> + goto trace_exit; > >> + } > >> + > >> + inst = g_new(SpaprOfInstance, 1); > >> + inst->phandle = fdt_get_phandle(spapr->fdt_blob, offset); > >> + g_assert(inst->phandle); > >> + ++spapr->of_instance_last; > >> + inst->dev = of_client_find_qom_dev(sysbus_get_default(), path); > >> + g_hash_table_insert(spapr->of_instances, > >> + GINT_TO_POINTER(spapr->of_instance_last), > >> + inst); > >> + ret = spapr->of_instance_last; > >> + > >> + if (inst->dev) { > >> + const char *cdevstr = object_property_get_str(OBJECT(inst->dev), > >> + "chardev", NULL); > >> + > >> + if (cdevstr) { > >> + inst->cdev = qemu_chr_find(cdevstr); > >> + } > >> + } > >> + > >> +trace_exit: > >> + trace_spapr_of_client_open(path, inst ? inst->phandle : 0, ret); > >> + > >> + return ret; > >> +} > >> + > >> +static uint32_t of_client_open(SpaprMachineState *spapr, uint32_t > >> pathaddr) > >> +{ > >> + char path[256]; > >> + > >> + readstr(pathaddr, path, sizeof(path)); > >> + > >> + return spapr_of_client_open(spapr, path); > >> +} > >> + > >> +static void of_client_close(SpaprMachineState *spapr, uint32_t ihandle) > >> +{ > >> + if (!g_hash_table_remove(spapr->of_instances, > >> GINT_TO_POINTER(ihandle))) { > >> + trace_spapr_of_client_error_unknown_ihandle_close(ihandle); > >> + } > >> +} > >> + > >> +static uint32_t of_client_instance_to_package(SpaprMachineState *spapr, > >> + uint32_t ihandle) > >> +{ > >> + gpointer instp = g_hash_table_lookup(spapr->of_instances, > >> + GINT_TO_POINTER(ihandle)); > >> + > >> + if (!instp) { > >> + return -1; > >> + } > >> + > >> + return ((SpaprOfInstance *)instp)->phandle; > >> +} > >> + > >> +static uint32_t of_client_package_to_path(const void *fdt, uint32_t > >> phandle, > >> + uint32_t buf, uint32_t len) > >> +{ > >> + char tmp[256]; > >> + > >> + if (0 == fdt_get_path(fdt, fdt_node_offset_by_phandle(fdt, phandle), > >> tmp, > >> + sizeof(tmp))) { > >> + tmp[sizeof(tmp) - 1] = 0; > >> + cpu_physical_memory_write(buf, tmp, MIN(len, strlen(tmp))); > >> + } > >> + return len; > >> +} > >> + > >> +static uint32_t of_client_instance_to_path(SpaprMachineState *spapr, > >> + uint32_t ihandle, uint32_t buf, > >> + uint32_t len) > >> +{ > >> + uint32_t phandle = of_client_instance_to_package(spapr, ihandle); > >> + > >> + if (phandle != -1) { > >> + return of_client_package_to_path(spapr->fdt_blob, phandle, buf, > >> len); > >> + } > >> + > >> + return 0; > >> +} > >> + > >> +static uint32_t of_client_write(SpaprMachineState *spapr, uint32_t > >> ihandle, > >> + uint32_t buf, uint32_t len) > >> +{ > >> + char tmp[256]; > >> + int toread, toprint, cb = MIN(len, 1024); > >> + SpaprOfInstance *inst = (SpaprOfInstance *) > >> + g_hash_table_lookup(spapr->of_instances, > >> GINT_TO_POINTER(ihandle)); > >> + > >> + while (cb > 0) { > >> + toread = MIN(cb + 1, sizeof(tmp)); > >> + readstr(buf, tmp, toread); > >> + toprint = strlen(tmp); > >> + if (inst && inst->cdev) { > >> + toprint = qemu_chr_write(inst->cdev, (uint8_t *) tmp, toprint, > >> + true); > >> + } else { > >> + /* We normally open stdout so this is fallback */ > >> + printf("DBG[%d]%s", ihandle, tmp); > >> + } > >> + buf += toprint; > >> + cb -= toprint; > >> + } > >> + > >> + return len; > >> +} > >> + > >> +static bool of_client_claim_avail(GArray *claimed, uint64_t virt, > >> uint64_t size) > >> +{ > >> + int i; > >> + SpaprOfClaimed *c; > >> + > >> + for (i = 0; i < claimed->len; ++i) { > >> + c = &g_array_index(claimed, SpaprOfClaimed, i); > >> + if ((c->start <= virt && virt < c->start + c->size) || > >> + (virt <= c->start && c->start < virt + size)) { > >> + return false; > >> + } > >> + } > >> + > >> + return true; > >> +} > >> + > >> +static void of_client_claim_add(GArray *claimed, uint64_t virt, uint64_t > >> size) > >> +{ > >> + SpaprOfClaimed newclaim; > >> + > >> + newclaim.start = virt; > >> + newclaim.size = size; > >> + g_array_append_val(claimed, newclaim); > >> +} > >> + > >> +/* > >> + * "claim" claims memory at @virt if @align==0; otherwise it allocates > >> + * memory at the requested alignment. > >> + */ > >> +uint64_t spapr_do_of_client_claim(SpaprMachineState *spapr, uint64_t virt, > >> + uint64_t size, uint64_t align) > >> +{ > >> + uint32_t ret; > >> + > >> + if (align == 0) { > >> + if (!of_client_claim_avail(spapr->claimed, virt, size)) { > >> + return -1; > >> + } > >> + ret = virt; > >> + } else { > >> + align = pow2ceil(align); > > > > Should this be a pow2ceil, or should it just return an error if align > > is not a power of 2. > Note that aligning something to 4 bytes will > > (probably) make it *not* aligned to 3 bytes. > > I did not see any notes about the specific alignment requirements here, > the idea is that clients may just not expect unaligned memory at all; I > could probably just drop it and see what happens... I don't follow you. Isn't the align value coming from the client? > >> + spapr->claimed_base = (spapr->claimed_base + align - 1) & ~(align > >> - 1); > >> + while (1) { > >> + if (spapr->claimed_base >= spapr->rma_size) { > >> + perror("Out of memory"); > > > > error_report() or qemu_log() or something and a message with some more > > specificity, please. > > > What kind of specificity is missing here? That it's on the OF claim interface specifically, and how much they were trying to claim. > >> + return -1; > >> + } > >> + if (of_client_claim_avail(spapr->claimed, spapr->claimed_base, > >> + size)) { > >> + break; > >> + } > >> + spapr->claimed_base += size; > >> + } > >> + ret = spapr->claimed_base; > >> + } > >> + > >> + spapr->claimed_base = MAX(spapr->claimed_base, ret + size); > >> + of_client_claim_add(spapr->claimed, virt, size); > >> + trace_spapr_of_client_claim(virt, size, align, ret); > >> + > >> + return ret; > >> +} > >> + > >> +static uint32_t of_client_claim(SpaprMachineState *spapr, uint32_t virt, > >> + uint32_t size, uint32_t align) > >> +{ > >> + if (align) { > >> + return -1; > >> + } > >> + if (!of_client_claim_avail(spapr->claimed, virt, size)) { > >> + return -1; > >> + } > >> + > >> + spapr->claimed_base = MAX(spapr->claimed_base, virt + size); > >> + of_client_claim_add(spapr->claimed, virt, size); > >> + trace_spapr_of_client_claim(virt, size, align, virt); > > > > Huh. So do_of_client_claim() is never used from of_client_claim(), > > only from "internal" claimers. It definitely needs a different name. > > > It is used in v6. Linux always passes non-zero @virt but grub does not. > > > >> + return virt; > >> +} > >> + > >> +static uint32_t of_client_call_method(SpaprMachineState *spapr, > >> + uint32_t methodaddr, uint32_t > >> ihandle, > >> + uint32_t param, uint32_t *ret2) > >> +{ > >> + uint32_t ret = -1; > >> + char path[256] = "", method[256] = ""; > >> + uint32_t phandle = of_client_instance_to_package(spapr, ihandle); > >> + int offset; > >> + > >> + if (!ihandle) { > >> + goto trace_exit; > >> + } > >> + > >> + readstr(methodaddr, method, sizeof(method)); > >> + phandle = of_client_instance_to_package(spapr, ihandle); > >> + if (!phandle) { > >> + goto trace_exit; > >> + } > >> + > >> + offset = fdt_node_offset_by_phandle(spapr->fdt_blob, phandle); > >> + if (offset < 0) { > >> + goto trace_exit; > >> + } > >> + > >> + if (fdt_get_path(spapr->fdt_blob, offset, path, sizeof(path))) { > >> + goto trace_exit; > >> + } > >> + > >> + if (strcmp(path, "/") == 0) { > >> + if (strcmp(method, "ibm,client-architecture-support") == 0) { > >> + > >> +#define FDT_MAX_SIZE 0x100000 > >> + ret = do_client_architecture_support(POWERPC_CPU(first_cpu), > >> spapr, > >> + param, FDT_MAX_SIZE); > >> + *ret2 = 0; > >> + } > >> + } else if (strcmp(path, "/rtas") == 0) { > >> + if (strcmp(method, "instantiate-rtas") == 0) { > >> + spapr_instantiate_rtas(spapr, param); > >> + ret = 0; > >> + *ret2 = param; /* rtasbase */ > >> + } > >> + } else { > >> + trace_spapr_of_client_error_unknown_method(method); > >> + } > >> + > >> +trace_exit: > >> + trace_spapr_of_client_method(ihandle, method, param, phandle, path, > >> ret); > >> + > >> + return ret; > >> +} > >> + > >> +static void of_client_quiesce(SpaprMachineState *spapr) > >> +{ > >> + int rc = fdt_pack(spapr->fdt_blob); > >> + /* Should only fail if we've built a corrupted tree */ > >> + assert(rc == 0); > >> + > >> + spapr->fdt_size = fdt_totalsize(spapr->fdt_blob); > >> + spapr->fdt_initial_size = spapr->fdt_size; > >> +} > >> + > >> +int spapr_h_client(SpaprMachineState *spapr, target_ulong of_client_args) > >> +{ > >> + struct prom_args args = { 0 }; > >> + char service[64]; > >> + unsigned nargs, nret; > >> + int i, servicelen; > >> + > >> + cpu_physical_memory_read(of_client_args, &args, sizeof(args)); > >> + nargs = be32_to_cpu(args.nargs); > >> + nret = be32_to_cpu(args.nret); > >> + readstr(be32_to_cpu(args.service), service, sizeof(service)); > >> + servicelen = strlen(service); > >> + > >> +#define cmpservice(s, a, r) \ > >> + _cmpservice(service, servicelen, nargs, nret, (s), sizeof(s), (a), > >> (r)) > >> + > >> + if (cmpservice("finddevice", 1, 1)) { > >> + args.args[nargs] = of_client_finddevice(spapr->fdt_blob, > >> + > >> be32_to_cpu(args.args[0])); > >> + } else if (cmpservice("getprop", 4, 1)) { > >> + args.args[nargs] = of_client_getprop(spapr->fdt_blob, > >> + be32_to_cpu(args.args[0]), > >> + be32_to_cpu(args.args[1]), > >> + be32_to_cpu(args.args[2]), > >> + be32_to_cpu(args.args[3])); > >> + } else if (cmpservice("getproplen", 2, 1)) { > >> + args.args[nargs] = of_client_getproplen(spapr->fdt_blob, > >> + be32_to_cpu(args.args[0]), > >> + > >> be32_to_cpu(args.args[1])); > >> + } else if (cmpservice("setprop", 4, 1)) { > >> + args.args[nargs] = of_client_setprop(spapr, > >> + be32_to_cpu(args.args[0]), > >> + be32_to_cpu(args.args[1]), > >> + be32_to_cpu(args.args[2]), > >> + be32_to_cpu(args.args[3])); > >> + } else if (cmpservice("nextprop", 3, 1)) { > >> + args.args[nargs] = of_client_nextprop(spapr->fdt_blob, > >> + be32_to_cpu(args.args[0]), > >> + be32_to_cpu(args.args[1]), > >> + be32_to_cpu(args.args[2])); > >> + } else if (cmpservice("peer", 1, 1)) { > >> + args.args[nargs] = of_client_peer(spapr->fdt_blob, > >> + be32_to_cpu(args.args[0])); > >> + } else if (cmpservice("child", 1, 1)) { > >> + args.args[nargs] = of_client_child(spapr->fdt_blob, > >> + be32_to_cpu(args.args[0])); > >> + } else if (cmpservice("parent", 1, 1)) { > >> + args.args[nargs] = of_client_parent(spapr->fdt_blob, > >> + be32_to_cpu(args.args[0])); > >> + } else if (cmpservice("open", 1, 1)) { > >> + args.args[nargs] = of_client_open(spapr, > >> be32_to_cpu(args.args[0])); > >> + } else if (cmpservice("close", 1, 0)) { > >> + of_client_close(spapr, be32_to_cpu(args.args[0])); > >> + } else if (cmpservice("instance-to-package", 1, 1)) { > >> + args.args[nargs] = > >> + of_client_instance_to_package(spapr, > >> + be32_to_cpu(args.args[0])); > >> + } else if (cmpservice("package-to-path", 3, 1)) { > >> + args.args[nargs] = of_client_package_to_path(spapr->fdt_blob, > >> + > >> be32_to_cpu(args.args[0]), > >> + > >> be32_to_cpu(args.args[1]), > >> + > >> be32_to_cpu(args.args[2])); > >> + } else if (cmpservice("instance-to-path", 3, 1)) { > >> + args.args[nargs] = > >> + of_client_instance_to_path(spapr, > >> + be32_to_cpu(args.args[0]), > >> + be32_to_cpu(args.args[1]), > >> + be32_to_cpu(args.args[2])); > >> + } else if (cmpservice("write", 3, 1)) { > >> + args.args[nargs] = of_client_write(spapr, > >> + be32_to_cpu(args.args[0]), > >> + be32_to_cpu(args.args[1]), > >> + be32_to_cpu(args.args[2])); > >> + } else if (cmpservice("claim", 3, 1)) { > >> + args.args[nargs] = of_client_claim(spapr, > >> + be32_to_cpu(args.args[0]), > >> + be32_to_cpu(args.args[1]), > >> + be32_to_cpu(args.args[2])); > >> + } else if (cmpservice("call-method", 3, 2)) { > >> + args.args[nargs] = of_client_call_method(spapr, > >> + > >> be32_to_cpu(args.args[0]), > >> + > >> be32_to_cpu(args.args[1]), > >> + > >> be32_to_cpu(args.args[2]), > >> + &args.args[nargs + 1]); > >> + } else if (cmpservice("quiesce", 0, 0)) { > >> + of_client_quiesce(spapr); > >> + } else if (cmpservice("exit", 0, 0)) { > >> + error_report("Stopped as the VM requested \"exit\""); > >> + vm_stop(RUN_STATE_PAUSED); > >> + } else { > >> + trace_spapr_of_client_error_unknown_service(service, nargs, nret); > >> + args.args[nargs] = -1; > > > > You've never bounds checked nargs at this point. > > > >> + } > >> + > >> + for (i = 0; i < nret; ++i) { > > > > And likewise you might not have bounds checked nret. > > Oh, true. Thanks, > > > > > >> + args.args[nargs + i] = be32_to_cpu(args.args[nargs + i]); > >> + } > >> + cpu_physical_memory_write(of_client_args, &args, sizeof(args)); > >> + > >> + return H_SUCCESS; > >> +} > >> diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events > >> index 9ea620f23c85..e2d1e58d07c3 100644 > >> --- a/hw/ppc/trace-events > >> +++ b/hw/ppc/trace-events > >> @@ -21,6 +21,18 @@ spapr_update_dt(unsigned cb) "New blob %u bytes" > >> spapr_update_dt_failed_size(unsigned cbold, unsigned cbnew, unsigned > >> magic) "Old blob %u bytes, new blob %u bytes, magic 0x%x" > >> spapr_update_dt_failed_check(unsigned cbold, unsigned cbnew, unsigned > >> magic) "Old blob %u bytes, new blob %u bytes, magic 0x%x" > >> > >> +# spapr_client.c > >> +spapr_of_client_error_param(const char *method, int nargscheck, int > >> nretcheck, int nargs, int nret) "%s takes/returns %d/%d, not %d/%d" > >> +spapr_of_client_error_unknown_service(const char *service, int nargs, int > >> nret) "%s args=%d rets=%d" > >> +spapr_of_client_error_unknown_method(const char *method) "%s" > >> +spapr_of_client_error_unknown_ihandle_close(uint32_t ihandle) "0x%x" > >> +spapr_of_client_error_unknown_path(const char *path) "%s" > >> +spapr_of_client_claim(uint32_t virt, uint32_t size, uint32_t align, > >> uint32_t ret) "virt=0x%x size=0x%x align=0x%x => 0x%x" > >> +spapr_of_client_method(uint32_t ihandle, const char *method, uint32_t > >> param, uint32_t phandle, const char *path, uint32_t ret) "0x%x \"%s\" > >> param=0x%x ph=0x%x \"%s\" => 0x%x" > >> +spapr_of_client_getprop(uint32_t ph, const char *prop, uint32_t ret) > >> "phandle=0x%x \"%s\" => 0x%x" > >> +spapr_of_client_setprop(uint32_t ph, const char *prop, uint32_t ret) > >> "phandle=0x%x \"%s\" => 0x%x" > >> +spapr_of_client_open(const char *path, uint32_t phandle, uint32_t > >> ihandle) "%s 0x%x => 0x%x" > >> + > >> # spapr_hcall_tpm.c > >> spapr_h_tpm_comm(const char *device_path, uint64_t operation) > >> "tpm_device_path=%s operation=0x%"PRIu64 > >> spapr_tpm_execute(uint64_t data_in, uint64_t data_in_sz, uint64_t > >> data_out, uint64_t data_out_sz) "data_in=0x%"PRIx64", > >> data_in_sz=%"PRIu64", data_out=0x%"PRIx64", data_out_sz=%"PRIu64 > > > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature