On Mon, May 11, 2015 at 12:13 AM, Peter Maydell <peter.mayd...@linaro.org> wrote:
> On 11 May 2015 at 07:29, Peter Crosthwaite <crosthwaitepe...@gmail.com> wrote:
>> This is target-multi, a system-mode build that can support multiple
>> cpu-types. Patches 1-3 are the main infrastructure. The hard part
>> is the per-target changes needed to get each arch into an includable
>> state.
>
> Interesting. This is something I'd thought we were still some way
> from being able to do :-)
>
>> The hardest part is what to do about bootloading. Currently each arch
>> has its own architecture-specific bootloading, which may assume a
>> single architecture. I have applied some hacks to at least get this
>> RFC testable using a -kernel/-firmware split, but going forward, being
>> able to associate an elf/image with a cpu explicitly needs to be
>> solved.
>
> My first thought would be to leave the -kernel/-firmware stuff as
> legacy (or at least with semantics defined by the board model in use)
> and have per-CPU QOM properties for setting up images for genuinely
> multi-CPU configs.
>
OK

>> For the implementation of this series, the trickiest part is cpu.h
>> inclusion management. There is now more than one cpu.h, and different
>> parts of the tree need a different include scheme. target-multi defines
>> its own cpu.h, which is the bare minimum defs as needed by core code
>> only. target-foo/cpu.h are mostly the same but refactored to reuse common
>> code (with target-multi/cpu-head.h). The inclusion scheme goes something
>> like this (for the multi-arch build):
>>
>> 1: All obj-y modules include target-multi/cpu.h
>> 2: Core code includes no other cpu.h's
>> 3: target-foo/ implementation code includes target-foo/cpu.h
>> 4: System level code (e.g. mach models) can use multiple target-foo/cpu.h's
>>
>> Point 4 means that cpu.h's need to be refactored to be able to include one
>> after the other. The interrupts for ARM and MB needed to be renamed to avoid
>> namespace collision. A few other defs needed multiple include guards, and
>> a few defs which were only for user mode are compiled out or relocated. No
>> attempt at support for multi-arch linux-user mode (if that even makes
>> sense?).
>
> I don't think it does make much sense -- our linux-user code hardwires
> a lot of ABI details like size of 'long' and struct layouts. In any
> case we should probably leave it for later.
>
>> The env as handled by common code now needs to be architecture-agnostic. The
>> MB and ARM envs are refactored to have CPU_COMMON as the first field(s),
>> allowing QOM-style pointer casts to/from a generic env which contains only
>> CPU_COMMON. Might need to lock down some struct packing for that, but it
>> works for me so far.
>
> Have you managed to retain the "generated code passes around a pointer
> to an env which starts with the CPU specific fields"? We have the env
> structs laid out the way we do because it's a performance hit if the
> registers aren't a short distance away from the pointer...
>

OK, I knew there had to be a reason.
So I guess the simplest alternative is to pad the env out so the
arch-specific env sections are the same length, followed by a CPU_COMMON.
A bit of union { struct {} } stuffing might just do the trick, although
there will be some earthworks on cpu.h.

>> The helper function namespace is going to be tricky. I haven't tackled the
>> problem just yet, but I am looking for ideas on how we can avoid prefacing
>> all helpers with arch prefixes to avoid link-time collisions when multiple
>> arches use the same helper names.
>>
>> A lowest common denominator approach is taken on architecture specifics. E.g.
>> TARGET_LONG is 64-bit, and the address space sizes and NUM_MMU_MODES are set
>> to the maximum of all the supported arches.
>
> ...speaking of performance hits.
>
> I'm not sure you can do lowest-common-denominator for TARGET_PAGE_SIZE,
> incidentally. At minimum it will result in a perf hit for the CPUs with
> larger pages (because we end up taking the hugepage support paths in the
> cputlb.c code), and at worst TLB flushing in the target's helper routines
> might not take out the right pages. (I think ARM has some theoretical
> bugs here which we don't hit in practice; ARM already has to cope with
> a TARGET_PAGE_SIZE smaller than its usual pagesize, though.)
>

So I have gone for TARGET_PAGE_BITS = 12 as the only initially supported
config. This will go a long way while we figure out mixing page sizes at
the core level. I chose to ignore the ARM 1k page size thing, as the code
comment suggests it's a legacy thing anyway.

>> The remaining globally defined interfaces between core code and CPUs are
>> QOMified per-cpu (P2).
>>
>> Microblaze translation needs a change pattern to allow conversion to 64-bit
>> TARGET_LONG. Uses of TCGv need to be removed and made explicitly 32-bit.
>
> Yeah, this will be a tedious job for the other targets (I had to do it
> for ARM when I added the AArch64 support).
>

It's very scriptable.
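For example, something along these lines handles the bulk of the rename (the demo file and rewrite list here are made up, not the actual script used; GNU sed assumed):

```shell
# Hypothetical bulk rewrite illustrating the TCGv -> TCGv_i32 conversion.
# Demo input standing in for target-microblaze/translate.c:
printf 'static TCGv cpu_R[32];\nTCGv t = tcg_temp_new();\n' > /tmp/translate_demo.c

# Make the 32-bit temporaries explicit before TARGET_LONG grows to 64 bits
sed -i \
    -e 's/\bTCGv\b/TCGv_i32/g' \
    -e 's/\btcg_temp_new\b/tcg_temp_new_i32/g' \
    /tmp/translate_demo.c

cat /tmp/translate_demo.c
```

The remaining cases that a blind substitution would get wrong are exactly what the interactive pass below is for.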
I had it to a point where I could use vim's s//gc mode to turn it into an
interactive conversion.

>> This RFC will serve as a reference as I send bits and pieces to the respective
>> maintainers (many major subsystems are patched).
>>
>> No support for KVM; I'm not sure if a mix of TCG and KVM is supported even for
>> a single arch? (which would be a prerequisite for MA KVM).
>
> You can build a single binary which supports both TCG and KVM for a
> particular architecture. You just can't swap back and forth between
> TCG and KVM at runtime. We should probably start by supporting KVM
> only on boards with a single CPU architecture. I don't think it's
> in-principle impossible to get a setup with 4 KVM CPUs and one
> TCG emulated CPU to work, but it probably needs to wait til we've
> got multi-threaded TCG working before we even think about it.
>

OK.

>> Depends (not heavily) on my on-list disas QOMification. Test instructions
>> available on request. I have tested ARM & MB elfs handshaking through shared
>> memory and both printfing to the same UART (verifying system level
>> connectivity). -d in_asm works with the mix of disas arches coming out.
>
> Did you do any benchmarking to see whether the performance hits are
> noticeable in practice?
>

No, do you have any recommendations?

> Do you give each CPU its own codegen buffer? (I'm thinking that some
> of this might also be more easily done once multithreaded-TCG is
> complete, since that will properly split the datastructures.)
>

No, the approach taken here is that everything is exactly the same as
existing SMP. My logic is that we already have the core support, in that
AArch64 SMP lets us runtime mix-and-match arches. E.g. there's nothing
stopping the bootloader putting one core in AA32 and the other in 64,
leading to basically multi-arch. I just extend that across target-foo
boundaries with some code re-arrangement.

Regards,
Peter

> thanks
> -- PMM
>