On 12.11.14 10:11, Paolo Bonzini wrote: > > > On 12/11/2014 10:08, Alexander Graf wrote: >> >> >> On 12.11.14 09:49, Frank Blaschka wrote: >>> On Tue, Nov 11, 2014 at 04:24:24PM +0100, Alexander Graf wrote: >>>> >>>> >>>> On 11.11.14 15:08, Frank Blaschka wrote: >>>>> On Tue, Nov 11, 2014 at 01:51:25PM +0100, Alexander Graf wrote: >>>>>> >>>>>> >>>>>> >>>>>>> Am 11.11.2014 um 13:39 schrieb Frank Blaschka >>>>>>> <blasc...@linux.vnet.ibm.com>: >>>>>>> >>>>>>>> On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote: >>>>>>>> >>>>>>>> >>>>>>>>> On 11.11.14 13:10, Frank Blaschka wrote: >>>>>>>>>> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On 10.11.14 15:20, Frank Blaschka wrote: >>>>>>>>>>> From: Frank Blaschka <frank.blasc...@de.ibm.com> >>>>>>>>>>> >>>>>>>>>>> This patch implements the s390 pci instructions in qemu. It allows >>>>>>>>>>> to access and drive pci devices attached to the s390 pci bus. >>>>>>>>>>> Because of platform constrains devices using IO BARs are not >>>>>>>>>>> supported. Also a device has to support MSI/MSI-X to run on s390. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Frank Blaschka <frank.blasc...@de.ibm.com> >>>>>>>>>>> --- >>>>>>>>>>> target-s390x/Makefile.objs | 2 +- >>>>>>>>>>> target-s390x/kvm.c | 52 ++++ >>>>>>>>>>> target-s390x/pci_ic.c | 753 >>>>>>>>>>> +++++++++++++++++++++++++++++++++++++++++++++ >>>>>>>>>>> target-s390x/pci_ic.h | 335 ++++++++++++++++++++ >>>>>>>>>>> 4 files changed, 1141 insertions(+), 1 deletion(-) >>>>>>>>>>> create mode 100644 target-s390x/pci_ic.c >>>>>>>>>>> create mode 100644 target-s390x/pci_ic.h >>>>>>>>>>> >>>>>>>> >>>>>>>> [...] >>>>>>>> >>>>>>>>>>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run) >>>>>>>>>>> +{ >>>>>>>>>>> + CPUS390XState *env = &cpu->env; >>>>>>>>>>> + S390PCIBusDevice *pbdev; >>>>>>>>>>> + uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20; >>>>>>>>>>> + uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16; >>>>>>>>>>> + PciLgStg *rp; >>>>>>>>>>> + uint64_t offset; >>>>>>>>>>> + uint64_t data; >>>>>>>>>>> + uint8_t len; >>>>>>>>>>> + >>>>>>>>>>> + cpu_synchronize_state(CPU(cpu)); >>>>>>>>>>> + >>>>>>>>>>> + if (env->psw.mask & PSW_MASK_PSTATE) { >>>>>>>>>>> + program_interrupt(env, PGM_PRIVILEGED, 4); >>>>>>>>>>> + return 0; >>>>>>>>>>> + } >>>>>>>>>>> + >>>>>>>>>>> + if (r2 & 0x1) { >>>>>>>>>>> + program_interrupt(env, PGM_SPECIFICATION, 4); >>>>>>>>>>> + return 0; >>>>>>>>>>> + } >>>>>>>>>>> + >>>>>>>>>>> + rp = (PciLgStg *)&env->regs[r2]; >>>>>>>>>>> + offset = env->regs[r2 + 1]; >>>>>>>>>>> + >>>>>>>>>>> + pbdev = s390_pci_find_dev_by_fh(rp->fh); >>>>>>>>>>> + if (!pbdev) { >>>>>>>>>>> + DPRINTF("pcilg no pci dev\n"); >>>>>>>>>>> + setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE); >>>>>>>>>>> + return 0; >>>>>>>>>>> + } >>>>>>>>>>> + >>>>>>>>>>> + len = rp->len & 0xF; >>>>>>>>>>> + if (rp->pcias < 6) { >>>>>>>>>>> + if ((8 - (offset & 0x7)) < len) { >>>>>>>>>>> + program_interrupt(env, PGM_OPERAND, 4); >>>>>>>>>>> + return 0; >>>>>>>>>>> + } >>>>>>>>>>> + MemoryRegion *mr = >>>>>>>>>>> pbdev->pdev->io_regions[rp->pcias].memory; >>>>>>>>>>> + io_mem_read(mr, offset, &data, len); >>>>>>>>>>> + } else if (rp->pcias == 15) { >>>>>>>>>>> + if ((4 - (offset & 0x3)) < len) { >>>>>>>>>>> + program_interrupt(env, PGM_OPERAND, 4); >>>>>>>>>>> + return 0; >>>>>>>>>>> + } >>>>>>>>>>> + data = pci_host_config_read_common( >>>>>>>>>>> + pbdev->pdev, offset, >>>>>>>>>>> pci_config_size(pbdev->pdev), len); >>>>>>>>>>> + >>>>>>>>>>> + switch (len) { >>>>>>>>>>> + case 1: >>>>>>>>>>> + break; >>>>>>>>>>> + case 2: >>>>>>>>>>> + data = cpu_to_le16(data); >>>>>>>>>>> + break; >>>>>>>>>>> + case 4: >>>>>>>>>>> + data = cpu_to_le32(data); >>>>>>>>>>> + break; >>>>>>>>>>> + case 8: >>>>>>>>>>> + data = cpu_to_le64(data); >>>>>>>>>>> + break; >>>>>>>>>> >>>>>>>>>> Why? Also, this is wrong. cpu_to_le64 convert between host endianness >>>>>>>>>> and LE. So if you're running this on an LE host, you won't swap the >>>>>>>>>> value and get a broken result. >>>>>>>>>> >>>>>>>>>> If you know that the value is always swapped, use bswapxx(). >>>>>>>>>> >>>>>>>>> >>>>>>>>> Actually the code is right and required for a big endian host :-) >>>>>>>>> pcilg/pcistg provide access to the PCI config space which is defined >>>>>>>>> as PCI byte order (little endian). Since pci_host_config_read_common >>>>>>>>> does >>>>>>>>> already a le to cpu conversion we have to convert back to PCI byte >>>>>>>>> order. >>>>>>>>> Doing an unconditional swap would be a bug on a little endian host. >>>>>>>> >>>>>>>> Why would it be a bug? The value you end up writing is contents of a >>>>>>>> register and thus doesn't have endianness. So if QEMU was an LE >>>>>>>> process, >>>>>>> >>>>>>> No, the s390 guest executing pcilg instruction expects to receive >>>>>>> config space data >>>>>>> in PCI byte order. >>>>>>> >>>>>>>> the value of data would be identical as on a BE QEMU before your swab. >>>>>>>> After the swab, it would be bswap'ed on BE, but not LE. So LE hosts >>>>>>>> break. >>>>>>>> >>>>>>> >>>>>>> Again on BE endian host we do the swap because of >>>>>>> pci_host_config_read_common does >>>>>>> read the value and do a byte swap for that value, but we need PCI byte >>>>>>> order not BE here. >>>>>>> >>>>>>> On LE host pci_host_config_read_common does not do a byte swap so we do >>>>>>> not have to >>>>>>> convert back to PCI byte order. >>>>>> >>>>>> We maintain the PCI config space always in LE byte order in memory, >>>>>> that's why there is a bwap in its read function. The return result of >>>>>> the read function however is always the same, regardless of LE or BE >>>>>> host. If I do a read of size 4, I will always get 0x1, not 0x01000000 >>>>>> returned. >>>>>> >>>>>> So now you need to convert that 0x1 into a 0x01000000 manually here >>>>>> because some architect thought that registers have endianness (which >>>>>> they don't). But you need to do it always, even on an LE host, because >>>>>> the pci config space return value is identical on LE and BE. >>>>>> >>>>> so you tell me pci_host_config_read_common does not end up in >>>>> pci_default_read_config? >>>>> >>>>> uint32_t pci_default_read_config(PCIDevice *d, >>>>> uint32_t address, int len) >>>>> { >>>>> uint32_t val = 0; >>>>> >>>>> memcpy(&val, d->config + address, len); >>>>> return le32_to_cpu(val); >>>>> } >>>>> >>>>> What did I miss? >>>> >>>> That's exactly where you end up in - and it's there to convert from the >>>> PCI config space backing storage to a native number. >>>> >>>> Imagine you write 0x12345678 at offset 0. Because PCI config space is >>>> defined to be LE, in the PCI config space memory this gets stored as >>>> >>>> 78 56 34 12 >>>> >>>> The reason we do the internal storage of the config space that way is >>>> that it's (in some PCI implementations) legal to access with single byte >>>> granularities. So you could do a pci_config_read(offset = 1) which >>>> should return 0x56. >>>> >>>> However, that means we completely nullify any effect of host endianness >>>> in the PCI config layer already. So if you do pci_config_write(offset = >>>> 0, size = 4, value = 0x12345678), the contents of d->config will always >>>> be identical, regardless of host endianness. The same holds true for >>>> pci_config_read(offset = 0, size = 4). It will always return 0x12345678. >>>> >>> >>> I understood this from the beginning and I completely agree to this. >>> >>>> In your code, you swab that value again. I assume there's a reason >>>> you're swapping it and that it's the way the architecture expects it >>> >>> Yes, s390 pcilg architecture states: >>> Data in the PCI configuration space are treated >>> as being in little-endian byte ordering >>> >>>> (mind to point me to the respective spec so I can verify?). But if the >>>> architecture expects it, then it expects it regardless of host >>>> endianness. The contents of regs[r1] should always be 0x78563412, no >>>> matter whether we're in an LE or a BE environment. >>>> >>>> Does that make sense now? >>>> >>> Absolutely lets make an example for qemu running on BE and LE >>> >>> byte order config space backing pci_default_read_config pcilg (with >>> cpu_to_le) >>> BE 0x78563412 0x12345678 0x78563412 >>> LE 0x78563412 0x78563412 0x78563412 >> >> No, pci_default_read_config() always returns 0x12345678 because it >> returns a register, not memory. > > So: > > config space pci_default_read_config pcilg > (bytes) memcpy cpu_to_le (with cpu_to_le) > BE 78 56 34 12 0x78563412 0x12345678 0x78563412 > LE 78 56 34 12 0x12345678 0x12345678 0x12345678 > > Right?
Yes, exactly :). Alex