On 11.07.2013, at 07:12, Alexey Kardashevskiy wrote: > On 07/10/2013 08:05 PM, Alexander Graf wrote: >> >> On 10.07.2013, at 07:00, Alexey Kardashevskiy wrote: >> >>> On 07/10/2013 03:02 AM, Alexander Graf wrote: >>>> On 07/06/2013 05:07 PM, Alexey Kardashevskiy wrote: >>>>> This adds real mode handlers for the H_PUT_TCE_INDIRECT and >>>>> H_STUFF_TCE hypercalls for QEMU emulated devices such as IBMVIO >>>>> devices or emulated PCI. These calls allow adding multiple entries >>>>> (up to 512) into the TCE table in one call which saves time on >>>>> transition to/from real mode. >>>> >>>> We don't mention QEMU explicitly in KVM code usually. >>>> >>>>> This adds a tce_tmp cache to kvm_vcpu_arch to save valid TCEs >>>>> (copied from user and verified) before writing the whole list into >>>>> the TCE table. This cache will be utilized more in the upcoming >>>>> VFIO/IOMMU support to continue TCE list processing in the virtual >>>>> mode in the case if the real mode handler failed for some reason. >>>>> >>>>> This adds a guest physical to host real address converter >>>>> and calls the existing H_PUT_TCE handler. The converting function >>>>> is going to be fully utilized by upcoming VFIO supporting patches. >>>>> >>>>> This also implements the KVM_CAP_PPC_MULTITCE capability, >>>>> so in order to support the functionality of this patch, QEMU >>>>> needs to query for this capability and set the "hcall-multi-tce" >>>>> hypertas property only if the capability is present, otherwise >>>>> there will be serious performance degradation. >>>> >>>> Same as above. But really you're only giving recommendations here. What's >>>> the point? Please describe what the benefit of this patch is, not what some >>>> other random subsystem might do with the benefits it brings. 
>>>> >>>>> >>>>> Signed-off-by: Paul Mackerras<pau...@samba.org> >>>>> Signed-off-by: Alexey Kardashevskiy<a...@ozlabs.ru> >>>>> >>>>> --- >>>>> Changelog: >>>>> 2013/07/06: >>>>> * fixed number of wrong get_page()/put_page() calls >>>>> >>>>> 2013/06/27: >>>>> * fixed clear of BUSY bit in kvmppc_lookup_pte() >>>>> * H_PUT_TCE_INDIRECT does realmode_get_page() now >>>>> * KVM_CAP_SPAPR_MULTITCE now depends on CONFIG_PPC_BOOK3S_64 >>>>> * updated doc >>>>> >>>>> 2013/06/05: >>>>> * fixed mistype about IBMVIO in the commit message >>>>> * updated doc and moved it to another section >>>>> * changed capability number >>>>> >>>>> 2013/05/21: >>>>> * added kvm_vcpu_arch::tce_tmp >>>>> * removed cleanup if put_indirect failed, instead we do not even start >>>>> writing to TCE table if we cannot get TCEs from the user and they are >>>>> invalid >>>>> * kvmppc_emulated_h_put_tce is split to kvmppc_emulated_put_tce >>>>> and kvmppc_emulated_validate_tce (for the previous item) >>>>> * fixed bug with failthrough for H_IPI >>>>> * removed all get_user() from real mode handlers >>>>> * kvmppc_lookup_pte() added (instead of making lookup_linux_pte public) >>>>> >>>>> Signed-off-by: Alexey Kardashevskiy<a...@ozlabs.ru> >>>>> --- >>>>> Documentation/virtual/kvm/api.txt | 25 +++ >>>>> arch/powerpc/include/asm/kvm_host.h | 9 ++ >>>>> arch/powerpc/include/asm/kvm_ppc.h | 16 +- >>>>> arch/powerpc/kvm/book3s_64_vio.c | 154 ++++++++++++++++++- >>>>> arch/powerpc/kvm/book3s_64_vio_hv.c | 260 >>>>> ++++++++++++++++++++++++++++---- >>>>> arch/powerpc/kvm/book3s_hv.c | 41 ++++- >>>>> arch/powerpc/kvm/book3s_hv_rmhandlers.S | 6 + >>>>> arch/powerpc/kvm/book3s_pr_papr.c | 37 ++++- >>>>> arch/powerpc/kvm/powerpc.c | 3 + >>>>> 9 files changed, 517 insertions(+), 34 deletions(-) >>>>> >>>>> diff --git a/Documentation/virtual/kvm/api.txt >>>>> b/Documentation/virtual/kvm/api.txt >>>>> index 6365fef..762c703 100644 >>>>> --- a/Documentation/virtual/kvm/api.txt >>>>> +++ 
b/Documentation/virtual/kvm/api.txt >>>>> @@ -2362,6 +2362,31 @@ calls by the guest for that service will be passed >>>>> to userspace to be >>>>> handled. >>>>> >>>>> >>>>> +4.86 KVM_CAP_PPC_MULTITCE >>>>> + >>>>> +Capability: KVM_CAP_PPC_MULTITCE >>>>> +Architectures: ppc >>>>> +Type: vm >>>>> + >>>>> +This capability means the kernel is capable of handling hypercalls >>>>> +H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user >>>>> +space. This significanly accelerates DMA operations for PPC KVM guests. >>>> >>>> significanly? Please run this through a spell checker. >>>> >>>>> +The user space should expect that its handlers for these hypercalls >>>> >>>> s/The// >>>> >>>>> +are not going to be called. >>>> >>>> Is user space guaranteed they will not be called? Or can it still happen? >>> >>> ... if user space previously registered LIOBN in KVM (via >>> KVM_CREATE_SPAPR_TCE or similar calls). >>> >>> ok? >> >> How about this? >> >> The hypercalls mentioned above may or may not be processed successfully in >> the kernel based fast path. If they can not be handled by the kernel, they >> will get passed on to user space. So user space still has to have an >> implementation for these despite the in kernel acceleration. >> >> --- >> >> The target audience for this documentation is user space KVM API users. >> Someone developing kvm tool for example. They want to know implications >> specific CAPs have. >> >>> >>> There is also KVM_CREATE_SPAPR_TCE_IOMMU but it is not in the kernel yet >>> and may never get there. >>> >>> >>>>> +In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest, >>>>> +the user space might have to advertise it for the guest. For example, >>>>> +IBM pSeries guest starts using them if "hcall-multi-tce" is present in >>>>> +the "ibm,hypertas-functions" device-tree property. >>>> >>>> This paragraph describes sPAPR. That's fine, but please document it as >>>> such. Also please check your grammar. 
>>> >>>>> + >>>>> +Without this capability, only H_PUT_TCE is handled by the kernel and >>>>> +therefore the use of H_PUT_TCE_INDIRECT and H_STUFF_TCE is not >>>>> recommended >>>>> +unless the capability is present as passing hypercalls to the userspace >>>>> +slows operations a lot. >>>>> + >>>>> +Unlike other capabilities of this section, this one is always enabled. >>>> >>>> Why? Wouldn't that confuse older user space? >>> >>> >>> How? Old user space won't check for this capability and won't tell the >>> guest to use it (via "hcall-multi-tce"). Old H_PUT_TCE is still there. >>> >>> If the guest always uses H_PUT_TCE_INDIRECT/H_STUFF_TCE no matter what, >>> then it is its problem - it won't work now anyway as neither QEMU nor host >>> kernel supports these calls. > > >> Always assume that you are a kernel developer without knowledge >> of any user space code using your interfaces. So there is the theoretical >> possibility that there is a user space client out there that implements >> H_PUT_TCE_INDIRECT and advertises hcall-multi-tce to the guest. >> Would that client break? If so, we should definitely have >> the CAP disabled by default. > > > No, it won't break. Why would it break? I really do not get it. This user > space client has to do an extra step to get this acceleration by calling > ioctl(KVM_CREATE_SPAPR_TCE) anyway. Previously that ioctl only had effect > on H_PUT_TCE, now on all three hcalls.
Hrm. It's a change of behavior, it probably wouldn't break, yes. > > >> But really, it's also as much about consistency as anything else. >> If we leave everything as is and always extend functionality >> by enabling new CAPs, we're pretty much guaranteed that we >> don't break anything by accident. It also makes debugging easier >> because you can for example disable this particular feature >> to see whether something has bad side effects. > > > So I must add one more ioctl to enable MULTITCE in kernel handling. Is it > what you are saying? > > I can see KVM_CHECK_EXTENSION but I do not see KVM_ENABLE_EXTENSION or > anything like that. KVM_ENABLE_CAP. It's how we enable sPAPR capabilities too. > > > >>>>> + >>>>> + >>>>> 5. The kvm_run structure >>>>> ------------------------ >>>>> >>>>> diff --git a/arch/powerpc/include/asm/kvm_host.h >>>>> b/arch/powerpc/include/asm/kvm_host.h >>>>> index af326cd..20d04bd 100644 >>>>> --- a/arch/powerpc/include/asm/kvm_host.h >>>>> +++ b/arch/powerpc/include/asm/kvm_host.h >>>>> @@ -180,6 +180,7 @@ struct kvmppc_spapr_tce_table { >>>>> struct kvm *kvm; >>>>> u64 liobn; >>>>> u32 window_size; >>>>> + struct { struct { unsigned long put, indir, stuff; } rm, vm; } stat; >>>> >>>> You don't need this. 
>>>> >>>>> struct page *pages[0]; >>>>> }; >>>>> >>>>> @@ -609,6 +610,14 @@ struct kvm_vcpu_arch { >>>>> spinlock_t tbacct_lock; >>>>> u64 busy_stolen; >>>>> u64 busy_preempt; >>>>> + >>>>> + unsigned long *tce_tmp_hpas; /* TCE cache for TCE_PUT_INDIRECT >>>>> hcall */ >>>>> + enum { >>>>> + TCERM_NONE, >>>>> + TCERM_GETPAGE, >>>>> + TCERM_PUTTCE, >>>>> + TCERM_PUTLIST, >>>>> + } tce_rm_fail; /* failed stage of request processing */ >>>>> #endif >>>>> }; >>>>> >>>>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h >>>>> b/arch/powerpc/include/asm/kvm_ppc.h >>>>> index a5287fe..fa722a0 100644 >>>>> --- a/arch/powerpc/include/asm/kvm_ppc.h >>>>> +++ b/arch/powerpc/include/asm/kvm_ppc.h >>>>> @@ -133,8 +133,20 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu >>>>> *vcpu); >>>>> >>>>> extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, >>>>> struct kvm_create_spapr_tce *args); >>>>> -extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, >>>>> - unsigned long ioba, unsigned long tce); >>>>> +extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table( >>>>> + struct kvm_vcpu *vcpu, unsigned long liobn); >>>>> +extern long kvmppc_emulated_validate_tce(unsigned long tce); >>>>> +extern void kvmppc_emulated_put_tce(struct kvmppc_spapr_tce_table *tt, >>>>> + unsigned long ioba, unsigned long tce); >>>>> +extern long kvmppc_vm_h_put_tce(struct kvm_vcpu *vcpu, >>>>> + unsigned long liobn, unsigned long ioba, >>>>> + unsigned long tce); >>>>> +extern long kvmppc_vm_h_put_tce_indirect(struct kvm_vcpu *vcpu, >>>>> + unsigned long liobn, unsigned long ioba, >>>>> + unsigned long tce_list, unsigned long npages); >>>>> +extern long kvmppc_vm_h_stuff_tce(struct kvm_vcpu *vcpu, >>>>> + unsigned long liobn, unsigned long ioba, >>>>> + unsigned long tce_value, unsigned long npages); >>>>> extern long kvm_vm_ioctl_allocate_rma(struct kvm *kvm, >>>>> struct kvm_allocate_rma *rma); >>>>> extern struct kvmppc_linear_info *kvm_alloc_rma(void); >>>>> diff --git 
a/arch/powerpc/kvm/book3s_64_vio.c >>>>> b/arch/powerpc/kvm/book3s_64_vio.c >>>>> index b2d3f3b..99bf4e5 100644 >>>>> --- a/arch/powerpc/kvm/book3s_64_vio.c >>>>> +++ b/arch/powerpc/kvm/book3s_64_vio.c >>>>> @@ -14,6 +14,7 @@ >>>>> * >>>>> * Copyright 2010 Paul Mackerras, IBM Corp.<pau...@au1.ibm.com> >>>>> * Copyright 2011 David Gibson, IBM Corporation<d...@au1.ibm.com> >>>>> + * Copyright 2013 Alexey Kardashevskiy, IBM Corporation<a...@au1.ibm.com> >>>>> */ >>>>> >>>>> #include<linux/types.h> >>>>> @@ -36,8 +37,10 @@ >>>>> #include<asm/ppc-opcode.h> >>>>> #include<asm/kvm_host.h> >>>>> #include<asm/udbg.h> >>>>> +#include<asm/iommu.h> >>>>> +#include<asm/tce.h> >>>>> >>>>> -#define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64)) >>>>> +#define ERROR_ADDR ((void *)~(unsigned long)0x0) >>>>> >>>>> static long kvmppc_stt_npages(unsigned long window_size) >>>>> { >>>>> @@ -50,6 +53,20 @@ static void release_spapr_tce_table(struct >>>>> kvmppc_spapr_tce_table *stt) >>>>> struct kvm *kvm = stt->kvm; >>>>> int i; >>>>> >>>>> +#define __SV(x) stt->stat.x >>>>> +#define __SVD(x) (__SV(rm.x)?(__SV(rm.x)-__SV(vm.x)):0) >>>>> + pr_debug("%s stat for liobn=%llx\n" >>>>> + "--------------- realmode ----- virtmode ---\n" >>>>> + "put_tce %10ld %10ld\n" >>>>> + "put_tce_indir %10ld %10ld\n" >>>>> + "stuff_tce %10ld %10ld\n", >>>>> + __func__, stt->liobn, >>>>> + __SVD(put), __SV(vm.put), >>>>> + __SVD(indir), __SV(vm.indir), >>>>> + __SVD(stuff), __SV(vm.stuff)); >>>>> +#undef __SVD >>>>> +#undef __SV >>>> >>>> All of these stat points should just be trace points. You can do the >>>> statistic gathering from user space then. 
>>>> >>>>> + >>>>> mutex_lock(&kvm->lock); >>>>> list_del(&stt->list); >>>>> for (i = 0; i< kvmppc_stt_npages(stt->window_size); i++) >>>>> @@ -148,3 +165,138 @@ fail: >>>>> } >>>>> return ret; >>>>> } >>>>> + >>>>> +/* Converts guest physical address to host virtual address */ >>>>> +static void __user *kvmppc_vm_gpa_to_hva_and_get(struct kvm_vcpu *vcpu, >>>> >>>> Please don't distinguish _vm versions. They're the normal case. _rm ones >>>> are the special ones. >>>> >>>>> + unsigned long gpa, struct page **pg) >>>>> +{ >>>>> + unsigned long hva, gfn = gpa>> PAGE_SHIFT; >>>>> + struct kvm_memory_slot *memslot; >>>>> + >>>>> + memslot = search_memslots(kvm_memslots(vcpu->kvm), gfn); >>>>> + if (!memslot) >>>>> + return ERROR_ADDR; >>>>> + >>>>> + hva = __gfn_to_hva_memslot(memslot, gfn) + (gpa& ~PAGE_MASK); >>>> >>>> s/+/|/ >>>> >>>>> + >>>>> + if (get_user_pages_fast(hva& PAGE_MASK, 1, 0, pg) != 1) >>>>> + return ERROR_ADDR; >>>>> + >>>>> + return (void *) hva; >>>>> +} >>>>> + >>>>> +long kvmppc_vm_h_put_tce(struct kvm_vcpu *vcpu, >>>>> + unsigned long liobn, unsigned long ioba, >>>>> + unsigned long tce) >>>>> +{ >>>>> + long ret; >>>>> + struct kvmppc_spapr_tce_table *tt; >>>>> + >>>>> + tt = kvmppc_find_tce_table(vcpu, liobn); >>>>> + /* Didn't find the liobn, put it to userspace */ >>>> >>>> Unclear comment. >>> >>> >>> What detail is missing? >> > >> Grammar wise "it" in the second half of the sentence refers to liobn. >> So you "put" the "liobn to userspace". That sentence doesn't >> make any sense. > > > Removed it. H_TOO_HARD itself says enough already. > > >> What you really want to say is: >> >> /* Couldn't find the liobn. Something went wrong. Let user space handle the >> hypercall. That has better ways of dealing with errors. 
*/ >> >>> >>> >>>>> + if (!tt) >>>>> + return H_TOO_HARD; >>>>> + >>>>> + ++tt->stat.vm.put; >>>>> + >>>>> + if (ioba>= tt->window_size) >>>>> + return H_PARAMETER; >>>>> + >>>>> + ret = kvmppc_emulated_validate_tce(tce); >>>>> + if (ret) >>>>> + return ret; >>>>> + >>>>> + kvmppc_emulated_put_tce(tt, ioba, tce); >>>>> + >>>>> + return H_SUCCESS; >>>>> +} >>>>> + >>>>> +long kvmppc_vm_h_put_tce_indirect(struct kvm_vcpu *vcpu, >>>>> + unsigned long liobn, unsigned long ioba, >>>>> + unsigned long tce_list, unsigned long npages) >>>>> +{ >>>>> + struct kvmppc_spapr_tce_table *tt; >>>>> + long i, ret = H_SUCCESS; >>>>> + unsigned long __user *tces; >>>>> + struct page *pg = NULL; >>>>> + >>>>> + tt = kvmppc_find_tce_table(vcpu, liobn); >>>>> + /* Didn't find the liobn, put it to userspace */ >>>>> + if (!tt) >>>>> + return H_TOO_HARD; >>>>> + >>>>> + ++tt->stat.vm.indir; >>>>> + >>>>> + /* >>>>> + * The spec says that the maximum size of the list is 512 TCEs so >>>>> + * so the whole table addressed resides in 4K page >>>> >>>> so so? 
>>>> >>>>> + */ >>>>> + if (npages> 512) >>>>> + return H_PARAMETER; >>>>> + >>>>> + if (tce_list& ~IOMMU_PAGE_MASK) >>>>> + return H_PARAMETER; >>>>> + >>>>> + if ((ioba + (npages<< IOMMU_PAGE_SHIFT))> tt->window_size) >>>>> + return H_PARAMETER; >>>>> + >>>>> + tces = kvmppc_vm_gpa_to_hva_and_get(vcpu, tce_list,&pg); >>>>> + if (tces == ERROR_ADDR) >>>>> + return H_TOO_HARD; >>>>> + >>>>> + if (vcpu->arch.tce_rm_fail == TCERM_PUTLIST) >>>>> + goto put_list_page_exit; >>>>> + >>>>> + for (i = 0; i< npages; ++i) { >>>>> + if (get_user(vcpu->arch.tce_tmp_hpas[i], tces + i)) { >>>>> + ret = H_PARAMETER; >>>>> + goto put_list_page_exit; >>>>> + } >>>>> + >>>>> + ret = kvmppc_emulated_validate_tce(vcpu->arch.tce_tmp_hpas[i]); >>>>> + if (ret) >>>>> + goto put_list_page_exit; >>>>> + } >>>>> + >>>>> + for (i = 0; i< npages; ++i) >>>>> + kvmppc_emulated_put_tce(tt, ioba + (i<< IOMMU_PAGE_SHIFT), >>>>> + vcpu->arch.tce_tmp_hpas[i]); >>>>> +put_list_page_exit: >>>>> + if (pg) >>>>> + put_page(pg); >>>>> + >>>>> + if (vcpu->arch.tce_rm_fail != TCERM_NONE) { >>>>> + vcpu->arch.tce_rm_fail = TCERM_NONE; >>>>> + if (pg&& !PageCompound(pg)) >>>>> + put_page(pg); /* finish pending realmode_put_page() */ >>>>> + } >>>>> + >>>>> + return ret; >>>>> +} >>>>> + >>>>> +long kvmppc_vm_h_stuff_tce(struct kvm_vcpu *vcpu, >>>>> + unsigned long liobn, unsigned long ioba, >>>>> + unsigned long tce_value, unsigned long npages) >>>>> +{ >>>>> + struct kvmppc_spapr_tce_table *tt; >>>>> + long i, ret; >>>>> + >>>>> + tt = kvmppc_find_tce_table(vcpu, liobn); >>>>> + /* Didn't find the liobn, put it to userspace */ >>>>> + if (!tt) >>>>> + return H_TOO_HARD; >>>>> + >>>>> + ++tt->stat.vm.stuff; >>>>> + >>>>> + if ((ioba + (npages<< IOMMU_PAGE_SHIFT))> tt->window_size) >>>>> + return H_PARAMETER; >>>>> + >>>>> + ret = kvmppc_emulated_validate_tce(tce_value); >>>>> + if (ret || (tce_value& (TCE_PCI_WRITE | TCE_PCI_READ))) >>>>> + return H_PARAMETER; >>>>> + >>>>> + for (i = 0; i< npages; ++i, ioba 
+= IOMMU_PAGE_SIZE) >>>>> + kvmppc_emulated_put_tce(tt, ioba, tce_value); >>>>> + >>>>> + return H_SUCCESS; >>>>> +} >>>>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c >>>>> b/arch/powerpc/kvm/book3s_64_vio_hv.c >>>>> index 30c2f3b..cd3e6f9 100644 >>>>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c >>>>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c >>>>> @@ -14,6 +14,7 @@ >>>>> * >>>>> * Copyright 2010 Paul Mackerras, IBM Corp.<pau...@au1.ibm.com> >>>>> * Copyright 2011 David Gibson, IBM Corporation<d...@au1.ibm.com> >>>>> + * Copyright 2013 Alexey Kardashevskiy, IBM Corporation<a...@au1.ibm.com> >>>>> */ >>>>> >>>>> #include<linux/types.h> >>>>> @@ -35,42 +36,243 @@ >>>>> #include<asm/ppc-opcode.h> >>>>> #include<asm/kvm_host.h> >>>>> #include<asm/udbg.h> >>>>> +#include<asm/iommu.h> >>>>> +#include<asm/tce.h> >>>>> >>>>> #define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64)) >>>>> +#define ERROR_ADDR (~(unsigned long)0x0) >>>>> >>>>> -/* WARNING: This will be called in real-mode on HV KVM and virtual >>>>> - * mode on PR KVM >>>> >>>> What's wrong with the warning? >>> >>> >>> It belongs to kvmppc_h_put_tce() which is not called in virtual mode >>> anymore. >> >> I thought the comment applied to the whole file before? Hrm. Maybe I misread >> it then. >> >>> It is technically correct for kvmppc_find_tce_table() though. Should I put >>> this comment before every function which may be called from real and >>> virtual modes? >> >> Yes, please. Otherwise someone might stick an access to a non-linear address >> in there by accident. 
>> >>> >>> >>> >>>>> +/* >>>>> + * Finds a TCE table descriptor by LIOBN >>>>> */ >>>>> +struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(struct kvm_vcpu >>>>> *vcpu, >>>>> + unsigned long liobn) >>>>> +{ >>>>> + struct kvmppc_spapr_tce_table *tt; >>>>> + >>>>> + list_for_each_entry(tt,&vcpu->kvm->arch.spapr_tce_tables, list) { >>>>> + if (tt->liobn == liobn) >>>>> + return tt; >>>>> + } >>>>> + >>>>> + return NULL; >>>>> +} >>>>> +EXPORT_SYMBOL_GPL(kvmppc_find_tce_table); >>>>> + >>>>> +#ifdef DEBUG >>>>> +/* >>>>> + * Lets user mode disable realmode handlers by putting big number >>>>> + * in the bottom value of LIOBN >>>> >>>> What? Seriously? Just don't enable the CAP. >>> >>> >>> It is under DEBUG. It really, really helps to be able to disable real mode >>> handlers without reboot. Ok, no debug code, I'll remove. >> >> Debug code is good, but #ifdefs are bad. For you, an #ifdef reads like >> "code that doesn't do any harm when disabled". For me, #ifdefs read >> "code that definitely breaks because nobody turns the #define on". >> >> So please, avoid #ifdef'ed code whenever possible. Switching the CAP on and >> off is a much better debug approach in this case. >> >>> >>> >>>>> + */ >>>>> +#define kvmppc_find_tce_table(a, b) \ >>>>> + ((((b)&0xffff)>10000)?NULL:kvmppc_find_tce_table((a), (b))) >>>>> +#endif >>>>> + >>>>> +/* >>>>> + * Validates TCE address. >>>>> + * At the moment only flags are validated as other checks will >>>>> significantly slow >>>>> + * down or can make it even impossible to handle TCE requests in real >>>>> mode. >>>> >>>> What? >>> >>> >>> What is missing here (besides good english)? >> >> What badness could slip through by not validating everything? > > > I cannot think of any good check which could be done in real mode and not > be "more than 2 calls deep" (c) Ben. Check that the page is allocated at > all? How? Don't know. If you say that our validation doesn't validate everything, that makes me really wary.
Could the guest use it to maliciously inject anything? Could a missing check make our code go berserk? What checks exactly would you do in addition when this was virtual mode? > > > >>>>> + */ >>>>> +long kvmppc_emulated_validate_tce(unsigned long tce) >>>> >>>> I don't like the naming scheme. Please turn this around and make it >>>> kvmppc_tce_validate(). >>> >>> >>> Oh. "Like"... Ok. >> >> Yes. Like. >> >>> >>> >>>>> +{ >>>>> + if (tce& ~(IOMMU_PAGE_MASK | TCE_PCI_WRITE | TCE_PCI_READ)) >>>>> + return H_PARAMETER; >>>>> + >>>>> + return H_SUCCESS; >>>>> +} >>>>> +EXPORT_SYMBOL_GPL(kvmppc_emulated_validate_tce); >>>>> + >>>>> +/* >>>>> + * Handles TCE requests for QEMU emulated devices. >>>> >>>> We still don't mention QEMU in KVM code. And does it really matter whether >>>> they're emulated by QEMU? Devices could also be emulated by KVM. >>>> >>>>> + * Puts guest TCE values to the table and expects QEMU to convert them >>>>> + * later in a QEMU device implementation. >>>>> + * Called in both real and virtual modes. >>>>> + * Cannot fail so kvmppc_emulated_validate_tce must be called before it. >>>>> + */ >>>>> +void kvmppc_emulated_put_tce(struct kvmppc_spapr_tce_table *tt, >>>> >>>> kvmppc_tce_put() >>>> >>>>> + unsigned long ioba, unsigned long tce) >>>>> +{ >>>>> + unsigned long idx = ioba>> SPAPR_TCE_SHIFT; >>>>> + struct page *page; >>>>> + u64 *tbl; >>>>> + >>>>> + /* >>>>> + * Note on the use of page_address() in real mode, >>>>> + * >>>>> + * It is safe to use page_address() in real mode on ppc64 because >>>>> + * page_address() is always defined as lowmem_page_address() >>>>> + * which returns __va(PFN_PHYS(page_to_pfn(page))) which is >>>>> arithmetial >>>>> + * operation and does not access page struct. >>>>> + * >>>>> + * Theoretically page_address() could be defined different >>>>> + * but either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL >>>>> + * should be enabled. 
>>>>> + * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64, >>>>> + * HASHED_PAGE_VIRTUAL could be enabled for ppc32 only and only >>>>> + * if CONFIG_HIGHMEM is defined. As CONFIG_SPARSEMEM_VMEMMAP >>>>> + * is not expected to be enabled on ppc32, page_address() >>>>> + * is safe for ppc32 as well. >>>>> + */ >>>>> +#if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL) >>>>> +#error TODO: fix to avoid page_address() here >>>>> +#endif >>>> >>>> Can you extract the text above, the check and the page_address call into a >>>> simple wrapper function? >>> >>> >>> Is this function also too big? Sorry, I do not understand the comment. >> >> All of the comment and #if here only deal with the fact that you >> have a real mode hack to call page_address() that happens >> to work under specific circumstances. >> >> There's nothing kvmppc_tce_put() specific about this. >> The page_address() code happens to get called here, sure. >> But if I read the kvmppc_tce_put() function I don't care about >> these details - I want to understand the code flow that ends >> up writing the TCE. >> >>>>> + page = tt->pages[idx / TCES_PER_PAGE]; >>>>> + tbl = (u64 *)page_address(page); >>>>> + >>>>> + /* udbg_printf("tce @ %p\n",&tbl[idx % TCES_PER_PAGE]); */ >>>> >>>> This is not an RFC, is it? >>> >>> >>> Any debug code is prohibited? Ok, I'll remove. >> >> Debug code that requires code changes is prohibited, yes. >> Debug code that is runtime switchable (pr_debug, trace points, etc) >> are allowed. > > > Is there any easy way to enable just this specific udbg_printf (not all of > them at once)? Trace points do not work in real mode as we figured out. You can enable pr_debug by file IIRC. > > >>>>> + tbl[idx % TCES_PER_PAGE] = tce; >>>>> +} >>>>> +EXPORT_SYMBOL_GPL(kvmppc_emulated_put_tce); >>>>> + >>>>> +#ifdef CONFIG_KVM_BOOK3S_64_HV >>>>> +/* >>>>> + * Converts guest physical address to host physical address. 
>>>>> + * Tries to increase page counter via realmode_get_page() and >>>>> + * returns ERROR_ADDR if failed. >>>>> + */ >>>>> +static unsigned long kvmppc_rm_gpa_to_hpa_and_get(struct kvm_vcpu *vcpu, >>>>> + unsigned long gpa, struct page **pg) >>>>> +{ >>>>> + struct kvm_memory_slot *memslot; >>>>> + pte_t *ptep, pte; >>>>> + unsigned long hva, hpa = ERROR_ADDR; >>>>> + unsigned long gfn = gpa>> PAGE_SHIFT; >>>>> + unsigned shift = 0; >>>>> + >>>>> + memslot = search_memslots(kvm_memslots(vcpu->kvm), gfn); >>>>> + if (!memslot) >>>>> + return ERROR_ADDR; >>>>> + >>>>> + hva = __gfn_to_hva_memslot(memslot, gfn); >>>>> + >>>>> + ptep = find_linux_pte_or_hugepte(vcpu->arch.pgdir, hva,&shift); >>>>> + if (!ptep || !pte_present(*ptep)) >>>>> + return ERROR_ADDR; >>>>> + pte = *ptep; >>>>> + >>>>> + if (((gpa& TCE_PCI_WRITE) || pte_write(pte))&& !pte_dirty(pte)) >>>>> + return ERROR_ADDR; >>>>> + >>>>> + if (!pte_young(pte)) >>>>> + return ERROR_ADDR; >>>>> + >>>>> + if (!shift) >>>>> + shift = PAGE_SHIFT; >>>>> + >>>>> + /* Put huge pages handling to the virtual mode */ >>>>> + if (shift> PAGE_SHIFT) >>>>> + return ERROR_ADDR; >>>>> + >>>>> + *pg = realmode_pfn_to_page(pte_pfn(pte)); >>>>> + if (!*pg || realmode_get_page(*pg)) >>>>> + return ERROR_ADDR; >>>>> + >>>>> + /* pte_pfn(pte) returns address aligned to pg_size */ >>>>> + hpa = (pte_pfn(pte)<< PAGE_SHIFT) + (gpa& ((1<< shift) - 1)); >>>>> + >>>>> + if (unlikely(pte_val(pte) != pte_val(*ptep))) { >>>>> + hpa = ERROR_ADDR; >>>>> + realmode_put_page(*pg); >>>>> + *pg = NULL; >>>>> + } >>>>> + >>>>> + return hpa; >>>>> +} >>>>> + >>>>> long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, >>>>> unsigned long ioba, unsigned long tce) >>>>> { >>>>> - struct kvm *kvm = vcpu->kvm; >>>>> - struct kvmppc_spapr_tce_table *stt; >>>>> - >>>>> - /* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */ >>>>> - /* liobn, ioba, tce); */ >>>>> - >>>>> - 
list_for_each_entry(stt,&kvm->arch.spapr_tce_tables, list) { >>>>> - if (stt->liobn == liobn) { >>>>> - unsigned long idx = ioba>> SPAPR_TCE_SHIFT; >>>>> - struct page *page; >>>>> - u64 *tbl; >>>>> - >>>>> - /* udbg_printf("H_PUT_TCE: liobn 0x%lx => stt=%p >>>>> window_size=0x%x\n", */ >>>>> - /* liobn, stt, stt->window_size); */ >>>>> - if (ioba>= stt->window_size) >>>>> - return H_PARAMETER; >>>>> - >>>>> - page = stt->pages[idx / TCES_PER_PAGE]; >>>>> - tbl = (u64 *)page_address(page); >>>>> - >>>>> - /* FIXME: Need to validate the TCE itself */ >>>>> - /* udbg_printf("tce @ %p\n",&tbl[idx % TCES_PER_PAGE]); */ >>>>> - tbl[idx % TCES_PER_PAGE] = tce; >>>>> - return H_SUCCESS; >>>>> - } >>>>> + long ret; >>>>> + struct kvmppc_spapr_tce_table *tt = kvmppc_find_tce_table(vcpu, >>>>> liobn); >>>>> + >>>>> + if (!tt) >>>>> + return H_TOO_HARD; >>>>> + >>>>> + ++tt->stat.rm.put; >>>>> + >>>>> + if (ioba>= tt->window_size) >>>>> + return H_PARAMETER; >>>>> + >>>>> + ret = kvmppc_emulated_validate_tce(tce); >>>>> + if (!ret) >>>>> + kvmppc_emulated_put_tce(tt, ioba, tce); >>>>> + >>>>> + return ret; >>>>> +} >>>>> + >>>>> +long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu, >>>> >>>> So the _vm version is the normal one and this is the _rm version? If so, >>>> please mark it as such. Is there any way to generate both from the same >>>> source? The way it's now there is a lot of duplicate code. >>> >>> >>> I tried, looked very ugly. If you insist, I will do so. >> > >> If it looks ugly better don't. I just want to make sure you explored the >> option. >> But please keep the naming scheme consistent. > > > Removed _vm everywhere and put _rm in realmode handlers. I just was > confused by _vm in kvm_vm_ioctl_create_spapr_tce() at the first place. That vm refers to the virtual machine. It's on VM scope, not VCPU scope. 
Alex

> --
> Alexey

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev