Re: [Xen-devel] [PATCH v2 2/2] ns16550: enable use of PCI MSI
>>> On 29.11.18 at 18:33, wrote:
> On Mon, Oct 01, 2018 at 10:26:05AM -0600, Jan Beulich wrote:
>> --- a/xen/arch/x86/msi.c
>> +++ b/xen/arch/x86/msi.c
>> @@ -742,6 +742,16 @@ static int msi_capability_init(struct pc
>>
>>      *desc = entry;
>>      /* Restore the original MSI enabled bits */
>> +    if ( !hardware_domain )
>
> Wouldn't it be better to assign the device to dom_xen (pdev->domain =
> dom_xen), and then check if the owner is dom_xen here?

I'm not sure this couldn't be wrong in the general case (and we sit on a
generic code path here): it depends on whether Dom0 can modify the
device's config space, and I wouldn't want to (here) introduce a
connection between dom_xen ownership and whether Dom0 can control INTX.
The comment below here is specifically worded to explain why I use
hardware_domain. If we ever get into the situation of wanting to enable
MSI on an internally used device _after_ Dom0 has started, this would
need careful re-consideration.

> Or at the point where this is called from the serial console driver is
> too early for dom_xen to exist?

ns16550_init_postirq() is where both MSI setup and hiding of the device
happen, so in principle this would seem to be possible for the specific
case of a serial card.

>> @@ -777,8 +777,65 @@ static void __init ns16550_init_postirq(
>>                 uart->ps_bdf[0], uart->ps_bdf[1],
>>                 uart->ps_bdf[2]);
>>      }
>> +
>> +    if ( uart->msi )
>> +    {
>> +        struct msi_info msi = {
>> +            .bus = uart->ps_bdf[0],
>> +            .devfn = PCI_DEVFN(uart->ps_bdf[1], uart->ps_bdf[2]),
>> +            .irq = rc = uart->irq,
>> +            .entry_nr = 1
>> +        };
>> +
>> +        if ( rc > 0 )
>> +        {
>> +            struct msi_desc *msi_desc = NULL;
>> +
>> +            pcidevs_lock();
>> +
>> +            rc = pci_enable_msi(&msi, &msi_desc);
>
> Before attempting to enable MSI, shouldn't you make sure memory
> decoding is enabled in case the device uses MSIX?
>
> I think this code assumes the device will always use MSI? (in which
> case my above question is likely moot).

I've yet to see serial cards using MSI-X. If we get to the point where
this is needed, we also may need to first set up the BAR for the MSI-X
table. Furthermore pci_enable_msi() won't even try to enable MSI-X with
msi->table_base being zero.

>> --- a/xen/drivers/pci/pci.c
>> +++ b/xen/drivers/pci/pci.c
>> @@ -115,6 +115,21 @@ int pci_find_next_ext_capability(int seg
>>      return 0;
>> }
>>
>> +void pci_intx(const struct pci_dev *pdev, bool_t enable)
>
> Please use bool.

See how old this patch is: v1 was posted long before bool came into
existence, and I had refrained from posting v2 until I actually had a
device where MSI would indeed function (the first two I tried this with
claimed to be MSI capable, but no interrupts ever surfaced when MSI was
enabled on them, yet I couldn't be sure the code was doing something
wrong). Obviously I then forgot to switch this, which I've now done.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 4/6] microcode: don't call apply_microcode() in cpu_request_microcode()
On Thu, Nov 29, 2018 at 10:46:05AM +0100, Roger Pau Monné wrote:
>On Thu, Nov 29, 2018 at 12:28:46PM +0800, Chao Gao wrote:
>> On Wed, Nov 28, 2018 at 04:02:25PM +0100, Roger Pau Monné wrote:
>> >On Wed, Nov 28, 2018 at 01:34:14PM +0800, Chao Gao wrote:
>> >> diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
>> >> index 8350d22..cca7b2c 100644
>> >> --- a/xen/arch/x86/microcode.c
>> >> +++ b/xen/arch/x86/microcode.c
>> >> @@ -233,20 +233,12 @@ int microcode_resume_cpu(unsigned int cpu)
>> >>      return err;
>> >>  }
>> >>
>> >> -static int microcode_update_cpu(const void *buf, size_t size)
>> >> +static int microcode_update_cpu(void)
>> >>  {
>> >>      int err;
>> >> -    unsigned int cpu = smp_processor_id();
>> >> -    struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
>> >>
>> >>      spin_lock(&microcode_mutex);
>> >> -
>> >> -    err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
>> >> -    if ( likely(!err) )
>> >> -        err = microcode_ops->cpu_request_microcode(cpu, buf, size);
>> >> -    else
>> >> -        __microcode_fini_cpu(cpu);
>> >> -
>> >> +    err = microcode_ops->apply_microcode(smp_processor_id());
>> >>      spin_unlock(&microcode_mutex);
>> >>
>> >>      return err;
>> >> @@ -259,7 +251,7 @@ static long do_microcode_update(void *_info)
>> >>
>> >>      BUG_ON(info->cpu != smp_processor_id());
>> >>
>> >> -    error = microcode_update_cpu(info->buffer, info->buffer_size);
>> >> +    error = microcode_update_cpu();
>> >
>> >Why don't you just set info->error = microcode_update_cpu()?
>> >
>> >AFAICT this is done to attempt to update the remaining CPUs if one
>> >update failed?
>>
>> Yes. But this patch doesn't change the logic here. Actually, if HT is
>> enabled, microcode is shared between the logical threads of the same
>> core, so if one thread updates microcode successfully, its sibling
>> would always fail with the current logic. I am trying to explain why
>> we cannot abort the update even though an error is met with the
>> current logic. It definitely can be solved by tweaking the logic
>> slightly.
>>
>> >Is there any way to roll back to the previous state so all CPUs have
>> >the same microcode?
>>
>> It seems loading a microcode with a numerically smaller revision is
>> not allowed, according to SDM section 9.11.7.2.
>>
>> With patch 6, a panic() would be triggered if some CPUs failed to do
>> the update. I didn't try to change the logic here.
>>
>> >I assume nothing good will come out of running a system with CPUs
>> >using different microcode versions, but maybe that's not so bad?
>>
>> It is better that all CPUs have the same microcode revision.
>>
>> Linux kernel rejects late microcode update if finding some CPUs
>> offlined. I may port this patch to Xen too in a separate patch.
>
>What happens with hotplug CPUs?
>
>Even if you disable late loading when there are offlined CPUs you
>still need to handle hotplug CPUs, which IMO should share the same
>path with offlined CPUs: the microcode update should be done ASAP
>after bringing the CPU up.

In Linux, a CPU's being offline is just a logical offline: it may still
participate in actions like MCE handling. That can lead to a situation
where some CPUs are using the old microcode while some are using the new
one, which introduces instability. CPU hotplug doesn't have such an
issue.

>> >>      if ( error )
>> >>          info->error = error;
>> >>
>> >> @@ -276,6 +268,8 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>> >>  {
>> >>      int ret;
>> >>      struct microcode_info *info;
>> >> +    unsigned int cpu = smp_processor_id();
>> >> +    struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
>> >>
>> >>      if ( len != (uint32_t)len )
>> >>          return -E2BIG;
>> >> @@ -294,10 +288,6 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>> >>          return ret;
>> >>      }
>> >>
>> >> -    info->buffer_size = len;
>> >> -    info->error = 0;
>> >> -    info->cpu = cpumask_first(&cpu_online_map);
>> >> -
>> >>      if ( microcode_ops->start_update )
>> >>      {
>> >>          ret = microcode_ops->start_update();
>> >> @@ -308,6 +298,26 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>> >>      }
>> >>
>> >> +    spin_lock(&microcode_mutex);
>> >> +
>> >> +    ret = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
>> >> +    if ( likely(!ret) )
>> >> +        ret = microcode_ops->cpu_request_microcode(cpu, info->buffer, len);
>> >> +    else
>> >> +        __microcode_fini_cpu(cpu);
>> >> +
>> >> +    spin_unlock(&microcode_mutex);
>> >
>> >Why do you need to hold the lock here?
>> >
>> >microcode_update is already serialized and should only be executed on
>> >one CPU at a time due to the usage of chained
>> >continue_hypercall_on_cpu.
>>
>> microcode_resume_cpu() also uses the 'uci' and the global microcode
>> cache. This lock is to preve
Re: [Xen-devel] [PATCH v4 6/6] x86/microcode: Synchronize late microcode loading
On Thu, Nov 29, 2018 at 10:56:53AM +0100, Roger Pau Monné wrote:
>On Thu, Nov 29, 2018 at 12:43:25PM +0800, Chao Gao wrote:
>> On Wed, Nov 28, 2018 at 04:22:09PM +0100, Roger Pau Monné wrote:
>> >On Wed, Nov 28, 2018 at 01:34:16PM +0800, Chao Gao wrote:
>> >> This patch ports microcode improvement patches from the Linux
>> >> kernel.
>> >>
>> >> Before you read any further: the early loading method is still the
>> >> preferred one and you should always do that. The following patch
>> >> is improving the late loading mechanism for long running jobs and
>> >> cloud use cases.
>> >>
>> >> Gather all cores and serialize the microcode update on them by
>> >> doing it one-by-one to make the late update process as reliable as
>> >> possible and avoid potential issues caused by the microcode update.
>> >>
>> >> Signed-off-by: Chao Gao
>> >> Tested-by: Chao Gao
>> >> [linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
>> >> [linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
>> >
>> >If this patch is the squash of two Linux commits, please post the
>> >ported versions of the two commits separately.
>>
>> I don't understand this one.
>
>You reference two Linux commits above, why is this done?
>
>I assume this is because you are porting two Linux commits to Xen, in
>which case I think that should be done in two different patches, or a
>note needs to be added to why you merge two Linux commits into a
>single Xen patch.

The latter fixed a severe bug introduced by the first one. Maybe I need
to add a note to clarify this.

Thanks
Chao
Re: [Xen-devel] [PATCH v5 01/13] x86: reduce general stack alignment to 8
>>> On 29.11.18 at 18:44, wrote:
> On Thu, Nov 08, 2018 at 09:05:45AM -0700, Jan Beulich wrote:
>> --- a/xen/arch/x86/efi/Makefile
>> +++ b/xen/arch/x86/efi/Makefile
>> @@ -5,7 +5,11 @@ CFLAGS += -fshort-wchar
>>
>> boot.init.o: buildid.o
>>
>> +EFIOBJ := boot.init.o compat.o runtime.o
>> +
>> +$(EFIOBJ): CFLAGS-stack-boundary := -mpreferred-stack-boundary=4
>
> From gcc's manual on -mincoming-stack-boundary:
>
> "Thus calling a function compiled with a higher preferred stack
> boundary from a function compiled with a lower preferred stack
> boundary most likely misaligns the stack."
>
> I notice runtime.o now has stack alignment of 2^4 while the rest of
> Xen has 2^3.
>
> There is at least one example (efi_get_time) that could misalign the
> stack. Is that okay?

It would not be okay if the runtime call machinery didn't force 32-byte
alignment of the stack. See the declaration of struct efi_rs_state, an
instance of which gets put on the stack of every function making runtime
calls. Also note how this is no different from prior to this change, as
explained by the comment in that structure declaration, except that
instead of always running on a reliably mis-aligned stack we will now
run on a mixture (hence the code [and stack] size savings mentioned in
the description).

Jan
[Xen-devel] [linux-4.9 test] 130840: regressions - FAIL
flight 130840 linux-4.9 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/130840/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-xsm                          7 xen-boot                  fail REGR. vs. 130142
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 16 guest-localmigrate/x10    fail REGR. vs. 130142
 test-armhf-armhf-libvirt-raw                    15 guest-start/debian.repeat fail REGR. vs. 130142

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds                        12 guest-start               fail REGR. vs. 130142

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemut-win7-amd64             17 guest-stop                fail like 130142
 test-amd64-amd64-xl-qemut-win7-amd64            17 guest-stop                fail like 130142
 test-amd64-amd64-xl-qemuu-win7-amd64            17 guest-stop                fail like 130142
 test-amd64-i386-xl-qemuu-win7-amd64             17 guest-stop                fail like 130142
 test-amd64-amd64-xl-qemuu-ws16-amd64            17 guest-stop                fail like 130142
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install  fail never pass
 test-amd64-amd64-xl-pvhv2-intel                 12 guest-start               fail never pass
 test-amd64-amd64-libvirt                        13 migrate-support-check     fail never pass
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install   fail never pass
 test-amd64-amd64-xl-pvhv2-amd                   12 guest-start               fail never pass
 test-amd64-amd64-libvirt-xsm                    13 migrate-support-check     fail never pass
 test-amd64-i386-xl-pvshim                       12 guest-start               fail never pass
 test-amd64-i386-libvirt                         13 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-xsm                     13 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check   fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check  fail never pass
 test-armhf-armhf-xl-arndale                     13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-arndale                     14 saverestore-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd               17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd                    12 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt                        13 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt                        14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck                  13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-cubietruck                  14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2                     13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit2                     14 saverestore-support-check fail never pass
 test-armhf-armhf-xl                             13 migrate-support-check     fail never pass
 test-armhf-armhf-xl                             14 saverestore-support-check fail never pass
 test-amd64-amd64-xl-qemut-ws16-amd64            17 guest-stop                fail never pass
 test-armhf-armhf-xl-credit1                     13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit1                     14 saverestore-support-check fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64             17 guest-stop                fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64             17 guest-stop                fail never pass
 test-armhf-armhf-libvirt-raw                    12 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt-raw                    13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd                         12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-vhd                         13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu                   13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-multivcpu                   14 saverestore-support-check fail never pass
 test-amd64-amd64-xl-qemut-win10-i386            10 windows-install           fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386            10 windows-install           fail never pass
 test-amd64-i386-xl-qemuu-win10-i386             10 windows-install           fail never pass
 test-amd64-i386-xl-qemut-win10-i386             10 windows-install           fail never pass

version targeted for testing:
 linux    bb2d990b6fefaf99b2832a7a588234e7986ebe15
baseline version:
 linux    5552683784c9e2641e8c74827565476a45827126

Last test of basis   130142  2018-11-15 19:02:17 Z  14 days
Failing since        130646  2018-11-21 08:42:52 Z   9 days  4 attempts
Testing same since   130840  2018-11-27 21:02:30 Z   2 days  1 attempts

People who touched revisions under test:
  "Eric W. Biederman"
  "Yan, Zheng"
  Aaron Brown
  Al Viro
  Alan Tull
  Alexandre Belloni
  Andrea Arcangeli
  Andreas Gruenbacher
  Andrew Morton
  Andrey Ryabinin
  Andy Shevchenko
  Ard Biesheuvel
  Arnd Bergmann
  Bartlomiej Zolnierkiewicz
  Behan Webster
  Ben Hutchings
  Boris Brezillon
  Breno Leitao
  Catalin Marinas
  Changwei Ge
[Xen-devel] [seabios test] 130842: regressions - FAIL
flight 130842 seabios real [real]
http://logs.test-lab.xenproject.org/osstest/logs/130842/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm      7 xen-boot           fail REGR. vs. 130373
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict  10 debian-hvm-install fail REGR. vs. 130373
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail REGR. vs. 130373

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-win7-amd64                 17 guest-stop               fail like 130373
 test-amd64-amd64-xl-qemuu-ws16-amd64                17 guest-stop               fail like 130373
 test-amd64-amd64-xl-qemuu-win7-amd64                17 guest-stop               fail like 130373
 test-amd64-i386-xl-qemuu-ws16-amd64                 17 guest-stop               fail like 130373
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  11 migrate-support-check    fail never pass
 test-amd64-amd64-qemuu-nested-amd                   17 debian-hvm-install/l1/l2 fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386                10 windows-install          fail never pass
 test-amd64-i386-xl-qemuu-win10-i386                 10 windows-install          fail never pass

version targeted for testing:
 seabios  42efebdf1d120554e1a30e8debf562527ec6a53d
baseline version:
 seabios  a698c8995ffb2838296ec284fe3c4ad33dfca307

Last test of basis   130373  2018-11-18 03:30:13 Z  12 days
Testing same since   130842  2018-11-28 02:10:59 Z   2 days  1 attempts

People who touched revisions under test:
  Stephen Douthit

jobs:
 build-amd64-xsm                                      pass
 build-i386-xsm                                       pass
 build-amd64                                          pass
 build-i386                                           pass
 build-amd64-libvirt                                  pass
 build-i386-libvirt                                   pass
 build-amd64-pvops                                    pass
 build-i386-pvops                                     pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm    fail
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm        pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm         pass
 test-amd64-amd64-qemuu-nested-amd                    fail
 test-amd64-i386-qemuu-rhel6hvm-amd                   pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64            pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64             pass
 test-amd64-amd64-xl-qemuu-win7-amd64                 fail
 test-amd64-i386-xl-qemuu-win7-amd64                  fail
 test-amd64-amd64-xl-qemuu-ws16-amd64                 fail
 test-amd64-i386-xl-qemuu-ws16-amd64                  fail
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict fail
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict fail
 test-amd64-amd64-xl-qemuu-win10-i386                 fail
 test-amd64-i386-xl-qemuu-win10-i386                  fail
 test-amd64-amd64-qemuu-nested-intel                  pass
 test-amd64-i386-qemuu-rhel6hvm-intel                 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow     pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow      pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Not pushing.

commit 42efebdf1d120554e1a30e8debf562527ec6a53d
Author: Stephen Douthit
Date:   Wed Mar 7 13:17:36 2018 -0500

    tpm: Check for TPM related ACPI tables before attempting hw probe

    Signed-off-by: Stephen Douthit
Re: [Xen-devel] [PATCH 1/2] tools/libxc: Fix issues with libxc and Xen having different featureset lengths
>>> On 29.11.18 at 20:20, wrote:
> In almost all cases, Xen and libxc will agree on the featureset length,
> because they are built from the same source.
>
> However, there are circumstances (e.g. security hotfixes) where the
> featureset gets longer and dom0 will, after installing updates, be
> running with an old Xen but new libxc. Despite writing the code with
> this scenario in mind, there were some bugs.
>
> First, xen-cpuid's get_featureset() erroneously allocates a buffer
> based on Xen's featureset length, but records libxc's length, which is
> longer.

"... which may be longer", seeing that nr_features gets initialized from
xc_get_cpu_featureset_size()'s return value, and its subsequent updating
(through xc_get_cpu_featureset()) is only done in certain cases.

> The hypercall bounce buffer code reads/writes the recorded length,
> which is beyond the end of the allocated object, and a later free()
> encounters corrupt heap metadata. Fix this by recording the same length
> that we allocate.
>
> Secondly, get_cpuid_domain_info() has a related bug when the passed-in
> featureset is a different length to libxc's.
>
> A large amount of the libxc cpuid functionality depends on
> info->featureset being as long as expected, and it is allocated
> appropriately. However, in the case that a shorter external featureset
> is passed in, the logic to check for trailing nonzero bits may read off
> the end of it. Rework the logic to use the correct upper bound.
>
> In addition, leave a comment next to the fields in struct
> cpuid_domain_info explaining the relationship between the various
> lengths, and how to cope with different lengths.
>
> Signed-off-by: Andrew Cooper

Reviewed-by: Jan Beulich
Re: [Xen-devel] [PATCH 2/2] tools/libxc: Fix error handling in get_cpuid_domain_info()
>>> On 29.11.18 at 20:20, wrote:
> get_cpuid_domain_info() has two conflicting return styles - either
> -error for local failures, or -1/errno for hypercall failures. Switch
> to consistently use -error.
>
> While fixing the xc_get_cpu_featureset(), take the opportunity to
> remove the redundancy and move it to be adjacent to the other
> featureset handling.

Having noticed the redundancy before, I was assuming this was done to
prevent altering host_nr_features early, just in case future code
additions would want to use its original value. The way it is now, the
code folding looks correct.

> Signed-off-by: Andrew Cooper

Reviewed-by: Jan Beulich
Re: [Xen-devel] [PATCH v4 2/6] microcode: save all microcodes which pass sanity check
>>> On 30.11.18 at 08:55, wrote:
> On Thu, Nov 29, 2018 at 10:22:10AM +0100, Roger Pau Monné wrote:
>>On Thu, Nov 29, 2018 at 10:40:32AM +0800, Chao Gao wrote:
>>> On Wed, Nov 28, 2018 at 01:00:14PM +0100, Roger Pau Monné wrote:
>>> >On Wed, Nov 28, 2018 at 01:34:12PM +0800, Chao Gao wrote:
>>> >> ... and search caches to find a suitable one when loading.
>>> >
>>> >Why do you need to save all of them? You are only going to load a
>>> >single microcode, so I don't understand the need to cache them all.
>>
>>I think the above question needs an answer.
>
> Out of consideration for a mixed-family system. Anyway, since Jan
> commented that we gave up support of a mixed-family system, we only
> need to save a single microcode for offlined or hot-plugged CPUs.

Well, there might be slightly more needed than just a single instance.
This depends on whether same family/model CPUs with different stepping
and/or different "pf" value can be mixed (and would have identical
feature flags in their CPUID output).

I may have oversimplified the current state of things: hotplugging a CPU
with more capabilities than the boot CPU is generally fine afaict. Both
you (Intel) and AMD place restrictions on permitted mixes iirc, so I
don't think we need to support anything beyond such restrictions. Being
able to run on permitted mixes which are not in conflict with our
general assumption of there not being any CPU in the system with less
capabilities than the boot CPU would seem desirable.

Andrew, do you have any view or opinion here?

Jan
Re: [Xen-devel] [PATCH v4 4/6] microcode: don't call apply_microcode() in cpu_request_microcode()
>>> On 30.11.18 at 09:57, wrote:
> On Thu, Nov 29, 2018 at 10:46:05AM +0100, Roger Pau Monné wrote:
>>On Thu, Nov 29, 2018 at 12:28:46PM +0800, Chao Gao wrote:
>>> It is better that all CPUs have the same microcode revision.
>>>
>>> Linux kernel rejects late microcode update if finding some CPUs
>>> offlined. I may port this patch to Xen too in a separate patch.
>>
>>What happens with hotplug CPUs?
>>
>>Even if you disable late loading when there are offlined CPUs you
>>still need to handle hotplug CPUs, which IMO should share the same
>>path with offlined CPUs: the microcode update should be done ASAP
>>after bringing the CPU up.
>
> In Linux, a CPU's being offline is just a logical offline. It may
> participate in actions like MCE. That would lead to a situation where
> some CPUs are using old microcode while some are using the new one,
> which introduces instability. CPU hotplug doesn't have such an issue.

But please be careful not to disallow ucode loading in cases where doing
so is fine. I'm in particular thinking about "smt=no", where secondary
hyperthreads get parked. Since ucode updates on one hyperthread of a
core affect all hyperthreads, updating is fine in this particular case.
Furthermore, if you look at my patch series "x86: more power-efficient
CPU parking", it is also within the realm of possibility to briefly wake
up parked CPUs, just to get their ucode updated.

Jan
[Xen-devel] [PATCH v3 20/21] xen_backend: remove xen_sysdev_init() function
The init function doesn't do anything at all, so we just omit it.

Cc: sstabell...@kernel.org
Cc: anthony.per...@citrix.com
Cc: xen-devel@lists.xenproject.org
Cc: peter.mayd...@linaro.org

Signed-off-by: Mao Zhongyi
Signed-off-by: Zhang Shengju
Acked-by: Anthony PERARD
---
 hw/xen/xen_backend.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
index 9a8e8771ec..0bc6b1de60 100644
--- a/hw/xen/xen_backend.c
+++ b/hw/xen/xen_backend.c
@@ -809,11 +809,6 @@ static const TypeInfo xensysbus_info = {
     }
 };
 
-static int xen_sysdev_init(SysBusDevice *dev)
-{
-    return 0;
-}
-
 static Property xen_sysdev_properties[] = {
     {/* end of property list */},
 };
@@ -821,9 +816,7 @@ static Property xen_sysdev_properties[] = {
 static void xen_sysdev_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
-    SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-    k->init = xen_sysdev_init;
     dc->props = xen_sysdev_properties;
     dc->bus_type = TYPE_XENSYSBUS;
 }
-- 
2.17.1
[Xen-devel] [PATCH v3 00/21] QOM'ify SysBusDeviceClass->init
v3 -> v2:
- rebase to the HEAD
- use SysBusDevice *sbd variable in patch15

v2 -> v1:
- SYS_BUS_DEVICE(dev) was used in a function several times, so use a
  variable 'sbd' to replace it, like:
  SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
- remove the xen_sysdev_init() function
- drop the patch21 in v1
- fix the broken sysbus_realize of patch22

Cc: alistair.fran...@wdc.com
Cc: anthony.per...@citrix.com
Cc: arm...@redhat.com
Cc: borntrae...@de.ibm.com
Cc: chout...@adacore.com
Cc: coh...@redhat.com
Cc: da...@gibson.dropbear.id.au
Cc: da...@redhat.com
Cc: edgar.igles...@gmail.com
Cc: ehabk...@redhat.com
Cc: f4...@amsat.org
Cc: g...@mprc.pku.edu.cn
Cc: jan.kis...@web.de
Cc: kra...@redhat.com
Cc: kw...@redhat.com
Cc: marcandre.lur...@redhat.com
Cc: marcel.apfelb...@gmail.com
Cc: mich...@walle.cc
Cc: mre...@redhat.com
Cc: m...@redhat.com
Cc: pbonz...@redhat.com
Cc: peter.mayd...@linaro.org
Cc: qemu-...@nongnu.org
Cc: qemu-bl...@nongnu.org
Cc: qemu-...@nongnu.org
Cc: qemu-s3...@nongnu.org
Cc: richard.hender...@linaro.org
Cc: r...@twiddle.net
Cc: sstabell...@kernel.org
Cc: th...@redhat.com
Cc: xen-devel@lists.xenproject.org

Mao Zhongyi (21):
  musicpal: Convert sysbus init function to realize function
  block/onenand: Convert sysbus init function to realize function
  char/grlib_apbuart: Convert sysbus init function to realize function
  core/empty_slot: Convert sysbus init function to realize function
  display/g364fb: Convert sysbus init function to realize function
  dma/puv3_dma: Convert sysbus init function to realize function
  gpio/puv3_gpio: Convert sysbus init function to realize function
  milkymist-softusb: Convert sysbus init function to realize function
  input/pl050: Convert sysbus init function to realize function
  intc/puv3_intc: Convert sysbus init function to realize function
  milkymist-hpdmc: Convert sysbus init function to realize function
  milkymist-pfpu: Convert sysbus init function to realize function
  puv3_pm.c: Convert sysbus init function to realize function
  nvram/ds1225y: Convert sysbus init function to realize function
  pci-bridge/dec: Convert sysbus init function to realize function
  timer/etraxfs_timer: Convert sysbus init function to realize function
  timer/grlib_gptimer: Convert sysbus init function to realize function
  timer/puv3_ost: Convert sysbus init function to realize function
  usb/tusb6010: Convert sysbus init function to realize function
  xen_backend: remove xen_sysdev_init() function
  core/sysbus: remove the SysBusDeviceClass::init path

 hw/arm/musicpal.c            |  9 -
 hw/block/onenand.c           | 16 +++-
 hw/char/grlib_apbuart.c      | 12 +---
 hw/core/empty_slot.c         |  9 -
 hw/core/sysbus.c             | 15 +--
 hw/display/g364fb.c          |  9 +++--
 hw/dma/puv3_dma.c            | 10 --
 hw/gpio/puv3_gpio.c          | 29 ++---
 hw/input/milkymist-softusb.c | 16 +++-
 hw/input/pl050.c             | 11 +--
 hw/intc/puv3_intc.c          | 11 ---
 hw/misc/milkymist-hpdmc.c    |  9 +++--
 hw/misc/milkymist-pfpu.c     | 12 +---
 hw/misc/puv3_pm.c            | 10 --
 hw/nvram/ds1225y.c           | 12 +---
 hw/pci-bridge/dec.c          | 12 ++--
 hw/timer/etraxfs_timer.c     | 14 +++---
 hw/timer/grlib_gptimer.c     | 11 +--
 hw/timer/puv3_ost.c          | 13 ++---
 hw/usb/tusb6010.c            |  8 +++-
 hw/xen/xen_backend.c         |  7 ---
 include/hw/sysbus.h          |  3 ---
 22 files changed, 106 insertions(+), 152 deletions(-)

-- 
2.17.1
Re: [Xen-devel] [PATCH v4 6/6] x86/microcode: Synchronize late microcode loading
>>> On 29.11.18 at 23:43, wrote:
> One other comment about this patch (which IIRC was raised by Andrew on
> an earlier version) is that it may be worth to stop timer calibration.
> I am pretty sure we've seen deadlocks, which is why we ended up
> disabling it during microcode updates.

I recall the claim, but I don't think I've ever seen proof. My point was
and still is that if there's a problem with ucode loading using the
stop-machine logic here, then there's a problem with the stop-machine
logic in general, which would make other uses, perhaps most notably
rcu_barrier(), unsafe too. Otoh from your reply it's not clear whether
the observed issue wasn't with our present way of loading ucode, but
then it would put under question the general correctness of
continue_hypercall_on_cpu(), which again we use for more than just
ucode loading.

Jan
Re: [Xen-devel] [PATCH] x86/shadow: don't enable shadow mode with too small a shadow allocation
At 07:53 -0700 on 29 Nov (1543477992), Jan Beulich wrote:
> We've had more than one report of host crashes after failed migration,
> and in at least one case we've had a hint towards a too far shrunk
> shadow allocation pool. Instead of just checking the pool for being
> empty, check whether the pool is smaller than what
> shadow_set_allocation() would minimally bump it to if it was invoked in
> the first place.
>
> Signed-off-by: Jan Beulich

Acked-by: Tim Deegan

Thanks,

Tim.
[Xen-devel] [PATCH 2/2] iommu: elide flushing for higher order map/unmap operations
This patch removes any implicit flushing that occurs in the implementation of map and unmap operations and, instead, adds explicit flushing at the end of the loops in the iommu_map/unmap() wrapper functions. Because VT-d currently performs two different types of flush dependent upon whether a PTE is being modified versus merely added (i.e. replacing a non- present PTE) a 'iommu_flush_type' enumeration is defined by this patch and the iommu_ops map method is modified to pass back the type of flush necessary for the PTE that has been populated. When a higher order mapping operation is done, the wrapper code performs the 'highest' level of flush required by the individual iommu_ops method calls, where a 'modified PTE' flush is deemed to be higher than a 'added PTE' flush. The ARM SMMU implementation currently performs no implicit flushing and therefore the modified map method always passes back a flush type of 'none'. NOTE: The per-cpu 'iommu_dont_flush_iotlb' is respected by the iommu_map() wrapper function and therefore this now applies to all IOMMU implementations rather than just VT-d. Use of the flag has been added to arch_iommu_hwdom_init() so that flushing is now fully elided for the hardware domain's initial table population. Signed-off-by: Paul Durrant --- Cc: Stefano Stabellini Cc: Julien Grall Cc: Suravee Suthikulpanit Cc: Brian Woods Cc: Jan Beulich Cc: Kevin Tian Cc: Andrew Cooper Cc: Wei Liu Cc: "Roger Pau Monné" This code has only been compile tested for ARM. 
--- xen/drivers/passthrough/amd/iommu_map.c | 70 --- xen/drivers/passthrough/arm/smmu.c| 13 +++-- xen/drivers/passthrough/iommu.c | 31 ++-- xen/drivers/passthrough/vtd/iommu.c | 26 +- xen/drivers/passthrough/x86/iommu.c | 15 -- xen/include/asm-x86/hvm/svm/amd-iommu-proto.h | 8 +-- xen/include/xen/iommu.h | 13 - 7 files changed, 121 insertions(+), 55 deletions(-) diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c index c05b042821..6afc412d3f 100644 --- a/xen/drivers/passthrough/amd/iommu_map.c +++ b/xen/drivers/passthrough/amd/iommu_map.c @@ -45,13 +45,14 @@ static void clear_iommu_pte_present(unsigned long l1_mfn, unsigned long dfn) unmap_domain_page(table); } -static bool set_iommu_pde_present(uint32_t *pde, unsigned long next_mfn, - unsigned int next_level, - bool iw, bool ir) +static enum iommu_flush_type set_iommu_pde_present( +uint32_t *pde, unsigned long next_mfn, unsigned int next_level, bool iw, +bool ir) { uint64_t maddr_next; uint32_t addr_lo, addr_hi, entry; -bool need_flush = false, old_present; +bool old_present; +enum iommu_flush_type flush_type = IOMMU_FLUSH_none; maddr_next = __pfn_to_paddr(next_mfn); @@ -84,7 +85,7 @@ static bool set_iommu_pde_present(uint32_t *pde, unsigned long next_mfn, if ( maddr_old != maddr_next || iw != old_w || ir != old_r || old_level != next_level ) -need_flush = true; +flush_type = IOMMU_FLUSH_modified; } addr_lo = maddr_next & DMA_32BIT_MASK; @@ -121,24 +122,24 @@ static bool set_iommu_pde_present(uint32_t *pde, unsigned long next_mfn, IOMMU_PDE_PRESENT_SHIFT, &entry); pde[0] = entry; -return need_flush; +return flush_type; } -static bool set_iommu_pte_present(unsigned long pt_mfn, unsigned long dfn, - unsigned long next_mfn, int pde_level, - bool iw, bool ir) +static enum iommu_flush_type set_iommu_pte_present( +unsigned long pt_mfn, unsigned long dfn, unsigned long next_mfn, +int pde_level, bool iw, bool ir) { uint64_t *table; uint32_t *pde; -bool need_flush; +enum 
iommu_flush_type flush_type; table = map_domain_page(_mfn(pt_mfn)); pde = (uint32_t *)(table + pfn_to_pde_idx(dfn, pde_level)); -need_flush = set_iommu_pde_present(pde, next_mfn, 0, iw, ir); +flush_type = set_iommu_pde_present(pde, next_mfn, 0, iw, ir); unmap_domain_page(table); -return need_flush; +return flush_type; } void amd_iommu_set_root_page_table(uint32_t *dte, uint64_t root_ptr, @@ -522,13 +523,15 @@ static int update_paging_mode(struct domain *d, unsigned long dfn) } int amd_iommu_map_page(struct domain *d, dfn_t dfn, mfn_t mfn, - unsigned int flags) + unsigned int flags, + enum iommu_flush_type *flush_type) { -bool need_flush; struct domain_iommu *hd = dom_iommu(d); int rc; unsigned long pt_mfn[7]; +*flush_type = IOMMU_FLUSH_none; + if ( iommu_use_hap_pt(d) ) return 0; @@ -570,12 +573,9 @@ int amd_iommu_map_page(struct domain *d, dfn_t dfn, mfn_t mf
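The 'highest flush wins' accumulation that the iommu_map() wrapper performs over a higher-order range can be sketched as follows. The enum mirrors the one added by the patch; the callback and loop are simplified stand-ins for the real iommu_ops plumbing, not Xen's actual code.

```c
#include <assert.h>

enum iommu_flush_type {
    IOMMU_FLUSH_none,     /* no flush required */
    IOMMU_FLUSH_added,    /* only previously non-present PTEs populated */
    IOMMU_FLUSH_modified, /* at least one existing PTE was changed */
};

/* Stand-in for the per-page iommu_ops map method: reports the flush
 * kind this particular PTE write needs. */
typedef enum iommu_flush_type (*map_one_fn)(unsigned long dfn);

/* Map a (1 << order) range, accumulating the strongest flush needed. */
static enum iommu_flush_type map_range(unsigned long dfn, unsigned int order,
                                       map_one_fn map_one)
{
    enum iommu_flush_type flush = IOMMU_FLUSH_none;
    unsigned long i;

    for ( i = 0; i < (1ul << order); i++ )
    {
        enum iommu_flush_type t = map_one(dfn + i);

        if ( t > flush ) /* 'modified' outranks 'added' outranks 'none' */
            flush = t;
    }

    return flush; /* caller issues one flush of this strength at the end */
}

/* Toy callbacks for demonstration. */
static enum iommu_flush_type map_fresh(unsigned long dfn)
{
    (void)dfn;
    return IOMMU_FLUSH_added; /* every PTE was previously non-present */
}

static enum iommu_flush_type map_over(unsigned long dfn)
{
    /* pretend dfn 3 already had a mapping that gets overwritten */
    return dfn == 3 ? IOMMU_FLUSH_modified : IOMMU_FLUSH_added;
}
```

A single 'modified' result anywhere in the range escalates the whole range's flush, which is exactly the behaviour the commit message describes.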
[Xen-devel] [PATCH 0/2] iommu improvements
Paul Durrant (2):
  amd-iommu: add flush iommu_ops
  iommu: elide flushing for higher order map/unmap operations

 xen/drivers/passthrough/amd/iommu_map.c       | 114 --
 xen/drivers/passthrough/amd/pci_amd_iommu.c   | 2 +
 xen/drivers/passthrough/arm/smmu.c            | 13 ++-
 xen/drivers/passthrough/iommu.c               | 31 ++-
 xen/drivers/passthrough/vtd/iommu.c           | 26 +++---
 xen/drivers/passthrough/x86/iommu.c           | 15 +++-
 xen/include/asm-x86/hvm/svm/amd-iommu-proto.h | 7 +-
 xen/include/xen/iommu.h                       | 18 +++-
 8 files changed, 175 insertions(+), 51 deletions(-)

--
2.11.0
[Xen-devel] [PATCH 1/2] amd-iommu: add flush iommu_ops
The iommu_ops structure contains two methods for flushing: 'iotlb_flush' and 'iotlb_flush_all'. This patch adds implementations of these for AMD IOMMUs. The iotlb_flush method takes a base DFN and a (4k) page count, but the flush needs to be done by page order (i.e. 0, 9 or 18). Because a flush operation is fairly expensive to perform, the code calculates the minimum order single flush that will cover the specified page range rather than performing multiple flushes. Signed-off-by: Paul Durrant --- Cc: Suravee Suthikulpanit Cc: Brian Woods Cc: Jan Beulich Cc: Andrew Cooper Cc: Wei Liu Cc: "Roger Pau Monné" --- xen/drivers/passthrough/amd/iommu_map.c | 48 +++ xen/drivers/passthrough/amd/pci_amd_iommu.c | 2 ++ xen/include/asm-x86/hvm/svm/amd-iommu-proto.h | 3 ++ xen/include/xen/iommu.h | 5 +++ 4 files changed, 58 insertions(+) diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c index 04cb7b3182..c05b042821 100644 --- a/xen/drivers/passthrough/amd/iommu_map.c +++ b/xen/drivers/passthrough/amd/iommu_map.c @@ -631,6 +631,54 @@ int amd_iommu_unmap_page(struct domain *d, dfn_t dfn) spin_unlock(&hd->arch.mapping_lock); amd_iommu_flush_pages(d, dfn_x(dfn), 0); +return 0; +} + +static unsigned long flush_count(dfn_t dfn, unsigned int page_count, + unsigned int order) +{ +unsigned long start = dfn_x(dfn) / (1u << order); +unsigned long end = DIV_ROUND_UP(dfn_x(dfn) + page_count, + (1u << order)); + +ASSERT(end > start); +return end - start; +} + +int amd_iommu_flush_iotlb_pages(struct domain *d, dfn_t dfn, +unsigned int page_count) +{ +/* Match VT-d semantics */ +if ( !page_count || dfn_eq(dfn, INVALID_DFN) || + dfn_lt(dfn_add(dfn, page_count), dfn) /* overflow */ ) +{ +amd_iommu_flush_all_pages(d); +return 0; +} + +/* + * Flushes are expensive so find the minimal single flush that will + * cover the page range. + * + * NOTE: It is unnecessary to round down the DFN value to align with + * the flush order here. 
This is done by the internals of the + * flush code. + */ +if ( page_count == 1 ) /* order 0 flush count */ +amd_iommu_flush_pages(d, dfn_x(dfn), 0); +else if ( flush_count(dfn, page_count, 9) == 1 ) +amd_iommu_flush_pages(d, dfn_x(dfn), 9); +else if ( flush_count(dfn, page_count, 18) == 1 ) +amd_iommu_flush_pages(d, dfn_x(dfn), 18); +else +amd_iommu_flush_all_pages(d); + +return 0; +} + +int amd_iommu_flush_iotlb_all(struct domain *d) +{ +amd_iommu_flush_all_pages(d); return 0; } diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c index 900136390d..33a3798f36 100644 --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c @@ -579,6 +579,8 @@ static const struct iommu_ops __initconstrel amd_iommu_ops = { .teardown = amd_iommu_domain_destroy, .map_page = amd_iommu_map_page, .unmap_page = amd_iommu_unmap_page, +.iotlb_flush = amd_iommu_flush_iotlb_pages, +.iotlb_flush_all = amd_iommu_flush_iotlb_all, .free_page_table = deallocate_page_table, .reassign_device = reassign_device, .get_device_group_id = amd_iommu_group_id, diff --git a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h index 718a44f956..88715329ca 100644 --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h @@ -60,6 +60,9 @@ int __must_check amd_iommu_alloc_root(struct domain_iommu *hd); int amd_iommu_reserve_domain_unity_map(struct domain *domain, paddr_t phys_addr, unsigned long size, int iw, int ir); +int __must_check amd_iommu_flush_iotlb_pages(struct domain *d, dfn_t dfn, + unsigned int page_count); +int __must_check amd_iommu_flush_iotlb_all(struct domain *d); /* Share p2m table with iommu */ void amd_iommu_share_p2m(struct domain *d); diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h index 3d78126801..da8294bac8 100644 --- a/xen/include/xen/iommu.h +++ b/xen/include/xen/iommu.h @@ -52,6 +52,11 @@ 
static inline bool_t dfn_eq(dfn_t x, dfn_t y) { return dfn_x(x) == dfn_x(y); } +static inline bool_t dfn_lt(dfn_t x, dfn_t y) +{ +return dfn_x(x) < dfn_x(y); +} + extern bool_t iommu_enable, iommu_enabled; extern bool_t force_iommu, iommu_verbose; extern bool_t iommu_workaround_bios_bug, iommu_igfx; -- 2.11.0
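The order-selection logic in amd_iommu_flush_iotlb_pages() can be checked in isolation: flush_count() reports how many order-sized blocks the range [dfn, dfn + page_count) touches, the smallest of orders 0, 9 and 18 that covers the range in a single flush wins, and a full flush is the fallback. A sketch reproducing the arithmetic from the patch (pick_flush_order() is an invented wrapper for demonstration):

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of order-sized blocks touched by [dfn, dfn + page_count). */
static unsigned long flush_count(unsigned long dfn, unsigned int page_count,
                                 unsigned int order)
{
    unsigned long start = dfn / (1ul << order);
    unsigned long end = DIV_ROUND_UP(dfn + page_count, 1ul << order);

    return end - start;
}

/* Returns the order of the single flush used, or -1 for a full flush. */
static int pick_flush_order(unsigned long dfn, unsigned int page_count)
{
    if ( page_count == 1 )               /* order 0 flush suffices */
        return 0;
    if ( flush_count(dfn, page_count, 9) == 1 )
        return 9;                        /* fits in one 2M-aligned block */
    if ( flush_count(dfn, page_count, 18) == 1 )
        return 18;                       /* fits in one 1G-aligned block */
    return -1;                           /* fall back to flushing everything */
}
```

Note how a two-page range straddling a 512-page boundary still gets a single order-18 flush, while one straddling a 1G boundary falls back to the full flush, matching the commit message's "minimum order single flush" description.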
[Xen-devel] [distros-debian-jessie test] 75627: regressions - FAIL
flight 75627 distros-debian-jessie real [real]
http://osstest.xensource.com/osstest/logs/75627/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-i386-i386-jessie-netboot-pvgrub 10 debian-di-install fail REGR. vs. 75618

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-i386-jessie-netboot-pygrub 19 guest-start/debian.repeat fail like 75618
 test-armhf-armhf-armhf-jessie-netboot-pygrub 10 debian-di-install fail like 75618

baseline version:
 flight   75618

jobs:
 build-amd64                                  pass
 build-armhf                                  pass
 build-i386                                   pass
 build-amd64-pvops                            pass
 build-armhf-pvops                            pass
 build-i386-pvops                             pass
 test-amd64-amd64-amd64-jessie-netboot-pvgrub pass
 test-amd64-i386-i386-jessie-netboot-pvgrub   fail
 test-amd64-i386-amd64-jessie-netboot-pygrub  pass
 test-armhf-armhf-armhf-jessie-netboot-pygrub fail
 test-amd64-amd64-i386-jessie-netboot-pygrub  fail

sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
 http://osstest.xensource.com/osstest/logs
Test harness code can be found at
 http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary

Push not applicable.
[Xen-devel] [PATCH v2] pvcalls-front: Avoid __get_free_pages(GFP_KERNEL) under spinlock
The problem is that we call this with a spin lock held. The call tree is:

 pvcalls_front_accept() holds bedata->socket_lock.
  -> create_active()
   -> __get_free_pages() uses GFP_KERNEL

The create_active() function is only called from pvcalls_front_accept() with a spin_lock held. The allocation is not allowed to sleep and GFP_KERNEL is not sufficient.

This issue was detected by using the Coccinelle software.

v2: Add a function doing the allocations which is called outside the lock and passing the allocated data to create_active().

Suggested-by: Juergen Gross
Signed-off-by: Wen Yang
CC: Julia Lawall
CC: Boris Ostrovsky
CC: Juergen Gross
CC: Stefano Stabellini
CC: xen-devel@lists.xenproject.org
CC: linux-ker...@vger.kernel.org
---
 drivers/xen/pvcalls-front.c | 71 ++---
 1 file changed, 59 insertions(+), 12 deletions(-)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index 2f11ca72a281..cc92af31807f 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -335,7 +335,43 @@ int pvcalls_front_socket(struct socket *sock) return ret; } -static int create_active(struct sock_mapping *map, int *evtchn) +struct sock_mapping_active_ring { + struct pvcalls_data_intf *ring; + RING_IDX ring_order; + void *bytes; +}; + +static int alloc_active_ring(struct sock_mapping_active_ring *active_ring) +{ + active_ring->ring = NULL; + active_ring->bytes = NULL; + + active_ring->ring = (struct pvcalls_data_intf *) + __get_free_page(GFP_KERNEL | __GFP_ZERO); + if (active_ring->ring == NULL) + goto out_error; + active_ring->ring_order = PVCALLS_RING_ORDER; + active_ring->bytes = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, + PVCALLS_RING_ORDER); + if (active_ring->bytes == NULL) + goto out_error; + + return 0; + +out_error: + kfree(active_ring->bytes); + kfree(active_ring->ring); + return -ENOMEM; +} + +static void free_active_ring(struct sock_mapping_active_ring *active_ring) +{ + free_pages((unsigned long)active_ring->bytes,
active_ring->ring_order); + free_page((unsigned long)active_ring->ring); +} + +static int create_active(struct sock_mapping *map, int *evtchn, + struct sock_mapping_active_ring *active_ring) { void *bytes; int ret = -ENOMEM, irq = -1, i; @@ -343,15 +379,9 @@ static int create_active(struct sock_mapping *map, int *evtchn) *evtchn = -1; init_waitqueue_head(&map->active.inflight_conn_req); - map->active.ring = (struct pvcalls_data_intf *) - __get_free_page(GFP_KERNEL | __GFP_ZERO); - if (map->active.ring == NULL) - goto out_error; - map->active.ring->ring_order = PVCALLS_RING_ORDER; - bytes = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, - PVCALLS_RING_ORDER); - if (bytes == NULL) - goto out_error; + map->active.ring = active_ring->ring; + map->active.ring->ring_order = active_ring->ring_order; + bytes = active_ring->bytes; for (i = 0; i < (1 << PVCALLS_RING_ORDER); i++) map->active.ring->ref[i] = gnttab_grant_foreign_access( pvcalls_front_dev->otherend_id, @@ -397,6 +427,7 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr, struct sock_mapping *map = NULL; struct xen_pvcalls_request *req; int notify, req_id, ret, evtchn; + struct sock_mapping_active_ring active_ring; if (addr->sa_family != AF_INET || sock->type != SOCK_STREAM) return -EOPNOTSUPP; @@ -406,15 +437,21 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr, return PTR_ERR(map); bedata = dev_get_drvdata(&pvcalls_front_dev->dev); + ret = alloc_active_ring(&active_ring); + if (ret < 0) { + pvcalls_exit_sock(sock); + return ret; + } spin_lock(&bedata->socket_lock); ret = get_request(bedata, &req_id); if (ret < 0) { spin_unlock(&bedata->socket_lock); + free_active_ring(&active_ring); pvcalls_exit_sock(sock); return ret; } - ret = create_active(map, &evtchn); + ret = create_active(map, &evtchn, &active_ring); if (ret < 0) { spin_unlock(&bedata->socket_lock); pvcalls_exit_sock(sock); @@ -744,6 +781,7 @@ int pvcalls_front_accept(struct socket *sock, struct socket 
*newsock, int flags) struct sock_mapping *map2 = NULL; struct xen_pvcalls_request *req; int notify, req_id, ret, evtchn, nonblock; + struct sock_mapping_active_ring active_ring; map = pvcalls_enter_sock(sock); if (IS_ERR(map)) @@ -780,12 +818
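The restructuring in this patch is an instance of a common kernel pattern: perform any sleeping (GFP_KERNEL) allocations before taking the spinlock, and free them on the failure paths reached after it. A condensed userspace sketch of the control flow, where calloc/free and a boolean stand in for the real page allocator and spinlock:

```c
#include <assert.h>
#include <stdlib.h>

struct ring { void *pages; };

/* Sleeping allocation: must happen before the spinlock is taken. */
static int alloc_ring(struct ring *r)
{
    r->pages = calloc(1, 4096); /* stands in for __get_free_pages(GFP_KERNEL) */
    return r->pages ? 0 : -1;   /* would be -ENOMEM in the kernel */
}

static void free_ring(struct ring *r)
{
    free(r->pages);
    r->pages = NULL;
}

/* The caller-visible shape of the pattern used by the patch. */
static int do_request(int lock_available)
{
    struct ring r;

    if ( alloc_ring(&r) )      /* 1: allocate while sleeping is allowed */
        return -1;

    /* 2: "take" the lock (simulated); no allocations below this point */
    if ( !lock_available )
    {
        free_ring(&r);         /* 3: failure path frees what we pre-allocated */
        return -2;
    }

    /* ... hand r over to the newly created connection here ... */
    free_ring(&r);
    return 0;
}
```

The key property is that nothing between "lock taken" and "lock dropped" can sleep, so GFP_KERNEL never runs in atomic context.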
Re: [Xen-devel] [PATCH] pci: apply workaround for Intel errata HSE43 and BDF2
>>> On 29.11.18 at 18:11, wrote: > This errata affect the values read from the BAR registers, and could > render vPCI (and by extension PVH Dom0 unusable). > > HSE43 is a Haswell erratum where a non-BAR register is implemented at > the position where the first BAR of the device should be found in the s/in the/in a/ or something like this, because out of the several Power Control Unit devices only one kind is really affected. > Power Control Unit device. Note that there are no BARs on this device, > apart from the bogus CSR register positioned on top of the first BAR. > > BDF2 is a Broadwell erratum where BARs in the Home Agent device will > return bogus non-zero values. I'm afraid there's quite a bit of confusion in Intel's docs here: The vol 2 datasheet link for this CPU from https://www.intel.com/content/www/us/en/processors/xeon/xeon-technical-resources.html looks to be dead, and the local copy I have of this lists PCI IDs identical to E5v3. The E7v4 link still works, and vol 2 has the same issue. (I really just wanted to cross check that we fully cover the issue with the three PCI IDs used.) In any event in the code I'd like to see BDX2 mentioned as well (the same erratum on E7v4). However, given the situation with the datasheets I can't see a way to associate the device IDs used with the individual errata (I would generally suspect there being a 3rd erratum for the 3rd device ID). Several years ago spec updates also used to have PCI device ID tables, but this doesn't appear to be the case anymore. 
> @@ -298,6 +299,34 @@ static void check_pdev(const struct pci_dev *pdev) > #undef PCI_STATUS_CHECK > } > > +static void apply_quirks(struct pci_dev *pdev) > +{ > +uint16_t vendor = pci_conf_read16(pdev->seg, pdev->bus, > + PCI_SLOT(pdev->devfn), > + PCI_FUNC(pdev->devfn), PCI_VENDOR_ID); > +uint16_t device = pci_conf_read16(pdev->seg, pdev->bus, > + PCI_SLOT(pdev->devfn), > + PCI_FUNC(pdev->devfn), PCI_DEVICE_ID); > + > +if ( vendor == PCI_VENDOR_ID_INTEL && (device == 0x2fc0 || > + device == 0x6f60 || device == 0x6fa0 || device == 0x6fc0) ) Instead of such an ever growing if(), could we make this table based? > @@ -397,6 +426,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, > u8 bus, u8 devfn) > } > > check_pdev(pdev); > +apply_quirks(pdev); At which point putting the small loop into check_pdev() might be as good as adding a new function. > --- a/xen/drivers/vpci/header.c > +++ b/xen/drivers/vpci/header.c > @@ -480,6 +480,9 @@ static int init_bars(struct pci_dev *pdev) > return -EOPNOTSUPP; > } > > +if ( pdev->ignore_bars ) > +num_bars = 0; While benign for the errata currently dealt with I wonder whether the ROM BAR wouldn't better be left alone as well with this flag set. Since additionally enabling memory decoding on a device without BARs is a questionable operation I wonder whether you couldn't better move this a few lines down immediately ahead of the loop over the BARs, and make it return instead of zeroing num_bars. > --- a/xen/include/xen/pci.h > +++ b/xen/include/xen/pci.h > @@ -115,6 +115,9 @@ struct pci_dev { > > /* Data for vPCI. */ > struct vpci *vpci; > + > +/* Device with errata, ignore the BARs. */ > +bool ignore_bars; Please can you put this into (one of?) the existing hole(s), instead of at the end? Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
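The table-based quirk matching Jan asks for could be sketched roughly as below. The device IDs are the ones from the quoted hunk; the erratum annotations follow the patch description, and the helper name is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_INTEL 0x8086

/* Devices whose BARs must be ignored due to published errata. */
static const struct {
    uint16_t vendor, device;
} ignore_bars_quirks[] = {
    { PCI_VENDOR_ID_INTEL, 0x2fc0 }, /* HSE43: Haswell PCU, per the patch */
    { PCI_VENDOR_ID_INTEL, 0x6f60 }, /* BDF2 family: Broadwell Home Agent */
    { PCI_VENDOR_ID_INTEL, 0x6fa0 },
    { PCI_VENDOR_ID_INTEL, 0x6fc0 },
};

static bool dev_ignore_bars(uint16_t vendor, uint16_t device)
{
    size_t i;

    for ( i = 0;
          i < sizeof(ignore_bars_quirks) / sizeof(*ignore_bars_quirks);
          i++ )
        if ( ignore_bars_quirks[i].vendor == vendor &&
             ignore_bars_quirks[i].device == device )
            return true;

    return false;
}
```

Growing the table is then a one-line change per new erratum, rather than another clause in an ever-expanding if().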
[Xen-devel] [PATCH v3] pvcalls-front: Avoid __get_free_pages(GFP_KERNEL) under spinlock
The problem is that we call this with a spin lock held. The call tree is:

 pvcalls_front_accept() holds bedata->socket_lock.
  -> create_active()
   -> __get_free_pages() uses GFP_KERNEL

The create_active() function is only called from pvcalls_front_accept() with a spin_lock held. The allocation is not allowed to sleep and GFP_KERNEL is not sufficient.

This issue was detected by using the Coccinelle software.

v2: Add a function doing the allocations which is called outside the lock and passing the allocated data to create_active().
v3: Use the matching deallocators i.e., free_page() and free_pages(), respectively.

Suggested-by: Juergen Gross
Signed-off-by: Wen Yang
CC: Julia Lawall
CC: Boris Ostrovsky
CC: Juergen Gross
CC: Stefano Stabellini
CC: xen-devel@lists.xenproject.org
CC: linux-ker...@vger.kernel.org
---
 drivers/xen/pvcalls-front.c | 71 ++---
 1 file changed, 59 insertions(+), 12 deletions(-)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index 2f11ca72a281..a26f416daf46 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -335,7 +335,43 @@ int pvcalls_front_socket(struct socket *sock) return ret; } -static int create_active(struct sock_mapping *map, int *evtchn) +struct sock_mapping_active_ring { + struct pvcalls_data_intf *ring; + RING_IDX ring_order; + void *bytes; +}; + +static int alloc_active_ring(struct sock_mapping_active_ring *active_ring) +{ + active_ring->ring = NULL; + active_ring->bytes = NULL; + + active_ring->ring = (struct pvcalls_data_intf *) + __get_free_page(GFP_KERNEL | __GFP_ZERO); + if (active_ring->ring == NULL) + goto out_error; + active_ring->ring_order = PVCALLS_RING_ORDER; + active_ring->bytes = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, + PVCALLS_RING_ORDER); + if (active_ring->bytes == NULL) + goto out_error; + + return 0; + +out_error: + free_pages((unsigned long)active_ring->bytes, active_ring->ring_order); + free_page((unsigned long)active_ring->ring); + return -ENOMEM; +} +
+static void free_active_ring(struct sock_mapping_active_ring *active_ring) +{ + free_pages((unsigned long)active_ring->bytes, active_ring->ring_order); + free_page((unsigned long)active_ring->ring); +} + +static int create_active(struct sock_mapping *map, int *evtchn, + struct sock_mapping_active_ring *active_ring) { void *bytes; int ret = -ENOMEM, irq = -1, i; @@ -343,15 +379,9 @@ static int create_active(struct sock_mapping *map, int *evtchn) *evtchn = -1; init_waitqueue_head(&map->active.inflight_conn_req); - map->active.ring = (struct pvcalls_data_intf *) - __get_free_page(GFP_KERNEL | __GFP_ZERO); - if (map->active.ring == NULL) - goto out_error; - map->active.ring->ring_order = PVCALLS_RING_ORDER; - bytes = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, - PVCALLS_RING_ORDER); - if (bytes == NULL) - goto out_error; + map->active.ring = active_ring->ring; + map->active.ring->ring_order = active_ring->ring_order; + bytes = active_ring->bytes; for (i = 0; i < (1 << PVCALLS_RING_ORDER); i++) map->active.ring->ref[i] = gnttab_grant_foreign_access( pvcalls_front_dev->otherend_id, @@ -397,6 +427,7 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr, struct sock_mapping *map = NULL; struct xen_pvcalls_request *req; int notify, req_id, ret, evtchn; + struct sock_mapping_active_ring active_ring; if (addr->sa_family != AF_INET || sock->type != SOCK_STREAM) return -EOPNOTSUPP; @@ -406,15 +437,21 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr, return PTR_ERR(map); bedata = dev_get_drvdata(&pvcalls_front_dev->dev); + ret = alloc_active_ring(&active_ring); + if (ret < 0) { + pvcalls_exit_sock(sock); + return ret; + } spin_lock(&bedata->socket_lock); ret = get_request(bedata, &req_id); if (ret < 0) { spin_unlock(&bedata->socket_lock); + free_active_ring(&active_ring); pvcalls_exit_sock(sock); return ret; } - ret = create_active(map, &evtchn); + ret = create_active(map, &evtchn, &active_ring); if (ret < 0) { 
spin_unlock(&bedata->socket_lock); pvcalls_exit_sock(sock); @@ -744,6 +781,7 @@ int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags) struct sock_mapping *map2 = NULL; struct xen_pvcalls_request *req; int notify, req_id, ret, e
Re: [Xen-devel] [PATCH v2 2/2] ns16550: enable use of PCI MSI
On Fri, Nov 30, 2018 at 01:52:39AM -0700, Jan Beulich wrote: > >>> On 29.11.18 at 18:33, wrote: > > On Mon, Oct 01, 2018 at 10:26:05AM -0600, Jan Beulich wrote: > >> --- a/xen/arch/x86/msi.c > >> +++ b/xen/arch/x86/msi.c > >> @@ -742,6 +742,16 @@ static int msi_capability_init(struct pc > >> > >> *desc = entry; > >> /* Restore the original MSI enabled bits */ > >> +if ( !hardware_domain ) > > > > Wouldn't it be better to assign the device to dom_xen (pdev->domain = > > dom_xen), and then check if the owner is dom_xen here? > > I'm not sure this couldn't be wrong in the general case (and we > sit on a generic code path here): It depends on whether Dom0 > can modify the device's config space, and I wouldn't want to > (here) introduce a connection between dom_xen ownership and > whether Dom0 can control INTX. The comment below here is > specifically worded to the effect of why I use hardware_domain > here. Well, I think Dom0 shouldn't be allowed to interact with devices owned by dom_xen. That being said, at least the current vPCI code will allow PVH Dom0 to do so by passing through any accesses to registers not explicitly handled by vPCI. > If we ever get into the situation of wanting to enable MSI on an > internally used device _after_ Dom0 has started, this would need > careful re-considering. OK, I'm fine with this. Maybe using system_state would be clearer to note that this code path is only to be used during early boot? > > Or at the point where this is called from the serial console driver is > > too early for dom_xen to exist? > > ns16550_init_postirq() is where both MSI setup and hiding of the > device happen, so in principle this would seem to be possible for > the specific case of a serial card. IMO it's clearer from a conceptual PoV to check against the ownership of the device rather than the system state at the point of the function call. Devices assigned to Dom0 use the split model, devices assigned to Xen don't. 
> >> @@ -777,8 +777,65 @@ static void __init ns16550_init_postirq( > >> uart->ps_bdf[0], uart->ps_bdf[1], > >> uart->ps_bdf[2]); > >> } > >> + > >> +if ( uart->msi ) > >> +{ > >> +struct msi_info msi = { > >> +.bus = uart->ps_bdf[0], > >> +.devfn = PCI_DEVFN(uart->ps_bdf[1], uart->ps_bdf[2]), > >> +.irq = rc = uart->irq, > >> +.entry_nr = 1 > >> +}; > >> + > >> +if ( rc > 0 ) > >> +{ > >> +struct msi_desc *msi_desc = NULL; > >> + > >> +pcidevs_lock(); > >> + > >> +rc = pci_enable_msi(&msi, &msi_desc); > > > > Before attempting to enable MSI, shouldn't you make sure memory > > decoding is enabled in case the device uses MSIX? > > > > I think this code assumes the device will always use MSI? (in which > > case my above question is likely moot). > > I've yet to see serial cards using MSI-X. If we get to the point where > this is needed, we also may need to first set up the BAR for the MSI-X > table. Furthermore pci_enable_msi() won't even try to enable MSI-X > with msi->table_base being zero. Yes, that's how I figured out it was restricted to MSI because table_base was always 0. Not sure if a comment to note this code is not capable of handling MSIX would be helpful. > >> --- a/xen/drivers/pci/pci.c > >> +++ b/xen/drivers/pci/pci.c > >> @@ -115,6 +115,21 @@ int pci_find_next_ext_capability(int seg > >> return 0; > >> } > >> > >> +void pci_intx(const struct pci_dev *pdev, bool_t enable) > > > > Please use bool. > > See how old this patch is. V1 was posted long before bool came into > existence, and I had refrained from posting v2 until I actually had a > device where MSI would indeed function (the first two I tried this > with claimed to be MSI capable, but no interrupts ever surfaced > when MSI was enabled on them, yet I couldn't be sure the code > was doing something wrong). Obviously I then forgot to switch this, > which I've now done. Thanks. 
With the bool change: Reviewed-by: Roger Pau Monné

I don't want to hold this any longer for the msi_capability_init change; we can always switch to checking device ownership in a later patch.

Roger.
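The pci_intx() helper discussed in this thread boils down to toggling the INTx-disable bit (bit 10) of the PCI command register. A minimal sketch of that bit manipulation — the config-space accessors here are fake stand-ins for demonstration, not Xen's pci_conf_read16/write16:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND              0x04
#define PCI_COMMAND_INTX_DISABLE 0x400 /* bit 10 of the command register */

/* Stand-in config space: one device's command register. */
static uint16_t fake_cmd_reg;
static uint16_t cfg_read16(unsigned int reg)
{
    (void)reg;
    return fake_cmd_reg;
}
static void cfg_write16(unsigned int reg, uint16_t v)
{
    (void)reg;
    fake_cmd_reg = v;
}

/* Enable or disable legacy INTx delivery for the device. */
static void pci_intx(bool enable)
{
    uint16_t cmd = cfg_read16(PCI_COMMAND);

    if ( enable )
        cmd &= ~PCI_COMMAND_INTX_DISABLE;
    else
        cmd |= PCI_COMMAND_INTX_DISABLE;

    cfg_write16(PCI_COMMAND, cmd);
}
```

Disabling INTx while MSI is active avoids spurious legacy interrupts from devices that would otherwise assert both.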
Re: [Xen-devel] [PATCH 2/9] libxl: Move dm user determination logic into a helper function
> On Nov 23, 2018, at 5:14 PM, George Dunlap wrote: > > diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h > index e498435e16..a370de54ed 100644 > --- a/tools/libxl/libxl_internal.h > +++ b/tools/libxl/libxl_internal.h > @@ -1135,6 +1135,8 @@ typedef struct { > const char * shim_cmdline; > const char * pv_cmdline; > > +char * dm_runas; I’ve removed the space both here, and in the above char * elements. I’ll retain both of your Acks unless I hear otherwise. -George
Re: [Xen-devel] [PATCH 2/2] iommu: elide flushing for higher order map/unmap operations
On 30/11/2018 10:45, Paul Durrant wrote: > This patch removes any implicit flushing that occurs in the implementation > of map and unmap operations and, instead, adds explicit flushing at the > end of the loops in the iommu_map/unmap() wrapper functions. > > Because VT-d currently performs two different types of flush dependent upon > whether a PTE is being modified versus merely added (i.e. replacing a non- > present PTE) a 'iommu_flush_type' enumeration is defined by this patch and > the iommu_ops map method is modified to pass back the type of flush > necessary for the PTE that has been populated. When a higher order mapping > operation is done, the wrapper code performs the 'highest' level of flush > required by the individual iommu_ops method calls, where a 'modified PTE' > flush is deemed to be higher than a 'added PTE' flush. The ARM SMMU > implementation currently performs no implicit flushing and therefore > the modified map method always passes back a flush type of 'none'. > > NOTE: The per-cpu 'iommu_dont_flush_iotlb' is respected by the iommu_map() > wrapper function and therefore this now applies to all IOMMU > implementations rather than just VT-d. Use of the flag has been added > to arch_iommu_hwdom_init() so that flushing is now fully elided for > the hardware domain's initial table population. iommu_dont_flush_iotlb is a misfeature. While it still exists, the flushing API is fundamentally broken. Do you have a plan to remove it? I ask, because the only feasible option I see is for iommu_{map,unmap}() to pass the flush accumulation out to the caller, and have the caller use the appropriate flush interfaces. [Edit - lunch happened around about this point, and there was a long discussion] One idea would be to start with a prep patch renaming iommu_{,un}map() to _legacy(), noting beside them that they have implicit flushing characteristics. 
Then, the nonflushing versions of iommu_{,un}map() can be introduced, which return the accumulated flush flag, and the callers can DTRT. This way, we can avoid introducing a further user of iommu_dont_flush_iotlb in arch_iommu_hwdom_init(), and clean up the remaining legacy callsites at a later point when more infrastructure is in place. In particular, the P2M code cannot be fixed to behave in this way at the moment because the point at which the IOMMU changes are hooked in lacks an order parameter. > diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h > index da8294bac8..289e0e2772 100644 > --- a/xen/include/xen/iommu.h > +++ b/xen/include/xen/iommu.h > @@ -155,6 +155,13 @@ struct page_info; > */ > typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void > *ctxt); > This wants at least a comment stating that some IOMMUs require that we issue a flush when modifying a not-present/otherwise invalid entry. > +enum iommu_flush_type > +{ > +IOMMU_FLUSH_none, /* no flush required */ > +IOMMU_FLUSH_added, /* no modified entries, just additional entries */ IOMMU_FLUSH_invalid ? I think it is more descriptive of the scenario in which it is used. > +IOMMU_FLUSH_modified, /* modified entries */ > +}; > + > struct iommu_ops { > int (*init)(struct domain *d); > void (*hwdom_init)(struct domain *d); > @@ -177,7 +184,8 @@ struct iommu_ops { > * other by the caller in order to have meaningful results. > */ > int __must_check (*map_page)(struct domain *d, dfn_t dfn, mfn_t mfn, > - unsigned int flags); > + unsigned int flags, > + enum iommu_flush_type *flush_type); Maintaining the flush type by pointer is quite awkward. How about folding a positive flush type in with negative errors? i.e. map_page() becomes < 0 on error, 0 for success/no flush and >0 for success/w flush. I think the result would be rather cleaner to read. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
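Andrew's suggested calling convention — fold the accumulated flush type into the map return value, with negative meaning error, zero meaning success with no flush, and positive meaning success plus the required flush strength — might look roughly like this. All names and the toy per-page behaviour are illustrative:

```c
#include <assert.h>
#include <errno.h>

enum iommu_flush_type {
    IOMMU_FLUSH_none = 0, /* success, nothing to flush */
    IOMMU_FLUSH_invalid,  /* only previously invalid entries were written */
    IOMMU_FLUSH_modified, /* an existing entry was changed */
};

/* Toy per-page map: dfn 0 fails; odd dfns overwrite an existing entry,
 * even dfns populate a previously invalid one. */
static int map_page(unsigned long dfn)
{
    if ( dfn == 0 )
        return -EINVAL;                      /* < 0: error */

    return (dfn & 1) ? IOMMU_FLUSH_modified  /* > 0: flush of this strength */
                     : IOMMU_FLUSH_invalid;
}

/* Caller accumulates the strongest positive value; errors short-circuit. */
static int map_range(unsigned long dfn, unsigned int count)
{
    int flush = IOMMU_FLUSH_none;
    unsigned int i;

    for ( i = 0; i < count; i++ )
    {
        int rc = map_page(dfn + i);

        if ( rc < 0 )
            return rc;    /* propagate the error */
        if ( rc > flush )
            flush = rc;   /* keep the strongest flush requirement */
    }

    return flush;         /* 0 == no flush; > 0 == flush this strongly */
}
```

This keeps the per-page signature to a single int, as suggested, at the cost of callers having to remember that positive values are not errors.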
Re: [Xen-devel] [PATCH 3/9] libxl: Get rid of support for QEMU_USER_BASE (xen-qemuuser-domidNN)
> On Nov 28, 2018, at 3:32 PM, Wei Liu wrote: > > On Fri, Nov 23, 2018 at 05:14:56PM +, George Dunlap wrote: >> QEMU_USER_BASE allows a user to specify the UID to use when running >> the devicemodel for a specific domain number. Unfortunately, this is >> not really practical: It requires nearly 32,000 entries in >> /etc/passwd. QEMU_USER_RANGE_BASE is much more practical. >> >> Remove support for QEMU_USER_BASE. >> >> Signed-off-by: George Dunlap > > I have no problem with you removing this, but you forgot to change > xl.cfg.pod.5.in. Hmm… it seems what I *really* forgot to do was remove that entire section and point to the feature doc instead. It’s quite out of date now (e.g., cdrom insert should work I think, qemu is chrooted so can’t read world-readable files, &c). With your permission I’ll put fixing that up properly on a to-do list (to be done before 4.12 release at a minimum). -George
Re: [Xen-devel] [PATCH 1/2] amd-iommu: add flush iommu_ops
On 30/11/2018 10:45, Paul Durrant wrote: > diff --git a/xen/drivers/passthrough/amd/iommu_map.c > b/xen/drivers/passthrough/amd/iommu_map.c > index 04cb7b3182..c05b042821 100644 > --- a/xen/drivers/passthrough/amd/iommu_map.c > +++ b/xen/drivers/passthrough/amd/iommu_map.c > @@ -631,6 +631,54 @@ int amd_iommu_unmap_page(struct domain *d, dfn_t dfn) > spin_unlock(&hd->arch.mapping_lock); > > amd_iommu_flush_pages(d, dfn_x(dfn), 0); > +return 0; > +} > + > +static unsigned long flush_count(dfn_t dfn, unsigned int page_count, > + unsigned int order) > +{ > +unsigned long start = dfn_x(dfn) / (1u << order); > +unsigned long end = DIV_ROUND_UP(dfn_x(dfn) + page_count, > + (1u << order)); > + > +ASSERT(end > start); > +return end - start; > +} > + > +int amd_iommu_flush_iotlb_pages(struct domain *d, dfn_t dfn, > +unsigned int page_count) What are the semantics here? Why page_count rather than order? Are we guaranteed to be actually flushing consecutive dfn's ? > +{ > +/* Match VT-d semantics */ > +if ( !page_count || dfn_eq(dfn, INVALID_DFN) || Do we really have callers passing these? I'd argue that these should be invalid to pass (accepting that we might need to tolerate them until other cleanup can occur). > + dfn_lt(dfn_add(dfn, page_count), dfn) /* overflow */ ) Given that all users of dfn here want it in unsigned long form, I'd make a local dfn_l and use that. I'm not sure that we want to start accumulating a sporadic set of binary operator functions for the typesafe variables. > diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h > index 3d78126801..da8294bac8 100644 > --- a/xen/include/xen/iommu.h > +++ b/xen/include/xen/iommu.h > @@ -52,6 +52,11 @@ static inline bool_t dfn_eq(dfn_t x, dfn_t y) > return dfn_x(x) == dfn_x(y); > } > > +static inline bool_t dfn_lt(dfn_t x, dfn_t y) No new introductions of bool_t please. dfn_eq() shouldn't have been bool_t either. 
~Andrew
Re: [Xen-devel] [PATCH 1/2] tools/libxc: Fix issues with libxc and Xen having different featureset lengths
On 30/11/2018 09:18, Jan Beulich wrote:
>>>> On 29.11.18 at 20:20, wrote:
>> In almost all cases, Xen and libxc will agree on the featureset length,
>> because they are built from the same source.
>>
>> However, there are circumstances (e.g. security hotfixes) where the
>> featureset gets longer and dom0 will, after installing updates, be
>> running with an old Xen but new libxc.  Despite writing the code with
>> this scenario in mind, there were some bugs.
>>
>> First, xen-cpuid's get_featureset() erroneously allocates a buffer based on
>> Xen's featureset length, but records libxc's length, which is longer.
> "... which may be longer", seeing that nr_features gets initialized from
> xc_get_cpu_featureset_size()'s return value, and its subsequent
> updating (through xc_get_cpu_featureset()) is only done in certain
> cases.

Ah yes - very true.  I'll also tweak the following sentence to start
with "In this situation, the hypercall bounce ..."

>> The hypercall bounce buffer code reads/writes the recorded length, which is
>> beyond the end of the allocated object, and a later free() encounters corrupt
>> heap metadata.  Fix this by recording the same length that we allocate.
>>
>> Secondly, get_cpuid_domain_info() has a related bug when the passed-in
>> featureset is a different length to libxc's.
>>
>> A large amount of the libxc cpuid functionality depends on info->featureset
>> being as long as expected, and it is allocated appropriately.  However, in
>> the case that a shorter external featureset is passed in, the logic to check
>> for trailing nonzero bits may read off the end of it.  Rework the logic to
>> use the correct upper bound.
>>
>> In addition, leave a comment next to the fields in struct cpuid_domain_info
>> explaining the relationship between the various lengths, and how to cope with
>> different lengths.
>>
>> Signed-off-by: Andrew Cooper
> Reviewed-by: Jan Beulich

Thanks,

~Andrew
Re: [Xen-devel] [PATCH 1/2] amd-iommu: add flush iommu_ops
> -Original Message- > From: Andrew Cooper > Sent: 30 November 2018 13:04 > To: Paul Durrant ; xen-devel@lists.xenproject.org > Cc: Suravee Suthikulpanit ; Brian Woods > ; Jan Beulich ; Wei Liu > ; Roger Pau Monne > Subject: Re: [PATCH 1/2] amd-iommu: add flush iommu_ops > > On 30/11/2018 10:45, Paul Durrant wrote: > > diff --git a/xen/drivers/passthrough/amd/iommu_map.c > b/xen/drivers/passthrough/amd/iommu_map.c > > index 04cb7b3182..c05b042821 100644 > > --- a/xen/drivers/passthrough/amd/iommu_map.c > > +++ b/xen/drivers/passthrough/amd/iommu_map.c > > @@ -631,6 +631,54 @@ int amd_iommu_unmap_page(struct domain *d, dfn_t > dfn) > > spin_unlock(&hd->arch.mapping_lock); > > > > amd_iommu_flush_pages(d, dfn_x(dfn), 0); > > +return 0; > > +} > > + > > +static unsigned long flush_count(dfn_t dfn, unsigned int page_count, > > + unsigned int order) > > +{ > > +unsigned long start = dfn_x(dfn) / (1u << order); > > +unsigned long end = DIV_ROUND_UP(dfn_x(dfn) + page_count, > > + (1u << order)); > > + > > +ASSERT(end > start); > > +return end - start; > > +} > > + > > +int amd_iommu_flush_iotlb_pages(struct domain *d, dfn_t dfn, > > +unsigned int page_count) > > What are the semantics here? Why page_count rather than order? Because that's the way the iommu_ops method is declared in the header? > Are we > guaranteed to be actually flushing consecutive dfn's ? I believe so. Looking at the wrapper code and its callers, that seems to be the assumption. > > > +{ > > +/* Match VT-d semantics */ > > +if ( !page_count || dfn_eq(dfn, INVALID_DFN) || > > Do we really have callers passing these? I'd argue that these should be > invalid to pass (accepting that we might need to tolerate them until > other cleanup can occur). I could look at cleaning that up. There aren't many callers. > > > + dfn_lt(dfn_add(dfn, page_count), dfn) /* overflow */ ) > > Given that all users of dfn here want it in unsigned long form, I'd make > a local dfn_l and use that. 
> I'm not sure that we want to start
> accumulating a sporadic set of binary operator functions for the
> typesafe variables.

Ok, I don't really mind. I thought it was neater this way though.

> > diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> > index 3d78126801..da8294bac8 100644
> > --- a/xen/include/xen/iommu.h
> > +++ b/xen/include/xen/iommu.h
> > @@ -52,6 +52,11 @@ static inline bool_t dfn_eq(dfn_t x, dfn_t y)
> >     return dfn_x(x) == dfn_x(y);
> > }
> >
> > +static inline bool_t dfn_lt(dfn_t x, dfn_t y)
>
> No new introductions of bool_t please.  dfn_eq() shouldn't have been
> bool_t either.

Oops, yes. Cut'n'pasted the wrong thing.

  Paul

> ~Andrew
Re: [Xen-devel] Linux 4.19.5 fails to boot as Xen dom0
On 11/29/18 4:06 PM, Kirill A. Shutemov wrote: > On Thu, Nov 29, 2018 at 03:00:45PM +, Juergen Gross wrote: >> On 29/11/2018 15:32, Kirill A. Shutemov wrote: >>> On Thu, Nov 29, 2018 at 02:24:47PM +, Kirill A. Shutemov wrote: On Thu, Nov 29, 2018 at 01:35:17PM +, Juergen Gross wrote: > On 29/11/2018 14:26, Kirill A. Shutemov wrote: >> On Thu, Nov 29, 2018 at 09:41:25AM +, Juergen Gross wrote: >>> On 29/11/2018 02:22, Hans van Kranenburg wrote: Hi, As also seen at: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951 Attached there are two serial console output logs. One is starting with Xen 4.11 (from debian unstable) as dom0, and the other one without Xen. [2.085543] BUG: unable to handle kernel paging request at 888d9fffc000 [2.085610] PGD 200c067 P4D 200c067 PUD 0 [2.085674] Oops: [#1] SMP NOPTI [2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1 [2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 05/21/2018 [2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490 [...] >>> >>> The offending stable commit is 4074ca7d8a1832921c865d250bbd08f3441b3657 >>> ("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), this >>> is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream. >>> >>> Current upstream kernel is booting fine under Xen, so in general the >>> patch should be fine. Using an upstream kernel built from above commit >>> (with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is fine, >>> too. >>> >>> Kirill, are you aware of any prerequisite patch from 4.20 which could be >>> missing in 4.19.5? >> >> I'm not. >> >> Let me look into this. >> > > What is making me suspicious is the failure happening just after > releasing the init memory. Maybe there is an access to .init.data > segment or similar? The native kernel booting could be related to the > usage of 2M mappings not being available in a PV-domain. Ahh.. 
Could you test this:

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index a12afff146d1..7dec63ec7aab 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -496,7 +496,7 @@ static inline bool is_hypervisor_range(int idx)
 	 * 8000 - 87ff is reserved for
 	 * the hypervisor.
 	 */
-	return	(idx >= pgd_index(__PAGE_OFFSET) - 16) &&
+	return	(idx >= pgd_index(__PAGE_OFFSET) - 17) &&
 		(idx < pgd_index(__PAGE_OFFSET));
 #else
 	return false;

>>> Or, better, this:
>>
>> That makes it boot again!
>>
>> Any idea why upstream doesn't need it?
>
> Nope.
>
> I'll prepare a proper fix.

Thanks for looking into this. In the meantime, I applied the "Or,
better, this" change, and my dom0 boots again.

FYI, boot log now: (paste 90d valid)
https://paste.debian.net/plainh/48940826

Hans
Re: [Xen-devel] [PATCH 2/2] tools/libxc: Fix error handling in get_cpuid_domain_info()
On 30/11/2018 09:23, Jan Beulich wrote:
>>>> On 29.11.18 at 20:20, wrote:
>> get_cpuid_domain_info() has two conflicting return styles - either -error for
>> local failures, or -1/errno for hypercall failures.  Switch to consistently
>> use -error.
>>
>> While fixing the xc_get_cpu_featureset(), take the opportunity to remove the
>> redundancy and move it to be adjacent to the other featureset handling.
> Having noticed the redundancy before, I was assuming this was done
> to prevent altering host_nr_features early, just in case future code
> additions would want to use its original value.  The way it is now, the
> code folding looks correct.

I don't recall a deliberate intention to do that, but it was a long time
ago now.

>> Signed-off-by: Andrew Cooper
> Reviewed-by: Jan Beulich

Thanks.

~Andrew
[Xen-devel] [PATCH 03/23] arm: remove the mapping_error dma_map_ops method
Arm already returns (~(dma_addr_t)0x0) on mapping failures, so we can switch over to returning DMA_MAPPING_ERROR and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig --- arch/arm/common/dmabounce.c | 12 +++--- arch/arm/include/asm/dma-iommu.h | 2 -- arch/arm/mm/dma-mapping.c| 39 3 files changed, 18 insertions(+), 35 deletions(-) diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c index 9a92de63426f..5ba4622030ca 100644 --- a/arch/arm/common/dmabounce.c +++ b/arch/arm/common/dmabounce.c @@ -257,7 +257,7 @@ static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size, if (buf == NULL) { dev_err(dev, "%s: unable to map unsafe buffer %p!\n", __func__, ptr); - return ARM_MAPPING_ERROR; + return DMA_MAPPING_ERROR; } dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n", @@ -327,7 +327,7 @@ static dma_addr_t dmabounce_map_page(struct device *dev, struct page *page, ret = needs_bounce(dev, dma_addr, size); if (ret < 0) - return ARM_MAPPING_ERROR; + return DMA_MAPPING_ERROR; if (ret == 0) { arm_dma_ops.sync_single_for_device(dev, dma_addr, size, dir); @@ -336,7 +336,7 @@ static dma_addr_t dmabounce_map_page(struct device *dev, struct page *page, if (PageHighMem(page)) { dev_err(dev, "DMA buffer bouncing of HIGHMEM pages is not supported\n"); - return ARM_MAPPING_ERROR; + return DMA_MAPPING_ERROR; } return map_single(dev, page_address(page) + offset, size, dir, attrs); @@ -453,11 +453,6 @@ static int dmabounce_dma_supported(struct device *dev, u64 dma_mask) return arm_dma_ops.dma_supported(dev, dma_mask); } -static int dmabounce_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return arm_dma_ops.mapping_error(dev, dma_addr); -} - static const struct dma_map_ops dmabounce_ops = { .alloc = arm_dma_alloc, .free = arm_dma_free, @@ -472,7 +467,6 @@ static const struct dma_map_ops dmabounce_ops = { .sync_sg_for_cpu= arm_dma_sync_sg_for_cpu, .sync_sg_for_device = 
arm_dma_sync_sg_for_device, .dma_supported = dmabounce_dma_supported, - .mapping_error = dmabounce_mapping_error, }; static int dmabounce_init_pool(struct dmabounce_pool *pool, struct device *dev, diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h index 6821f1249300..772f48ef84b7 100644 --- a/arch/arm/include/asm/dma-iommu.h +++ b/arch/arm/include/asm/dma-iommu.h @@ -9,8 +9,6 @@ #include #include -#define ARM_MAPPING_ERROR (~(dma_addr_t)0x0) - struct dma_iommu_mapping { /* iommu specific data */ struct iommu_domain *domain; diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 661fe48ab78d..2cfb17bad1e6 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -179,11 +179,6 @@ static void arm_dma_sync_single_for_device(struct device *dev, __dma_page_cpu_to_dev(page, offset, size, dir); } -static int arm_dma_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return dma_addr == ARM_MAPPING_ERROR; -} - const struct dma_map_ops arm_dma_ops = { .alloc = arm_dma_alloc, .free = arm_dma_free, @@ -197,7 +192,6 @@ const struct dma_map_ops arm_dma_ops = { .sync_single_for_device = arm_dma_sync_single_for_device, .sync_sg_for_cpu= arm_dma_sync_sg_for_cpu, .sync_sg_for_device = arm_dma_sync_sg_for_device, - .mapping_error = arm_dma_mapping_error, .dma_supported = arm_dma_supported, }; EXPORT_SYMBOL(arm_dma_ops); @@ -217,7 +211,6 @@ const struct dma_map_ops arm_coherent_dma_ops = { .get_sgtable= arm_dma_get_sgtable, .map_page = arm_coherent_dma_map_page, .map_sg = arm_dma_map_sg, - .mapping_error = arm_dma_mapping_error, .dma_supported = arm_dma_supported, }; EXPORT_SYMBOL(arm_coherent_dma_ops); @@ -774,7 +767,7 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp &= ~(__GFP_COMP); args.gfp = gfp; - *handle = ARM_MAPPING_ERROR; + *handle = DMA_MAPPING_ERROR; allowblock = gfpflags_allow_blocking(gfp); cma = allowblock ? 
dev_get_cma_area(dev) : false; @@ -1217,7 +1210,7 @@ static inline dma_addr_t __alloc_iova(struct dma_iommu_mapping *mapping, if (i == mapping->nr_bitmaps) { if (extend_iommu_mapping(mapping)) { spin_unlock_irqrestore(&mapping->lock, flags); - ret
[Xen-devel] [PATCH 01/23] dma-mapping: provide a generic DMA_MAPPING_ERROR
Error handling of the dma_map_single and dma_map_page APIs is a little
problematic at the moment, in that we use different encodings in the
returned dma_addr_t to indicate an error.  That means we require an
additional indirect call to figure out if a dma mapping call returned
an error, and a lot of boilerplate code to implement these semantics.

Instead return the maximum addressable value as the error.  As long as
we don't allow mapping single-byte ranges with single-byte alignment
this value can never be a valid return.  Additionally, if drivers do not
check the return value from the dma_map* routines, this value means they
will generally not point to actual memory.

Once the default value is added here we can start removing the various
mapping_error methods and just rely on this generic check.

Signed-off-by: Christoph Hellwig
---
 include/linux/dma-mapping.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0f81c713f6e9..46bd612d929e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -133,6 +133,8 @@ struct dma_map_ops {
 	u64 (*get_required_mask)(struct device *dev);
 };
 
+#define DMA_MAPPING_ERROR	(~(dma_addr_t)0)
+
 extern const struct dma_map_ops dma_direct_ops;
 extern const struct dma_map_ops dma_virt_ops;
 
@@ -576,6 +578,10 @@ static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	debug_dma_mapping_error(dev, dma_addr);
+
+	if (dma_addr == DMA_MAPPING_ERROR)
+		return 1;
+
 	if (ops->mapping_error)
 		return ops->mapping_error(dev, dma_addr);
 	return 0;
-- 
2.19.1
[Xen-devel] remove the ->mapping_error method from dma_map_ops V3
Error reporting for the dma_map_single and dma_map_page operations is
currently a mess.  Both APIs directly return the dma_addr_t to be used
for the DMA, with a magic error escape that is specific to the instance
and checked by another method provided.  This has a few downsides:

 - the error check is easily forgotten, and a __must_check marker
   doesn't help as the value always is consumed anyway
 - the error checking requires another indirect call, which has become
   incredibly expensive
 - a lot of code is wasted on implementing these methods

The historical reason for this is that people thought DMA mappings would
not fail when the API was created, which sounds like a really bad
assumption in retrospect, and then we tried to cram error handling onto
it later on.

There basically are two variants: the error code is 0 because the
implementation will never return 0 as a valid DMA address, or the error
code is all-Fs as the implementation won't ever return an address that
high.  The old AMD GART is the only one not falling into these two
camps, as it picks sort of a relative zero relative to where it is
mapped.

The 0 return doesn't work for direct mappings that have RAM at address
zero, and a lot of IOMMUs start allocating bus space from address zero,
so we can't consolidate on that; but I think we can move everyone to
all-Fs, which this series does.  The reason that works is that there is
only one way to ever get this address: by doing a 1-byte long, 1-byte
aligned mapping.  But all our mappings are not only longer but generally
aligned, and the mappings have to keep at least the basic alignment.
A git tree is also available here:

    git://git.infradead.org/users/hch/misc.git dma-mapping-error.3

Gitweb:

    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-mapping-error.3

Changes since v2:
 - fix a compile error in the ia64 sba_iommu driver
 - return an errno value from dma_mapping_error

Changes since v1:
 - dropped the signature change
 - split into multiple patches
 - fixed the iova allocator return check in amd-iommu
 - remove EMERGENCY_PAGES in amd_gart and calgary
[Xen-devel] [PATCH 08/23] parisc/ccio: remove the mapping_error dma_map_ops method
The CCIO iommu code already returns (~(dma_addr_t)0x0) on mapping failures, so we can switch over to returning DMA_MAPPING_ERROR and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig --- drivers/parisc/ccio-dma.c | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/drivers/parisc/ccio-dma.c b/drivers/parisc/ccio-dma.c index 701a7d6a74d5..714aac72df0e 100644 --- a/drivers/parisc/ccio-dma.c +++ b/drivers/parisc/ccio-dma.c @@ -110,8 +110,6 @@ #define CMD_TLB_DIRECT_WRITE 35 /* IO_COMMAND for I/O TLB Writes */ #define CMD_TLB_PURGE33 /* IO_COMMAND to Purge I/O TLB entry */ -#define CCIO_MAPPING_ERROR(~(dma_addr_t)0) - struct ioa_registers { /* Runway Supervisory Set */ int32_tunused1[12]; @@ -740,7 +738,7 @@ ccio_map_single(struct device *dev, void *addr, size_t size, BUG_ON(!dev); ioc = GET_IOC(dev); if (!ioc) - return CCIO_MAPPING_ERROR; + return DMA_MAPPING_ERROR; BUG_ON(size <= 0); @@ -1021,11 +1019,6 @@ ccio_unmap_sg(struct device *dev, struct scatterlist *sglist, int nents, DBG_RUN_SG("%s() DONE (nents %d)\n", __func__, nents); } -static int ccio_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return dma_addr == CCIO_MAPPING_ERROR; -} - static const struct dma_map_ops ccio_ops = { .dma_supported =ccio_dma_supported, .alloc =ccio_alloc, @@ -1034,7 +1027,6 @@ static const struct dma_map_ops ccio_ops = { .unmap_page = ccio_unmap_page, .map_sg = ccio_map_sg, .unmap_sg = ccio_unmap_sg, - .mapping_error =ccio_mapping_error, }; #ifdef CONFIG_PROC_FS -- 2.19.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 06/23] s390: remove the mapping_error dma_map_ops method
S390 already returns (~(dma_addr_t)0x0) on mapping failures, so we can switch over to returning DMA_MAPPING_ERROR and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig --- arch/s390/pci/pci_dma.c | 18 +- 1 file changed, 5 insertions(+), 13 deletions(-) diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c index d387a0fbdd7e..346ba382193a 100644 --- a/arch/s390/pci/pci_dma.c +++ b/arch/s390/pci/pci_dma.c @@ -15,8 +15,6 @@ #include #include -#define S390_MAPPING_ERROR (~(dma_addr_t) 0x0) - static struct kmem_cache *dma_region_table_cache; static struct kmem_cache *dma_page_table_cache; static int s390_iommu_strict; @@ -301,7 +299,7 @@ static dma_addr_t dma_alloc_address(struct device *dev, int size) out_error: spin_unlock_irqrestore(&zdev->iommu_bitmap_lock, flags); - return S390_MAPPING_ERROR; + return DMA_MAPPING_ERROR; } static void dma_free_address(struct device *dev, dma_addr_t dma_addr, int size) @@ -349,7 +347,7 @@ static dma_addr_t s390_dma_map_pages(struct device *dev, struct page *page, /* This rounds up number of pages based on size and offset */ nr_pages = iommu_num_pages(pa, size, PAGE_SIZE); dma_addr = dma_alloc_address(dev, nr_pages); - if (dma_addr == S390_MAPPING_ERROR) { + if (dma_addr == DMA_MAPPING_ERROR) { ret = -ENOSPC; goto out_err; } @@ -372,7 +370,7 @@ static dma_addr_t s390_dma_map_pages(struct device *dev, struct page *page, out_err: zpci_err("map error:\n"); zpci_err_dma(ret, pa); - return S390_MAPPING_ERROR; + return DMA_MAPPING_ERROR; } static void s390_dma_unmap_pages(struct device *dev, dma_addr_t dma_addr, @@ -449,7 +447,7 @@ static int __s390_dma_map_sg(struct device *dev, struct scatterlist *sg, int ret; dma_addr_base = dma_alloc_address(dev, nr_pages); - if (dma_addr_base == S390_MAPPING_ERROR) + if (dma_addr_base == DMA_MAPPING_ERROR) return -ENOMEM; dma_addr = dma_addr_base; @@ -496,7 +494,7 @@ static int s390_dma_map_sg(struct device *dev, struct scatterlist *sg, for (i = 1; i < 
nr_elements; i++) { s = sg_next(s); - s->dma_address = S390_MAPPING_ERROR; + s->dma_address = DMA_MAPPING_ERROR; s->dma_length = 0; if (s->offset || (size & ~PAGE_MASK) || @@ -546,11 +544,6 @@ static void s390_dma_unmap_sg(struct device *dev, struct scatterlist *sg, } } -static int s390_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return dma_addr == S390_MAPPING_ERROR; -} - int zpci_dma_init_device(struct zpci_dev *zdev) { int rc; @@ -675,7 +668,6 @@ const struct dma_map_ops s390_pci_dma_ops = { .unmap_sg = s390_dma_unmap_sg, .map_page = s390_dma_map_pages, .unmap_page = s390_dma_unmap_pages, - .mapping_error = s390_mapping_error, /* dma_supported is unconditionally true without a callback */ }; EXPORT_SYMBOL_GPL(s390_pci_dma_ops); -- 2.19.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 07/23] sparc: remove the mapping_error dma_map_ops method
Sparc already returns (~(dma_addr_t)0x0) on mapping failures, so we can switch over to returning DMA_MAPPING_ERROR and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig --- arch/sparc/kernel/iommu.c| 12 +++- arch/sparc/kernel/iommu_common.h | 2 -- arch/sparc/kernel/pci_sun4v.c| 14 -- 3 files changed, 7 insertions(+), 21 deletions(-) diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c index 40d008b0bd3e..0626bae5e3da 100644 --- a/arch/sparc/kernel/iommu.c +++ b/arch/sparc/kernel/iommu.c @@ -315,7 +315,7 @@ static dma_addr_t dma_4u_map_page(struct device *dev, struct page *page, bad_no_ctx: if (printk_ratelimit()) WARN_ON(1); - return SPARC_MAPPING_ERROR; + return DMA_MAPPING_ERROR; } static void strbuf_flush(struct strbuf *strbuf, struct iommu *iommu, @@ -548,7 +548,7 @@ static int dma_4u_map_sg(struct device *dev, struct scatterlist *sglist, if (outcount < incount) { outs = sg_next(outs); - outs->dma_address = SPARC_MAPPING_ERROR; + outs->dma_address = DMA_MAPPING_ERROR; outs->dma_length = 0; } @@ -574,7 +574,7 @@ static int dma_4u_map_sg(struct device *dev, struct scatterlist *sglist, iommu_tbl_range_free(&iommu->tbl, vaddr, npages, IOMMU_ERROR_CODE); - s->dma_address = SPARC_MAPPING_ERROR; + s->dma_address = DMA_MAPPING_ERROR; s->dma_length = 0; } if (s == outs) @@ -742,11 +742,6 @@ static void dma_4u_sync_sg_for_cpu(struct device *dev, spin_unlock_irqrestore(&iommu->lock, flags); } -static int dma_4u_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return dma_addr == SPARC_MAPPING_ERROR; -} - static int dma_4u_supported(struct device *dev, u64 device_mask) { struct iommu *iommu = dev->archdata.iommu; @@ -772,7 +767,6 @@ static const struct dma_map_ops sun4u_dma_ops = { .sync_single_for_cpu= dma_4u_sync_single_for_cpu, .sync_sg_for_cpu= dma_4u_sync_sg_for_cpu, .dma_supported = dma_4u_supported, - .mapping_error = dma_4u_mapping_error, }; const struct dma_map_ops *dma_ops = &sun4u_dma_ops; diff --git 
a/arch/sparc/kernel/iommu_common.h b/arch/sparc/kernel/iommu_common.h index e3c02ba32500..d62ed9c5682d 100644 --- a/arch/sparc/kernel/iommu_common.h +++ b/arch/sparc/kernel/iommu_common.h @@ -48,6 +48,4 @@ static inline int is_span_boundary(unsigned long entry, return iommu_is_span_boundary(entry, nr, shift, boundary_size); } -#define SPARC_MAPPING_ERROR(~(dma_addr_t)0x0) - #endif /* _IOMMU_COMMON_H */ diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 565d9ac883d0..fa0e42b4cbfb 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -414,12 +414,12 @@ static dma_addr_t dma_4v_map_page(struct device *dev, struct page *page, bad: if (printk_ratelimit()) WARN_ON(1); - return SPARC_MAPPING_ERROR; + return DMA_MAPPING_ERROR; iommu_map_fail: local_irq_restore(flags); iommu_tbl_range_free(tbl, bus_addr, npages, IOMMU_ERROR_CODE); - return SPARC_MAPPING_ERROR; + return DMA_MAPPING_ERROR; } static void dma_4v_unmap_page(struct device *dev, dma_addr_t bus_addr, @@ -592,7 +592,7 @@ static int dma_4v_map_sg(struct device *dev, struct scatterlist *sglist, if (outcount < incount) { outs = sg_next(outs); - outs->dma_address = SPARC_MAPPING_ERROR; + outs->dma_address = DMA_MAPPING_ERROR; outs->dma_length = 0; } @@ -609,7 +609,7 @@ static int dma_4v_map_sg(struct device *dev, struct scatterlist *sglist, iommu_tbl_range_free(tbl, vaddr, npages, IOMMU_ERROR_CODE); /* XXX demap? 
XXX */ - s->dma_address = SPARC_MAPPING_ERROR; + s->dma_address = DMA_MAPPING_ERROR; s->dma_length = 0; } if (s == outs) @@ -688,11 +688,6 @@ static int dma_4v_supported(struct device *dev, u64 device_mask) return pci64_dma_supported(to_pci_dev(dev), device_mask); } -static int dma_4v_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return dma_addr == SPARC_MAPPING_ERROR; -} - static const struct dma_map_ops sun4v_dma_ops = { .alloc = dma_4v_alloc_coherent, .free = dma_4v_free_coherent, @@ -701,7 +696,6 @@ static const struct dma_map_ops sun4v_dma_ops = { .map_sg = dma_4v_map_sg, .unmap_sg = dma_4v_unmap_sg,
[Xen-devel] [PATCH 04/23] powerpc/iommu: remove the mapping_error dma_map_ops method
The powerpc iommu code already returns (~(dma_addr_t)0x0) on mapping failures, so we can switch over to returning DMA_MAPPING_ERROR and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig --- arch/powerpc/include/asm/iommu.h | 4 arch/powerpc/kernel/dma-iommu.c | 6 -- arch/powerpc/kernel/iommu.c | 28 ++-- arch/powerpc/platforms/cell/iommu.c | 1 - arch/powerpc/platforms/pseries/vio.c | 3 +-- 5 files changed, 15 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index 35db0cbc9222..55312990d1d2 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -143,8 +143,6 @@ struct scatterlist; #ifdef CONFIG_PPC64 -#define IOMMU_MAPPING_ERROR(~(dma_addr_t)0x0) - static inline void set_iommu_table_base(struct device *dev, struct iommu_table *base) { @@ -242,8 +240,6 @@ static inline int __init tce_iommu_bus_notifier_init(void) } #endif /* !CONFIG_IOMMU_API */ -int dma_iommu_mapping_error(struct device *dev, dma_addr_t dma_addr); - #else static inline void *get_iommu_table_base(struct device *dev) diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c index f9fe2080ceb9..5ebacf0fe41a 100644 --- a/arch/powerpc/kernel/dma-iommu.c +++ b/arch/powerpc/kernel/dma-iommu.c @@ -106,11 +106,6 @@ static u64 dma_iommu_get_required_mask(struct device *dev) return mask; } -int dma_iommu_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return dma_addr == IOMMU_MAPPING_ERROR; -} - struct dma_map_ops dma_iommu_ops = { .alloc = dma_iommu_alloc_coherent, .free = dma_iommu_free_coherent, @@ -121,6 +116,5 @@ struct dma_map_ops dma_iommu_ops = { .map_page = dma_iommu_map_page, .unmap_page = dma_iommu_unmap_page, .get_required_mask = dma_iommu_get_required_mask, - .mapping_error = dma_iommu_mapping_error, }; EXPORT_SYMBOL(dma_iommu_ops); diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index f0dc680e659a..ca7f73488c62 100644 
--- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -197,11 +197,11 @@ static unsigned long iommu_range_alloc(struct device *dev, if (unlikely(npages == 0)) { if (printk_ratelimit()) WARN_ON(1); - return IOMMU_MAPPING_ERROR; + return DMA_MAPPING_ERROR; } if (should_fail_iommu(dev)) - return IOMMU_MAPPING_ERROR; + return DMA_MAPPING_ERROR; /* * We don't need to disable preemption here because any CPU can @@ -277,7 +277,7 @@ static unsigned long iommu_range_alloc(struct device *dev, } else { /* Give up */ spin_unlock_irqrestore(&(pool->lock), flags); - return IOMMU_MAPPING_ERROR; + return DMA_MAPPING_ERROR; } } @@ -309,13 +309,13 @@ static dma_addr_t iommu_alloc(struct device *dev, struct iommu_table *tbl, unsigned long attrs) { unsigned long entry; - dma_addr_t ret = IOMMU_MAPPING_ERROR; + dma_addr_t ret = DMA_MAPPING_ERROR; int build_fail; entry = iommu_range_alloc(dev, tbl, npages, NULL, mask, align_order); - if (unlikely(entry == IOMMU_MAPPING_ERROR)) - return IOMMU_MAPPING_ERROR; + if (unlikely(entry == DMA_MAPPING_ERROR)) + return DMA_MAPPING_ERROR; entry += tbl->it_offset;/* Offset into real TCE table */ ret = entry << tbl->it_page_shift; /* Set the return dma address */ @@ -327,12 +327,12 @@ static dma_addr_t iommu_alloc(struct device *dev, struct iommu_table *tbl, /* tbl->it_ops->set() only returns non-zero for transient errors. * Clean up the table bitmap in this case and return -* IOMMU_MAPPING_ERROR. For all other errors the functionality is +* DMA_MAPPING_ERROR. For all other errors the functionality is * not altered. 
*/ if (unlikely(build_fail)) { __iommu_free(tbl, ret, npages); - return IOMMU_MAPPING_ERROR; + return DMA_MAPPING_ERROR; } /* Flush/invalidate TLB caches if necessary */ @@ -477,7 +477,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl, DBG(" - vaddr: %lx, size: %lx\n", vaddr, slen); /* Handle failure */ - if (unlikely(entry == IOMMU_MAPPING_ERROR)) { + if (unlikely(entry == DMA_MAPPING_ERROR)) { if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
[Xen-devel] [PATCH 10/23] arm64: remove the dummy_dma_ops mapping_error method
Just return DMA_MAPPING_ERROR from __dummy_map_page and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig --- arch/arm64/mm/dma-mapping.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index a3ac26284845..fdc26ea5036c 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -282,7 +282,7 @@ static dma_addr_t __dummy_map_page(struct device *dev, struct page *page, enum dma_data_direction dir, unsigned long attrs) { - return 0; + return DMA_MAPPING_ERROR; } static void __dummy_unmap_page(struct device *dev, dma_addr_t dev_addr, @@ -317,11 +317,6 @@ static void __dummy_sync_sg(struct device *dev, { } -static int __dummy_mapping_error(struct device *hwdev, dma_addr_t dma_addr) -{ - return 1; -} - static int __dummy_dma_supported(struct device *hwdev, u64 mask) { return 0; @@ -339,7 +334,6 @@ const struct dma_map_ops dummy_dma_ops = { .sync_single_for_device = __dummy_sync_single, .sync_sg_for_cpu= __dummy_sync_sg, .sync_sg_for_device = __dummy_sync_sg, - .mapping_error = __dummy_mapping_error, .dma_supported = __dummy_dma_supported, }; EXPORT_SYMBOL(dummy_dma_ops); -- 2.19.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 15/23] x86/amd_gart: remove the mapping_error dma_map_ops method
Return DMA_MAPPING_ERROR instead of the magic bad_dma_addr on a dma mapping failure and let the core dma-mapping code handle the rest. Remove the magic EMERGENCY_PAGES that the bad_dma_addr gets redirected to. Signed-off-by: Christoph Hellwig --- arch/x86/kernel/amd_gart_64.c | 39 ++- 1 file changed, 6 insertions(+), 33 deletions(-) diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c index 3f9d1b4019bb..4e733de93f41 100644 --- a/arch/x86/kernel/amd_gart_64.c +++ b/arch/x86/kernel/amd_gart_64.c @@ -50,8 +50,6 @@ static unsigned long iommu_pages; /* .. and in pages */ static u32 *iommu_gatt_base; /* Remapping table */ -static dma_addr_t bad_dma_addr; - /* * If this is disabled the IOMMU will use an optimized flushing strategy * of only flushing when an mapping is reused. With it true the GART is @@ -74,8 +72,6 @@ static u32 gart_unmapped_entry; (((x) & 0xf000) | (((x) >> 32) << 4) | GPTE_VALID | GPTE_COHERENT) #define GPTE_DECODE(x) (((x) & 0xf000) | (((u64)(x) & 0xff0) << 28)) -#define EMERGENCY_PAGES 32 /* = 128KB */ - #ifdef CONFIG_AGP #define AGPEXTERN extern #else @@ -184,14 +180,6 @@ static void iommu_full(struct device *dev, size_t size, int dir) */ dev_err(dev, "PCI-DMA: Out of IOMMU space for %lu bytes\n", size); - - if (size > PAGE_SIZE*EMERGENCY_PAGES) { - if (dir == PCI_DMA_FROMDEVICE || dir == PCI_DMA_BIDIRECTIONAL) - panic("PCI-DMA: Memory would be corrupted\n"); - if (dir == PCI_DMA_TODEVICE || dir == PCI_DMA_BIDIRECTIONAL) - panic(KERN_ERR - "PCI-DMA: Random memory would be DMAed\n"); - } #ifdef CONFIG_IOMMU_LEAK dump_leak(); #endif @@ -220,7 +208,7 @@ static dma_addr_t dma_map_area(struct device *dev, dma_addr_t phys_mem, int i; if (unlikely(phys_mem + size > GART_MAX_PHYS_ADDR)) - return bad_dma_addr; + return DMA_MAPPING_ERROR; iommu_page = alloc_iommu(dev, npages, align_mask); if (iommu_page == -1) { @@ -229,7 +217,7 @@ static dma_addr_t dma_map_area(struct device *dev, dma_addr_t phys_mem, if (panic_on_overflow) 
panic("dma_map_area overflow %lu bytes\n", size); iommu_full(dev, size, dir); - return bad_dma_addr; + return DMA_MAPPING_ERROR; } for (i = 0; i < npages; i++) { @@ -271,7 +259,7 @@ static void gart_unmap_page(struct device *dev, dma_addr_t dma_addr, int npages; int i; - if (dma_addr < iommu_bus_base + EMERGENCY_PAGES*PAGE_SIZE || + if (dma_addr == DMA_MAPPING_ERROR || dma_addr >= iommu_bus_base + iommu_size) return; @@ -315,7 +303,7 @@ static int dma_map_sg_nonforce(struct device *dev, struct scatterlist *sg, if (nonforced_iommu(dev, addr, s->length)) { addr = dma_map_area(dev, addr, s->length, dir, 0); - if (addr == bad_dma_addr) { + if (addr == DMA_MAPPING_ERROR) { if (i > 0) gart_unmap_sg(dev, sg, i, dir, 0); nents = 0; @@ -471,7 +459,7 @@ static int gart_map_sg(struct device *dev, struct scatterlist *sg, int nents, iommu_full(dev, pages << PAGE_SHIFT, dir); for_each_sg(sg, s, nents, i) - s->dma_address = bad_dma_addr; + s->dma_address = DMA_MAPPING_ERROR; return 0; } @@ -490,7 +478,7 @@ gart_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_addr, *dma_addr = dma_map_area(dev, virt_to_phys(vaddr), size, DMA_BIDIRECTIONAL, (1UL << get_order(size)) - 1); flush_gart(); - if (unlikely(*dma_addr == bad_dma_addr)) + if (unlikely(*dma_addr == DMA_MAPPING_ERROR)) goto out_free; return vaddr; out_free: @@ -507,11 +495,6 @@ gart_free_coherent(struct device *dev, size_t size, void *vaddr, dma_direct_free_pages(dev, size, vaddr, dma_addr, attrs); } -static int gart_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return (dma_addr == bad_dma_addr); -} - static int no_agp; static __init unsigned long check_iommu_size(unsigned long aper, u64 aper_size) @@ -695,7 +678,6 @@ static const struct dma_map_ops gart_dma_ops = { .unmap_page = gart_unmap_page, .alloc = gart_alloc_coherent, .free = gart_free_coherent, - .mapping_error = gart_mapping_error, .dma_supported = dma_direct_supported, }; @@ -784,19 +766,12 @@ int __init gart_iommu_init(void) } #endif - 
/* -* Out of IOMMU space handling.
[Xen-devel] [PATCH 12/23] ia64/sba_iommu: improve internal map_page users
Remove the odd sba_{un,}map_single_attrs wrappers, check errors everywhere. Signed-off-by: Christoph Hellwig --- arch/ia64/hp/common/sba_iommu.c | 73 + 1 file changed, 29 insertions(+), 44 deletions(-) diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c index e8a93b07283e..c56f28c98102 100644 --- a/arch/ia64/hp/common/sba_iommu.c +++ b/arch/ia64/hp/common/sba_iommu.c @@ -907,11 +907,12 @@ sba_mark_invalid(struct ioc *ioc, dma_addr_t iova, size_t byte_cnt) } /** - * sba_map_single_attrs - map one buffer and return IOVA for DMA + * sba_map_page - map one buffer and return IOVA for DMA * @dev: instance of PCI owned by the driver that's asking. - * @addr: driver buffer to map. - * @size: number of bytes to map in driver buffer. - * @dir: R/W or both. + * @page: page to map + * @poff: offset into page + * @size: number of bytes to map + * @dir: dma direction * @attrs: optional dma attributes * * See Documentation/DMA-API-HOWTO.txt @@ -944,7 +945,7 @@ static dma_addr_t sba_map_page(struct device *dev, struct page *page, ** Device is bit capable of DMA'ing to the buffer... 
** just return the PCI address of ptr */ - DBG_BYPASS("sba_map_single_attrs() bypass mask/addr: " + DBG_BYPASS("sba_map_page() bypass mask/addr: " "0x%lx/0x%lx\n", to_pci_dev(dev)->dma_mask, pci_addr); return pci_addr; @@ -966,7 +967,7 @@ static dma_addr_t sba_map_page(struct device *dev, struct page *page, #ifdef ASSERT_PDIR_SANITY spin_lock_irqsave(&ioc->res_lock, flags); - if (sba_check_pdir(ioc,"Check before sba_map_single_attrs()")) + if (sba_check_pdir(ioc,"Check before sba_map_page()")) panic("Sanity check failed"); spin_unlock_irqrestore(&ioc->res_lock, flags); #endif @@ -997,20 +998,12 @@ static dma_addr_t sba_map_page(struct device *dev, struct page *page, /* form complete address */ #ifdef ASSERT_PDIR_SANITY spin_lock_irqsave(&ioc->res_lock, flags); - sba_check_pdir(ioc,"Check after sba_map_single_attrs()"); + sba_check_pdir(ioc,"Check after sba_map_page()"); spin_unlock_irqrestore(&ioc->res_lock, flags); #endif return SBA_IOVA(ioc, iovp, offset); } -static dma_addr_t sba_map_single_attrs(struct device *dev, void *addr, - size_t size, enum dma_data_direction dir, - unsigned long attrs) -{ - return sba_map_page(dev, virt_to_page(addr), - (unsigned long)addr & ~PAGE_MASK, size, dir, attrs); -} - #ifdef ENABLE_MARK_CLEAN static SBA_INLINE void sba_mark_clean(struct ioc *ioc, dma_addr_t iova, size_t size) @@ -1036,7 +1029,7 @@ sba_mark_clean(struct ioc *ioc, dma_addr_t iova, size_t size) #endif /** - * sba_unmap_single_attrs - unmap one IOVA and free resources + * sba_unmap_page - unmap one IOVA and free resources * @dev: instance of PCI owned by the driver that's asking. * @iova: IOVA of driver buffer previously mapped. * @size: number of bytes mapped in driver buffer. 
@@ -1063,7 +1056,7 @@ static void sba_unmap_page(struct device *dev, dma_addr_t iova, size_t size, /* ** Address does not fall w/in IOVA, must be bypassing */ - DBG_BYPASS("sba_unmap_single_attrs() bypass addr: 0x%lx\n", + DBG_BYPASS("sba_unmap_page() bypass addr: 0x%lx\n", iova); #ifdef ENABLE_MARK_CLEAN @@ -1114,12 +1107,6 @@ static void sba_unmap_page(struct device *dev, dma_addr_t iova, size_t size, #endif /* DELAYED_RESOURCE_CNT == 0 */ } -void sba_unmap_single_attrs(struct device *dev, dma_addr_t iova, size_t size, - enum dma_data_direction dir, unsigned long attrs) -{ - sba_unmap_page(dev, iova, size, dir, attrs); -} - /** * sba_alloc_coherent - allocate/map shared mem for DMA * @dev: instance of PCI owned by the driver that's asking. @@ -1132,30 +1119,24 @@ static void * sba_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs) { + struct page *page; struct ioc *ioc; + int node = -1; void *addr; ioc = GET_IOC(dev); ASSERT(ioc); - #ifdef CONFIG_NUMA - { - struct page *page; - - page = alloc_pages_node(ioc->node, flags, get_order(size)); - if (unlikely(!page)) - return NULL; - - addr = page_address(page); - } -#else - addr = (void *) __get_free_pages(flags, get_order(size)); + node = ioc->node; #endif - if (unlikely(!addr)) + + page = alloc_pages_node(node, flags, get_order(size)); + if (unlikely(!page)) return NULL; + a
[Xen-devel] [PATCH 05/23] mips/jazz: remove the mapping_error dma_map_ops method
The Jazz iommu code already returns (~(dma_addr_t)0x0) on mapping failures, so we can switch over to returning DMA_MAPPING_ERROR and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig --- arch/mips/include/asm/jazzdma.h | 6 -- arch/mips/jazz/jazzdma.c| 16 +--- 2 files changed, 5 insertions(+), 17 deletions(-) diff --git a/arch/mips/include/asm/jazzdma.h b/arch/mips/include/asm/jazzdma.h index d913439c738c..d13f940022d5 100644 --- a/arch/mips/include/asm/jazzdma.h +++ b/arch/mips/include/asm/jazzdma.h @@ -39,12 +39,6 @@ extern int vdma_get_enable(int channel); #define VDMA_PAGE(a) ((unsigned int)(a) >> 12) #define VDMA_OFFSET(a) ((unsigned int)(a) & (VDMA_PAGESIZE-1)) -/* - * error code returned by vdma_alloc() - * (See also arch/mips/kernel/jazzdma.c) - */ -#define VDMA_ERROR 0x - /* * VDMA pagetable entry description */ diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c index 4c41ed0a637e..6256d35dbf4d 100644 --- a/arch/mips/jazz/jazzdma.c +++ b/arch/mips/jazz/jazzdma.c @@ -104,12 +104,12 @@ unsigned long vdma_alloc(unsigned long paddr, unsigned long size) if (vdma_debug) printk("vdma_alloc: Invalid physical address: %08lx\n", paddr); - return VDMA_ERROR; /* invalid physical address */ + return DMA_MAPPING_ERROR; /* invalid physical address */ } if (size > 0x40 || size == 0) { if (vdma_debug) printk("vdma_alloc: Invalid size: %08lx\n", size); - return VDMA_ERROR; /* invalid physical address */ + return DMA_MAPPING_ERROR; /* invalid physical address */ } spin_lock_irqsave(&vdma_lock, flags); @@ -123,7 +123,7 @@ unsigned long vdma_alloc(unsigned long paddr, unsigned long size) first < VDMA_PGTBL_ENTRIES) first++; if (first + pages > VDMA_PGTBL_ENTRIES) { /* nothing free */ spin_unlock_irqrestore(&vdma_lock, flags); - return VDMA_ERROR; + return DMA_MAPPING_ERROR; } last = first + 1; @@ -569,7 +569,7 @@ static void *jazz_dma_alloc(struct device *dev, size_t size, return NULL; *dma_handle = vdma_alloc(virt_to_phys(ret), 
size); - if (*dma_handle == VDMA_ERROR) { + if (*dma_handle == DMA_MAPPING_ERROR) { dma_direct_free_pages(dev, size, ret, *dma_handle, attrs); return NULL; } @@ -620,7 +620,7 @@ static int jazz_dma_map_sg(struct device *dev, struct scatterlist *sglist, arch_sync_dma_for_device(dev, sg_phys(sg), sg->length, dir); sg->dma_address = vdma_alloc(sg_phys(sg), sg->length); - if (sg->dma_address == VDMA_ERROR) + if (sg->dma_address == DMA_MAPPING_ERROR) return 0; sg_dma_len(sg) = sg->length; } @@ -674,11 +674,6 @@ static void jazz_dma_sync_sg_for_cpu(struct device *dev, arch_sync_dma_for_cpu(dev, sg_phys(sg), sg->length, dir); } -static int jazz_dma_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return dma_addr == VDMA_ERROR; -} - const struct dma_map_ops jazz_dma_ops = { .alloc = jazz_dma_alloc, .free = jazz_dma_free, @@ -692,6 +687,5 @@ const struct dma_map_ops jazz_dma_ops = { .sync_sg_for_device = jazz_dma_sync_sg_for_device, .dma_supported = dma_direct_supported, .cache_sync = arch_dma_cache_sync, - .mapping_error = jazz_dma_mapping_error, }; EXPORT_SYMBOL(jazz_dma_ops); -- 2.19.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 13/23] ia64/sba_iommu: remove the mapping_error dma_map_ops method
Return DMA_MAPPING_ERROR instead of 0 on a dma mapping failure and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig --- arch/ia64/hp/common/sba_iommu.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c index c56f28c98102..0d21c0b5b23d 100644 --- a/arch/ia64/hp/common/sba_iommu.c +++ b/arch/ia64/hp/common/sba_iommu.c @@ -974,7 +974,7 @@ static dma_addr_t sba_map_page(struct device *dev, struct page *page, pide = sba_alloc_range(ioc, dev, size); if (pide < 0) - return 0; + return DMA_MAPPING_ERROR; iovp = (dma_addr_t) pide << iovp_shift; @@ -2155,11 +2155,6 @@ static int sba_dma_supported (struct device *dev, u64 mask) return ((mask & 0xUL) == 0xUL); } -static int sba_dma_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return 0; -} - __setup("nosbagart", nosbagart); static int __init @@ -2193,7 +2188,6 @@ const struct dma_map_ops sba_dma_ops = { .map_sg = sba_map_sg_attrs, .unmap_sg = sba_unmap_sg_attrs, .dma_supported = sba_dma_supported, - .mapping_error = sba_dma_mapping_error, }; void sba_dma_init(void) -- 2.19.1
[Xen-devel] [PATCH 14/23] ia64/sn: remove the mapping_error dma_map_ops method
Return DMA_MAPPING_ERROR instead of 0 on a dma mapping failure and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig --- arch/ia64/sn/pci/pci_dma.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/arch/ia64/sn/pci/pci_dma.c b/arch/ia64/sn/pci/pci_dma.c index 4ce4ee4ef9f2..b7d42e4edc1f 100644 --- a/arch/ia64/sn/pci/pci_dma.c +++ b/arch/ia64/sn/pci/pci_dma.c @@ -196,7 +196,7 @@ static dma_addr_t sn_dma_map_page(struct device *dev, struct page *page, if (!dma_addr) { printk(KERN_ERR "%s: out of ATEs\n", __func__); - return 0; + return DMA_MAPPING_ERROR; } return dma_addr; } @@ -314,11 +314,6 @@ static int sn_dma_map_sg(struct device *dev, struct scatterlist *sgl, return nhwentries; } -static int sn_dma_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return 0; -} - static u64 sn_dma_get_required_mask(struct device *dev) { return DMA_BIT_MASK(64); } @@ -441,7 +436,6 @@ static struct dma_map_ops sn_dma_ops = { .unmap_page = sn_dma_unmap_page, .map_sg = sn_dma_map_sg, .unmap_sg = sn_dma_unmap_sg, - .mapping_error = sn_dma_mapping_error, .dma_supported = sn_dma_supported, .get_required_mask = sn_dma_get_required_mask, }; -- 2.19.1
[Xen-devel] [PATCH 09/23] parisc/sba_iommu: remove the mapping_error dma_map_ops method
The SBA iommu code already returns (~(dma_addr_t)0x0) on mapping failures, so we can switch over to returning DMA_MAPPING_ERROR and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig --- drivers/parisc/sba_iommu.c | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/drivers/parisc/sba_iommu.c b/drivers/parisc/sba_iommu.c index c1e599a429af..452d306ce5cb 100644 --- a/drivers/parisc/sba_iommu.c +++ b/drivers/parisc/sba_iommu.c @@ -93,8 +93,6 @@ #define DEFAULT_DMA_HINT_REG 0 -#define SBA_MAPPING_ERROR(~(dma_addr_t)0) - struct sba_device *sba_list; EXPORT_SYMBOL_GPL(sba_list); @@ -725,7 +723,7 @@ sba_map_single(struct device *dev, void *addr, size_t size, ioc = GET_IOC(dev); if (!ioc) - return SBA_MAPPING_ERROR; + return DMA_MAPPING_ERROR; /* save offset bits */ offset = ((dma_addr_t) (long) addr) & ~IOVP_MASK; @@ -1080,11 +1078,6 @@ sba_unmap_sg(struct device *dev, struct scatterlist *sglist, int nents, } -static int sba_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return dma_addr == SBA_MAPPING_ERROR; -} - static const struct dma_map_ops sba_ops = { .dma_supported =sba_dma_supported, .alloc =sba_alloc, @@ -1093,7 +1086,6 @@ static const struct dma_map_ops sba_ops = { .unmap_page = sba_unmap_page, .map_sg = sba_map_sg, .unmap_sg = sba_unmap_sg, - .mapping_error =sba_mapping_error, }; -- 2.19.1
[Xen-devel] [PATCH 02/23] dma-direct: remove the mapping_error dma_map_ops method
The dma-direct code already returns (~(dma_addr_t)0x0) on mapping failures, so we can switch over to returning DMA_MAPPING_ERROR and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig --- arch/powerpc/kernel/dma-swiotlb.c | 1 - include/linux/dma-direct.h| 3 --- kernel/dma/direct.c | 8 +--- kernel/dma/swiotlb.c | 11 +-- 4 files changed, 6 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c index 5fc335f4d9cd..3d8df2cf8be9 100644 --- a/arch/powerpc/kernel/dma-swiotlb.c +++ b/arch/powerpc/kernel/dma-swiotlb.c @@ -59,7 +59,6 @@ const struct dma_map_ops powerpc_swiotlb_dma_ops = { .sync_single_for_device = swiotlb_sync_single_for_device, .sync_sg_for_cpu = swiotlb_sync_sg_for_cpu, .sync_sg_for_device = swiotlb_sync_sg_for_device, - .mapping_error = dma_direct_mapping_error, .get_required_mask = swiotlb_powerpc_get_required, }; diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h index 9e66bfe369aa..e7600f92d876 100644 --- a/include/linux/dma-direct.h +++ b/include/linux/dma-direct.h @@ -5,8 +5,6 @@ #include #include -#define DIRECT_MAPPING_ERROR (~(dma_addr_t)0) - #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA #include #else @@ -73,5 +71,4 @@ dma_addr_t dma_direct_map_page(struct device *dev, struct page *page, int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents, enum dma_data_direction dir, unsigned long attrs); int dma_direct_supported(struct device *dev, u64 mask); -int dma_direct_mapping_error(struct device *dev, dma_addr_t dma_addr); #endif /* _LINUX_DMA_DIRECT_H */ diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 22a12ab5a5e9..d4335a03193a 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -265,7 +265,7 @@ dma_addr_t dma_direct_map_page(struct device *dev, struct page *page, dma_addr_t dma_addr = phys_to_dma(dev, phys); if (!check_addr(dev, dma_addr, size, __func__)) - return DIRECT_MAPPING_ERROR; + return 
DMA_MAPPING_ERROR; if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) dma_direct_sync_single_for_device(dev, dma_addr, size, dir); @@ -312,11 +312,6 @@ int dma_direct_supported(struct device *dev, u64 mask) return mask >= phys_to_dma(dev, min_mask); } -int dma_direct_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return dma_addr == DIRECT_MAPPING_ERROR; -} - const struct dma_map_ops dma_direct_ops = { .alloc = dma_direct_alloc, .free = dma_direct_free, @@ -335,7 +330,6 @@ const struct dma_map_ops dma_direct_ops = { #endif .get_required_mask = dma_direct_get_required_mask, .dma_supported = dma_direct_supported, - .mapping_error = dma_direct_mapping_error, .cache_sync = arch_dma_cache_sync, }; EXPORT_SYMBOL(dma_direct_ops); diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 045930e32c0e..ff1ce81bb623 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -631,21 +631,21 @@ static dma_addr_t swiotlb_bounce_page(struct device *dev, phys_addr_t *phys, if (unlikely(swiotlb_force == SWIOTLB_NO_FORCE)) { dev_warn_ratelimited(dev, "Cannot do DMA to address %pa\n", phys); - return DIRECT_MAPPING_ERROR; + return DMA_MAPPING_ERROR; } /* Oh well, have to allocate and map a bounce buffer. 
*/ *phys = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start), *phys, size, dir, attrs); if (*phys == SWIOTLB_MAP_ERROR) - return DIRECT_MAPPING_ERROR; + return DMA_MAPPING_ERROR; /* Ensure that the address returned is DMA'ble */ dma_addr = __phys_to_dma(dev, *phys); if (unlikely(!dma_capable(dev, dma_addr, size))) { swiotlb_tbl_unmap_single(dev, *phys, size, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC); - return DIRECT_MAPPING_ERROR; + return DMA_MAPPING_ERROR; } return dma_addr; @@ -680,7 +680,7 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page, if (!dev_is_dma_coherent(dev) && (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0 && - dev_addr != DIRECT_MAPPING_ERROR) + dev_addr != DMA_MAPPING_ERROR) arch_sync_dma_for_device(dev, phys, size, dir); return dev_addr; @@ -789,7 +789,7 @@ swiotlb_map_sg_attrs(struct device *dev, struct scatterlist *sgl, int nelems, for_each_sg(sgl, sg, nelems, i) { sg->dma_address = swiotlb_map_page(dev, sg_page(sg), sg->offset, sg->length, dir, attrs); - if (sg->dma_address == DIRECT
[Xen-devel] [PATCH 11/23] alpha: remove the mapping_error dma_map_ops method
Return DMA_MAPPING_ERROR instead of 0 on a dma mapping failure and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig --- arch/alpha/kernel/pci_iommu.c | 14 -- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c index 46e08e0d9181..e1716e0d92fd 100644 --- a/arch/alpha/kernel/pci_iommu.c +++ b/arch/alpha/kernel/pci_iommu.c @@ -291,7 +291,7 @@ pci_map_single_1(struct pci_dev *pdev, void *cpu_addr, size_t size, use direct_map above, it now must be considered an error. */ if (! alpha_mv.mv_pci_tbi) { printk_once(KERN_WARNING "pci_map_single: no HW sg\n"); - return 0; + return DMA_MAPPING_ERROR; } arena = hose->sg_pci; @@ -307,7 +307,7 @@ pci_map_single_1(struct pci_dev *pdev, void *cpu_addr, size_t size, if (dma_ofs < 0) { printk(KERN_WARNING "pci_map_single failed: " "could not allocate dma page tables\n"); - return 0; + return DMA_MAPPING_ERROR; } paddr &= PAGE_MASK; @@ -455,7 +455,7 @@ static void *alpha_pci_alloc_coherent(struct device *dev, size_t size, memset(cpu_addr, 0, size); *dma_addrp = pci_map_single_1(pdev, cpu_addr, size, 0); - if (*dma_addrp == 0) { + if (*dma_addrp == DMA_MAPPING_ERROR) { free_pages((unsigned long)cpu_addr, order); if (alpha_mv.mv_pci_tbi || (gfp & GFP_DMA)) return NULL; @@ -671,7 +671,7 @@ static int alpha_pci_map_sg(struct device *dev, struct scatterlist *sg, sg->dma_address = pci_map_single_1(pdev, SG_ENT_VIRT_ADDRESS(sg), sg->length, dac_allowed); - return sg->dma_address != 0; + return sg->dma_address != DMA_MAPPING_ERROR; } start = sg; @@ -935,11 +935,6 @@ iommu_unbind(struct pci_iommu_arena *arena, long pg_start, long pg_count) return 0; } -static int alpha_pci_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return dma_addr == 0; -} - const struct dma_map_ops alpha_pci_ops = { .alloc = alpha_pci_alloc_coherent, .free = alpha_pci_free_coherent, @@ -947,7 +942,6 @@ const struct dma_map_ops alpha_pci_ops = { .unmap_page = 
alpha_pci_unmap_page, .map_sg = alpha_pci_map_sg, .unmap_sg = alpha_pci_unmap_sg, - .mapping_error = alpha_pci_mapping_error, .dma_supported = alpha_pci_supported, }; EXPORT_SYMBOL(alpha_pci_ops); -- 2.19.1
[Xen-devel] [PATCH 20/23] iommu/dma-iommu: remove the mapping_error dma_map_ops method
Return DMA_MAPPING_ERROR instead of 0 on a dma mapping failure and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig --- arch/arm64/mm/dma-mapping.c | 7 +++ drivers/iommu/dma-iommu.c | 23 --- include/linux/dma-iommu.h | 1 - 3 files changed, 11 insertions(+), 20 deletions(-) diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index fdc26ea5036c..4cc70029cf8d 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -397,7 +397,7 @@ static void *__iommu_alloc_attrs(struct device *dev, size_t size, return NULL; *handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot); - if (iommu_dma_mapping_error(dev, *handle)) { + if (*handle == DMA_MAPPING_ERROR) { if (coherent) __free_pages(page, get_order(size)); else @@ -414,7 +414,7 @@ static void *__iommu_alloc_attrs(struct device *dev, size_t size, return NULL; *handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot); - if (iommu_dma_mapping_error(dev, *handle)) { + if (*handle == DMA_MAPPING_ERROR) { dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT); return NULL; @@ -574,7 +574,7 @@ static dma_addr_t __iommu_map_page(struct device *dev, struct page *page, dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot); if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) && - !iommu_dma_mapping_error(dev, dev_addr)) + dev_addr != DMA_MAPPING_ERROR) __dma_map_area(page_address(page) + offset, size, dir); return dev_addr; @@ -657,7 +657,6 @@ static const struct dma_map_ops iommu_dma_ops = { .sync_sg_for_device = __iommu_sync_sg_for_device, .map_resource = iommu_dma_map_resource, .unmap_resource = iommu_dma_unmap_resource, - .mapping_error = iommu_dma_mapping_error, }; static int __init __iommu_dma_init(void) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index d1b04753b204..60c7e9e9901e 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -32,8 +32,6 @@ #include #include -#define IOMMU_MAPPING_ERROR0 
- struct iommu_dma_msi_page { struct list_headlist; dma_addr_t iova; @@ -523,7 +521,7 @@ void iommu_dma_free(struct device *dev, struct page **pages, size_t size, { __iommu_dma_unmap(iommu_get_dma_domain(dev), *handle, size); __iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT); - *handle = IOMMU_MAPPING_ERROR; + *handle = DMA_MAPPING_ERROR; } /** @@ -556,7 +554,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp, dma_addr_t iova; unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap; - *handle = IOMMU_MAPPING_ERROR; + *handle = DMA_MAPPING_ERROR; min_size = alloc_sizes & -alloc_sizes; if (min_size < PAGE_SIZE) { @@ -649,11 +647,11 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev); if (!iova) - return IOMMU_MAPPING_ERROR; + return DMA_MAPPING_ERROR; if (iommu_map(domain, iova, phys - iova_off, size, prot)) { iommu_dma_free_iova(cookie, iova, size); - return IOMMU_MAPPING_ERROR; + return DMA_MAPPING_ERROR; } return iova + iova_off; } @@ -694,7 +692,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents, s->offset += s_iova_off; s->length = s_length; - sg_dma_address(s) = IOMMU_MAPPING_ERROR; + sg_dma_address(s) = DMA_MAPPING_ERROR; sg_dma_len(s) = 0; /* @@ -737,11 +735,11 @@ static void __invalidate_sg(struct scatterlist *sg, int nents) int i; for_each_sg(sg, s, nents, i) { - if (sg_dma_address(s) != IOMMU_MAPPING_ERROR) + if (sg_dma_address(s) != DMA_MAPPING_ERROR) s->offset += sg_dma_address(s); if (sg_dma_len(s)) s->length = sg_dma_len(s); - sg_dma_address(s) = IOMMU_MAPPING_ERROR; + sg_dma_address(s) = DMA_MAPPING_ERROR; sg_dma_len(s) = 0; } } @@ -858,11 +856,6 @@ void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle, __iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size); } -int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return dma_addr == 
IOMMU_MAPPING_ERROR; -} -
[Xen-devel] [PATCH 22/23] dma-mapping: remove the mapping_error dma_map_ops method
No users left except for vmd which just forwards it. Signed-off-by: Christoph Hellwig --- drivers/pci/controller/vmd.c | 6 -- include/linux/dma-mapping.h | 7 --- 2 files changed, 13 deletions(-) diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index e50b0b5815ff..98ce79eac128 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -394,11 +394,6 @@ static void vmd_sync_sg_for_device(struct device *dev, struct scatterlist *sg, vmd_dma_ops(dev)->sync_sg_for_device(to_vmd_dev(dev), sg, nents, dir); } -static int vmd_mapping_error(struct device *dev, dma_addr_t addr) -{ - return vmd_dma_ops(dev)->mapping_error(to_vmd_dev(dev), addr); -} - static int vmd_dma_supported(struct device *dev, u64 mask) { return vmd_dma_ops(dev)->dma_supported(to_vmd_dev(dev), mask); @@ -446,7 +441,6 @@ static void vmd_setup_dma_ops(struct vmd_dev *vmd) ASSIGN_VMD_DMA_OPS(source, dest, sync_single_for_device); ASSIGN_VMD_DMA_OPS(source, dest, sync_sg_for_cpu); ASSIGN_VMD_DMA_OPS(source, dest, sync_sg_for_device); - ASSIGN_VMD_DMA_OPS(source, dest, mapping_error); ASSIGN_VMD_DMA_OPS(source, dest, dma_supported); ASSIGN_VMD_DMA_OPS(source, dest, get_required_mask); add_dma_domain(domain); diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 46bd612d929e..2adef56c6069 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -128,7 +128,6 @@ struct dma_map_ops { enum dma_data_direction dir); void (*cache_sync)(struct device *dev, void *vaddr, size_t size, enum dma_data_direction direction); - int (*mapping_error)(struct device *dev, dma_addr_t dma_addr); int (*dma_supported)(struct device *dev, u64 mask); u64 (*get_required_mask)(struct device *dev); }; @@ -575,15 +574,9 @@ static inline void dma_free_coherent(struct device *dev, size_t size, static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) { - const struct dma_map_ops *ops = get_dma_ops(dev); - debug_dma_mapping_error(dev, 
dma_addr); - if (dma_addr == DMA_MAPPING_ERROR) return 1; - - if (ops->mapping_error) - return ops->mapping_error(dev, dma_addr); return 0; } -- 2.19.1
[Xen-devel] [PATCH 16/23] x86/calgary: remove the mapping_error dma_map_ops method
Return DMA_MAPPING_ERROR instead of the magic bad_dma_addr on a dma mapping failure and let the core dma-mapping code handle the rest. Remove the magic EMERGENCY_PAGES that the bad_dma_addr gets redirected to. Signed-off-by: Christoph Hellwig --- arch/x86/kernel/pci-calgary_64.c | 29 +++-- 1 file changed, 7 insertions(+), 22 deletions(-) diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c index bbfc8b1e9104..e76ec1b8ed1f 100644 --- a/arch/x86/kernel/pci-calgary_64.c +++ b/arch/x86/kernel/pci-calgary_64.c @@ -51,8 +51,6 @@ #include #include -#define CALGARY_MAPPING_ERROR 0 - #ifdef CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT int use_calgary __read_mostly = 1; #else @@ -157,8 +155,6 @@ static const unsigned long phb_debug_offsets[] = { #define PHB_DEBUG_STUFF_OFFSET 0x0020 -#define EMERGENCY_PAGES 32 /* = 128KB */ - unsigned int specified_table_size = TCE_TABLE_SIZE_UNSPECIFIED; static int translate_empty_slots __read_mostly = 0; static int calgary_detected __read_mostly = 0; @@ -255,7 +251,7 @@ static unsigned long iommu_range_alloc(struct device *dev, if (panic_on_overflow) panic("Calgary: fix the allocator.\n"); else - return CALGARY_MAPPING_ERROR; + return DMA_MAPPING_ERROR; } } @@ -274,11 +270,10 @@ static dma_addr_t iommu_alloc(struct device *dev, struct iommu_table *tbl, dma_addr_t ret; entry = iommu_range_alloc(dev, tbl, npages); - - if (unlikely(entry == CALGARY_MAPPING_ERROR)) { + if (unlikely(entry == DMA_MAPPING_ERROR)) { pr_warn("failed to allocate %u pages in iommu %p\n", npages, tbl); - return CALGARY_MAPPING_ERROR; + return DMA_MAPPING_ERROR; } /* set the return dma address */ @@ -298,8 +293,7 @@ static void iommu_free(struct iommu_table *tbl, dma_addr_t dma_addr, unsigned long flags; /* were we called with bad_dma_address? 
*/ - badend = CALGARY_MAPPING_ERROR + (EMERGENCY_PAGES * PAGE_SIZE); - if (unlikely(dma_addr < badend)) { + if (unlikely(dma_addr == DMA_MAPPING_ERROR)) { WARN(1, KERN_ERR "Calgary: driver tried unmapping bad DMA " "address 0x%Lx\n", dma_addr); return; @@ -383,7 +377,7 @@ static int calgary_map_sg(struct device *dev, struct scatterlist *sg, npages = iommu_num_pages(vaddr, s->length, PAGE_SIZE); entry = iommu_range_alloc(dev, tbl, npages); - if (entry == CALGARY_MAPPING_ERROR) { + if (entry == DMA_MAPPING_ERROR) { /* makes sure unmap knows to stop */ s->dma_length = 0; goto error; @@ -401,7 +395,7 @@ static int calgary_map_sg(struct device *dev, struct scatterlist *sg, error: calgary_unmap_sg(dev, sg, nelems, dir, 0); for_each_sg(sg, s, nelems, i) { - sg->dma_address = CALGARY_MAPPING_ERROR; + sg->dma_address = DMA_MAPPING_ERROR; sg->dma_length = 0; } return 0; @@ -454,7 +448,7 @@ static void* calgary_alloc_coherent(struct device *dev, size_t size, /* set up tces to cover the allocated range */ mapping = iommu_alloc(dev, tbl, ret, npages, DMA_BIDIRECTIONAL); - if (mapping == CALGARY_MAPPING_ERROR) + if (mapping == DMA_MAPPING_ERROR) goto free; *dma_handle = mapping; return ret; @@ -479,11 +473,6 @@ static void calgary_free_coherent(struct device *dev, size_t size, free_pages((unsigned long)vaddr, get_order(size)); } -static int calgary_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return dma_addr == CALGARY_MAPPING_ERROR; -} - static const struct dma_map_ops calgary_dma_ops = { .alloc = calgary_alloc_coherent, .free = calgary_free_coherent, @@ -491,7 +480,6 @@ static const struct dma_map_ops calgary_dma_ops = { .unmap_sg = calgary_unmap_sg, .map_page = calgary_map_page, .unmap_page = calgary_unmap_page, - .mapping_error = calgary_mapping_error, .dma_supported = dma_direct_supported, }; @@ -739,9 +727,6 @@ static void __init calgary_reserve_regions(struct pci_dev *dev) u64 start; struct iommu_table *tbl = pci_iommu(dev->bus); - /* reserve 
EMERGENCY_PAGES from bad_dma_address and up */ - iommu_range_reserve(tbl, CALGARY_MAPPING_ERROR, EMERGENCY_PAGES); - /* avoid the BIOS/VGA first 640KB-1MB region */ /* for CalIOC2 - avoid the entire first MB */ if (is_calgary(dev->device)) { -- 2.19.1
[Xen-devel] [PATCH 17/23] iommu: remove the mapping_error dma_map_ops method
Return DMA_MAPPING_ERROR instead of 0 on a dma mapping failure and let the core dma-mapping code handle the rest. Note that the existing code used AMD_IOMMU_MAPPING_ERROR to check from a 0 return from the IOVA allocator, which is replaced with an explicit 0 as in the implementation and other users of that interface. Signed-off-by: Christoph Hellwig --- drivers/iommu/amd_iommu.c | 18 +- 1 file changed, 5 insertions(+), 13 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 1167ff0416cf..c5d6c7c42b0a 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -55,8 +55,6 @@ #include "amd_iommu_types.h" #include "irq_remapping.h" -#define AMD_IOMMU_MAPPING_ERROR0 - #define CMD_SET_TYPE(cmd, t) ((cmd)->data[1] |= ((t) << 28)) #define LOOP_TIMEOUT 10 @@ -2339,7 +2337,7 @@ static dma_addr_t __map_single(struct device *dev, paddr &= PAGE_MASK; address = dma_ops_alloc_iova(dev, dma_dom, pages, dma_mask); - if (address == AMD_IOMMU_MAPPING_ERROR) + if (!address) goto out; prot = dir2prot(direction); @@ -2376,7 +2374,7 @@ static dma_addr_t __map_single(struct device *dev, dma_ops_free_iova(dma_dom, address, pages); - return AMD_IOMMU_MAPPING_ERROR; + return DMA_MAPPING_ERROR; } /* @@ -2427,7 +2425,7 @@ static dma_addr_t map_page(struct device *dev, struct page *page, if (PTR_ERR(domain) == -EINVAL) return (dma_addr_t)paddr; else if (IS_ERR(domain)) - return AMD_IOMMU_MAPPING_ERROR; + return DMA_MAPPING_ERROR; dma_mask = *dev->dma_mask; dma_dom = to_dma_ops_domain(domain); @@ -2504,7 +2502,7 @@ static int map_sg(struct device *dev, struct scatterlist *sglist, npages = sg_num_pages(dev, sglist, nelems); address = dma_ops_alloc_iova(dev, dma_dom, npages, dma_mask); - if (address == AMD_IOMMU_MAPPING_ERROR) + if (address == DMA_MAPPING_ERROR) goto out_err; prot = dir2prot(direction); @@ -2627,7 +2625,7 @@ static void *alloc_coherent(struct device *dev, size_t size, *dma_addr = __map_single(dev, dma_dom, page_to_phys(page), size, 
DMA_BIDIRECTIONAL, dma_mask); - if (*dma_addr == AMD_IOMMU_MAPPING_ERROR) + if (*dma_addr == DMA_MAPPING_ERROR) goto out_free; return page_address(page); @@ -2678,11 +2676,6 @@ static int amd_iommu_dma_supported(struct device *dev, u64 mask) return check_device(dev); } -static int amd_iommu_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return dma_addr == AMD_IOMMU_MAPPING_ERROR; -} - static const struct dma_map_ops amd_iommu_dma_ops = { .alloc = alloc_coherent, .free = free_coherent, @@ -2691,7 +2684,6 @@ static const struct dma_map_ops amd_iommu_dma_ops = { .map_sg = map_sg, .unmap_sg = unmap_sg, .dma_supported = amd_iommu_dma_supported, - .mapping_error = amd_iommu_mapping_error, }; static int init_reserved_iova_ranges(void) -- 2.19.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 19/23] iommu/vt-d: remove the mapping_error dma_map_ops method
Return DMA_MAPPING_ERROR instead of 0 on a dma mapping failure and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig --- drivers/iommu/intel-iommu.c | 12 +++- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 66b398ae..0ad67d65bbce 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -3617,7 +3617,7 @@ static dma_addr_t __intel_map_page(struct device *dev, struct page *page, domain = get_valid_domain_for_dev(dev); if (!domain) - return 0; + return DMA_MAPPING_ERROR; iommu = domain_get_iommu(domain); size = aligned_nrpages(paddr, size); @@ -3655,7 +3655,7 @@ static dma_addr_t __intel_map_page(struct device *dev, struct page *page, free_iova_fast(&domain->iovad, iova_pfn, dma_to_mm_pfn(size)); pr_err("Device %s request: %zx@%llx dir %d --- failed\n", dev_name(dev), size, (unsigned long long)paddr, dir); - return 0; + return DMA_MAPPING_ERROR; } static dma_addr_t intel_map_page(struct device *dev, struct page *page, @@ -3756,7 +3756,7 @@ static void *intel_alloc_coherent(struct device *dev, size_t size, *dma_handle = __intel_map_page(dev, page, 0, size, DMA_BIDIRECTIONAL, dev->coherent_dma_mask); - if (*dma_handle) + if (*dma_handle != DMA_MAPPING_ERROR) return page_address(page); if (!dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT)) __free_pages(page, order); @@ -3865,11 +3865,6 @@ static int intel_map_sg(struct device *dev, struct scatterlist *sglist, int nele return nelems; } -static int intel_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return !dma_addr; -} - static const struct dma_map_ops intel_dma_ops = { .alloc = intel_alloc_coherent, .free = intel_free_coherent, @@ -3877,7 +3872,6 @@ static const struct dma_map_ops intel_dma_ops = { .unmap_sg = intel_unmap_sg, .map_page = intel_map_page, .unmap_page = intel_unmap_page, - .mapping_error = intel_mapping_error, .dma_supported = dma_direct_supported, }; -- 
2.19.1
[Xen-devel] [PATCH 21/23] xen-swiotlb: remove the mapping_error dma_map_ops method
Return DMA_MAPPING_ERROR instead of 0 on a dma mapping failure and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig --- drivers/xen/swiotlb-xen.c | 12 ++-- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 2a7f545bd0b5..6dc969d5ea2f 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -53,8 +53,6 @@ * API. */ -#define XEN_SWIOTLB_ERROR_CODE (~(dma_addr_t)0x0) - static char *xen_io_tlb_start, *xen_io_tlb_end; static unsigned long xen_io_tlb_nslabs; /* @@ -406,7 +404,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page, map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir, attrs); if (map == SWIOTLB_MAP_ERROR) - return XEN_SWIOTLB_ERROR_CODE; + return DMA_MAPPING_ERROR; dev_addr = xen_phys_to_bus(map); xen_dma_map_page(dev, pfn_to_page(map >> PAGE_SHIFT), @@ -421,7 +419,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page, attrs |= DMA_ATTR_SKIP_CPU_SYNC; swiotlb_tbl_unmap_single(dev, map, size, dir, attrs); - return XEN_SWIOTLB_ERROR_CODE; + return DMA_MAPPING_ERROR; } /* @@ -700,11 +698,6 @@ xen_swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt, return dma_common_get_sgtable(dev, sgt, cpu_addr, handle, size, attrs); } -static int xen_swiotlb_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return dma_addr == XEN_SWIOTLB_ERROR_CODE; -} - const struct dma_map_ops xen_swiotlb_dma_ops = { .alloc = xen_swiotlb_alloc_coherent, .free = xen_swiotlb_free_coherent, @@ -719,5 +712,4 @@ const struct dma_map_ops xen_swiotlb_dma_ops = { .dma_supported = xen_swiotlb_dma_supported, .mmap = xen_swiotlb_dma_mmap, .get_sgtable = xen_swiotlb_get_sgtable, - .mapping_error = xen_swiotlb_mapping_error, }; -- 2.19.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 23/23] dma-mapping: return an error code from dma_mapping_error
Currently dma_mapping_error returns a boolean as int, with 1 meaning error. This is rather unusual and many callers have to convert it to an errno value. The callers are highly inconsistent, with error codes ranging from -ENOMEM over -EIO, -EINVAL and -EFAULT to -EAGAIN. Return -ENOMEM, which seems to be what the largest number of callers convert it to, and which also matches the typical error case where we are out of resources.

Signed-off-by: Christoph Hellwig

---
 include/linux/dma-mapping.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2adef56c6069..b5bd25fc1f81 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -576,7 +576,7 @@
 static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 {
 	debug_dma_mapping_error(dev, dma_addr);
 	if (dma_addr == DMA_MAPPING_ERROR)
-		return 1;
+		return -ENOMEM;
 	return 0;
 }
-- 
2.19.1
[Xen-devel] [PATCH 18/23] iommu/intel: small map_page cleanup
Pass the page + offset to the low-level __iommu_map_single helper (which gets renamed to fit the new calling conventions) as both callers have the page at hand. Signed-off-by: Christoph Hellwig --- drivers/iommu/intel-iommu.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 41a4b8808802..66b398ae 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -3597,9 +3597,11 @@ static int iommu_no_mapping(struct device *dev) return 0; } -static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr, -size_t size, int dir, u64 dma_mask) +static dma_addr_t __intel_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, int dir, + u64 dma_mask) { + phys_addr_t paddr = page_to_phys(page) + offset; struct dmar_domain *domain; phys_addr_t start_paddr; unsigned long iova_pfn; @@ -3661,8 +3663,7 @@ static dma_addr_t intel_map_page(struct device *dev, struct page *page, enum dma_data_direction dir, unsigned long attrs) { - return __intel_map_single(dev, page_to_phys(page) + offset, size, - dir, *dev->dma_mask); + return __intel_map_page(dev, page, offset, size, dir, *dev->dma_mask); } static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size) @@ -3753,9 +3754,8 @@ static void *intel_alloc_coherent(struct device *dev, size_t size, return NULL; memset(page_address(page), 0, size); - *dma_handle = __intel_map_single(dev, page_to_phys(page), size, -DMA_BIDIRECTIONAL, -dev->coherent_dma_mask); + *dma_handle = __intel_map_page(dev, page, 0, size, DMA_BIDIRECTIONAL, + dev->coherent_dma_mask); if (*dma_handle) return page_address(page); if (!dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT)) -- 2.19.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 0/2] tools/libxc: Fixes to cpuid logic
On Thu, Nov 29, 2018 at 07:20:00PM +, Andrew Cooper wrote:
> Andrew Cooper (2):
>   tools/libxc: Fix issues with libxc and Xen having different featureset
>     lengths
>   tools/libxc: Fix error handling in get_cpuid_domain_info()

With Jan's comments addressed:

Acked-by: Wei Liu
Re: [Xen-devel] Linux 4.19.5 fails to boot as Xen dom0
On Fri, Nov 30, 2018 at 01:11:56PM +, Hans van Kranenburg wrote: > On 11/29/18 4:06 PM, Kirill A. Shutemov wrote: > > On Thu, Nov 29, 2018 at 03:00:45PM +, Juergen Gross wrote: > >> On 29/11/2018 15:32, Kirill A. Shutemov wrote: > >>> On Thu, Nov 29, 2018 at 02:24:47PM +, Kirill A. Shutemov wrote: > On Thu, Nov 29, 2018 at 01:35:17PM +, Juergen Gross wrote: > > On 29/11/2018 14:26, Kirill A. Shutemov wrote: > >> On Thu, Nov 29, 2018 at 09:41:25AM +, Juergen Gross wrote: > >>> On 29/11/2018 02:22, Hans van Kranenburg wrote: > Hi, > > As also seen at: > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951 > > Attached there are two serial console output logs. One is starting > with > Xen 4.11 (from debian unstable) as dom0, and the other one without > Xen. > > [2.085543] BUG: unable to handle kernel paging request at > 888d9fffc000 > [2.085610] PGD 200c067 P4D 200c067 PUD 0 > [2.085674] Oops: [#1] SMP NOPTI > [2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted > 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1 > [2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 > 05/21/2018 > [2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490 > [...] > >>> > >>> The offending stable commit is > >>> 4074ca7d8a1832921c865d250bbd08f3441b3657 > >>> ("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), this > >>> is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream. > >>> > >>> Current upstream kernel is booting fine under Xen, so in general the > >>> patch should be fine. Using an upstream kernel built from above commit > >>> (with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is fine, > >>> too. > >>> > >>> Kirill, are you aware of any prerequisite patch from 4.20 which could > >>> be > >>> missing in 4.19.5? > >> > >> I'm not. > >> > >> Let me look into this. > >> > > > > What is making me suspicious is the failure happening just after > > releasing the init memory. Maybe there is an access to .init.data > > segment or similar? 
The native kernel booting could be related to the > > usage of 2M mappings not being available in a PV-domain. > > Ahh.. Could you test this: > > diff --git a/arch/x86/mm/dump_pagetables.c > b/arch/x86/mm/dump_pagetables.c > index a12afff146d1..7dec63ec7aab 100644 > --- a/arch/x86/mm/dump_pagetables.c > +++ b/arch/x86/mm/dump_pagetables.c > @@ -496,7 +496,7 @@ static inline bool is_hypervisor_range(int idx) > * 8000 - 87ff is reserved for > * the hypervisor. > */ > -return (idx >= pgd_index(__PAGE_OFFSET) - 16) && > +return (idx >= pgd_index(__PAGE_OFFSET) - 17) && > (idx < pgd_index(__PAGE_OFFSET)); > #else > return false; > >>> > >>> Or, better, this: > >> > >> That makes it boot again! > >> > >> Any idea why upstream doesn't need it? > > > > Nope. > > > > I'll prepare a proper fix. > > > > Thanks for looking into this. > > In the meantime, I applied the "Or, better, this" change, and my dom0 > boots again. > > FYI, boot log now: (paste 90d valid) > https://paste.debian.net/plainh/48940826 I forgot to CC you: https://lkml.kernel.org/r/20181130121131.g3xvlvixv7mvl...@black.fi.intel.com Please give it a try. -- Kirill A. Shutemov ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Linux 4.19.5 fails to boot as Xen dom0
On 11/30/18 2:26 PM, Kirill A. Shutemov wrote: > On Fri, Nov 30, 2018 at 01:11:56PM +, Hans van Kranenburg wrote: >> On 11/29/18 4:06 PM, Kirill A. Shutemov wrote: >>> On Thu, Nov 29, 2018 at 03:00:45PM +, Juergen Gross wrote: On 29/11/2018 15:32, Kirill A. Shutemov wrote: > On Thu, Nov 29, 2018 at 02:24:47PM +, Kirill A. Shutemov wrote: >> On Thu, Nov 29, 2018 at 01:35:17PM +, Juergen Gross wrote: >>> On 29/11/2018 14:26, Kirill A. Shutemov wrote: On Thu, Nov 29, 2018 at 09:41:25AM +, Juergen Gross wrote: > On 29/11/2018 02:22, Hans van Kranenburg wrote: >> Hi, >> >> As also seen at: >> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951 >> >> Attached there are two serial console output logs. One is starting >> with >> Xen 4.11 (from debian unstable) as dom0, and the other one without >> Xen. >> >> [2.085543] BUG: unable to handle kernel paging request at >> 888d9fffc000 >> [2.085610] PGD 200c067 P4D 200c067 PUD 0 >> [2.085674] Oops: [#1] SMP NOPTI >> [2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted >> 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1 >> [2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 >> 05/21/2018 >> [2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490 >> [...] > > The offending stable commit is > 4074ca7d8a1832921c865d250bbd08f3441b3657 > ("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), this > is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream. > > Current upstream kernel is booting fine under Xen, so in general the > patch should be fine. Using an upstream kernel built from above commit > (with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is fine, > too. > > Kirill, are you aware of any prerequisite patch from 4.20 which could > be > missing in 4.19.5? I'm not. Let me look into this. >>> >>> What is making me suspicious is the failure happening just after >>> releasing the init memory. Maybe there is an access to .init.data >>> segment or similar? 
The native kernel booting could be related to the >>> usage of 2M mappings not being available in a PV-domain. >> >> Ahh.. Could you test this: >> >> diff --git a/arch/x86/mm/dump_pagetables.c >> b/arch/x86/mm/dump_pagetables.c >> index a12afff146d1..7dec63ec7aab 100644 >> --- a/arch/x86/mm/dump_pagetables.c >> +++ b/arch/x86/mm/dump_pagetables.c >> @@ -496,7 +496,7 @@ static inline bool is_hypervisor_range(int idx) >> * 8000 - 87ff is reserved for >> * the hypervisor. >> */ >> -return (idx >= pgd_index(__PAGE_OFFSET) - 16) && >> +return (idx >= pgd_index(__PAGE_OFFSET) - 17) && >> (idx < pgd_index(__PAGE_OFFSET)); >> #else >> return false; > > Or, better, this: That makes it boot again! Any idea why upstream doesn't need it? >>> >>> Nope. >>> >>> I'll prepare a proper fix. >>> >> >> Thanks for looking into this. >> >> In the meantime, I applied the "Or, better, this" change, and my dom0 >> boots again. >> >> FYI, boot log now: (paste 90d valid) >> https://paste.debian.net/plainh/48940826 > > I forgot to CC you: > > https://lkml.kernel.org/r/20181130121131.g3xvlvixv7mvl...@black.fi.intel.com > > Please give it a try. Ah, right, thanks. The xen-devel list is also not in Cc. I'll slam it on top of my 4.19.5 debian package build and test. Hans ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3] pvcalls-front: Avoid __get_free_pages(GFP_KERNEL) under spinlock
On 30/11/2018 12:01, Wen Yang wrote: > The problem is that we call this with a spin lock held. > The call tree is: > pvcalls_front_accept() holds bedata->socket_lock. > -> create_active() > -> __get_free_pages() uses GFP_KERNEL > > The create_active() function is only called from pvcalls_front_accept() > with a spin_lock held, The allocation is not allowed to sleep and > GFP_KERNEL is not sufficient. > > This issue was detected by using the Coccinelle software. > > v2: Add a function doing the allocations which is called > outside the lock and passing the allocated data to > create_active(). > v3: Use the matching deallocators i.e., free_page() > and free_pages(), respectively. > > Suggested-by: Juergen Gross > Signed-off-by: Wen Yang > CC: Julia Lawall > CC: Boris Ostrovsky > CC: Juergen Gross > CC: Stefano Stabellini > CC: xen-devel@lists.xenproject.org > CC: linux-ker...@vger.kernel.org > --- This patch is malformed. Please send it via an appropriate tool/mailer. See e.g. Documentation/process/submitting-patches.rst or Documentation/process/email-clients.rst in the Linux kernel source tree. Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [freebsd-master test] 130849: all pass - PUSHED
flight 130849 freebsd-master real [real] http://logs.test-lab.xenproject.org/osstest/logs/130849/ Perfect :-) All tests in this flight passed as required version targeted for testing: freebsd b1f31943cb61465b80f786de48501e2fb03e1b61 baseline version: freebsd 8894b8e317520c031636fdfab98fc0557a0e1f4c Last test of basis 130810 2018-11-26 09:20:17 Z4 days Testing same since 130849 2018-11-28 12:49:34 Z2 days1 attempts People who touched revisions under test: 0mp <0...@freebsd.org> ae arybchik asomers bwidawsk dab des emaste eugen gallatin gjb ian imp jamie jhb jhibbits jilles kib manu marius markj markm mckusick mm vangyzen ygy yuripv jobs: build-amd64-freebsd-againpass build-amd64-freebsd pass build-amd64-xen-freebsd pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/freebsd.git 8894b8e3175..b1f31943cb6 b1f31943cb61465b80f786de48501e2fb03e1b61 -> tested/master ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [linux-3.18 test] 130843: regressions - FAIL
flight 130843 linux-3.18 real [real] http://logs.test-lab.xenproject.org/osstest/logs/130843/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-examine 8 reboot fail REGR. vs. 128858 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 128858 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-boot fail REGR. vs. 128858 test-amd64-i386-xl-raw7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-xl-multivcpu 7 xen-bootfail REGR. vs. 128858 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 128858 test-amd64-i386-xl7 xen-boot fail REGR. vs. 128858 test-amd64-i386-qemut-rhel6hvm-intel 7 xen-boot fail REGR. vs. 128858 test-amd64-i386-freebsd10-i386 7 xen-boot fail REGR. vs. 128858 test-amd64-i386-xl-qemuu-ovmf-amd64 7 xen-boot fail REGR. vs. 128858 test-amd64-i386-libvirt-pair 10 xen-boot/src_hostfail REGR. vs. 128858 test-amd64-i386-libvirt-pair 11 xen-boot/dst_hostfail REGR. vs. 128858 test-amd64-i386-xl-qemuu-debianhvm-amd64 7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-xl-qemuu-ovmf-amd64 7 xen-boot fail REGR. vs. 128858 test-amd64-i386-pair 10 xen-boot/src_hostfail REGR. vs. 128858 test-amd64-amd64-rumprun-amd64 7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-libvirt-pair 10 xen-boot/src_host fail REGR. vs. 128858 test-amd64-amd64-libvirt-pair 11 xen-boot/dst_host fail REGR. vs. 128858 test-amd64-amd64-xl-xsm 7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-pair10 xen-boot/src_hostfail REGR. vs. 128858 test-amd64-amd64-pair11 xen-boot/dst_hostfail REGR. vs. 128858 test-amd64-amd64-xl-pvshim7 xen-boot fail REGR. vs. 128858 test-armhf-armhf-xl-cubietruck 16 guest-start/debian.repeat fail REGR. vs. 128858 test-armhf-armhf-libvirt16 guest-start/debian.repeat fail REGR. vs. 128858 test-armhf-armhf-xl-multivcpu 16 guest-start/debian.repeat fail REGR. vs. 
128858 Tests which did not succeed, but are not blocking: test-armhf-armhf-xl-credit2 16 guest-start/debian.repeatfail like 128691 test-amd64-i386-xl-qemut-debianhvm-amd64 16 guest-localmigrate/x10 fail like 128807 test-armhf-armhf-xl-vhd 10 debian-di-installfail like 128841 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 128858 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 128858 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 128858 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 128858 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 128858 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 128858 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail never pass test-amd64-amd64-xl-pvhv2-intel 12 guest-start fail never pass test-amd64-amd64-xl-pvhv2-amd 12 guest-start fail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never 
pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 gue
[Xen-devel] [xen-unstable-smoke test] 130873: tolerable all pass - PUSHED
flight 130873 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/130873/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen 2634b997afabfdc5a972e07e536dfbc6febb4385 baseline version: xen 402411ec40e451d1ad051fb7e59aa6374cc4212a Last test of basis 130852 2018-11-28 17:00:31 Z1 days Testing same since 130873 2018-11-30 12:00:40 Z0 days1 attempts People who touched revisions under test: Brian Woods Jan Beulich Paul Durrant Roger Pau Monné Tim Deegan Wei Liu jobs: build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-amd64-amd64-xl-qemuu-debianhvm-i386 pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 402411ec40..2634b997af 2634b997afabfdc5a972e07e536dfbc6febb4385 -> smoke ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/2] iommu: elide flushing for higher order map/unmap operations
> -Original Message- > From: Andrew Cooper > Sent: 30 November 2018 12:49 > To: Paul Durrant ; xen-devel@lists.xenproject.org > Cc: Stefano Stabellini ; Julien Grall > ; Suravee Suthikulpanit > ; Brian Woods ; Jan > Beulich ; Kevin Tian ; Wei Liu > ; Roger Pau Monne > Subject: Re: [PATCH 2/2] iommu: elide flushing for higher order map/unmap > operations > > On 30/11/2018 10:45, Paul Durrant wrote: > > This patch removes any implicit flushing that occurs in the > implementation > > of map and unmap operations and, instead, adds explicit flushing at the > > end of the loops in the iommu_map/unmap() wrapper functions. > > > > Because VT-d currently performs two different types of flush dependent > upon > > whether a PTE is being modified versus merely added (i.e. replacing a > non- > > present PTE) a 'iommu_flush_type' enumeration is defined by this patch > and > > the iommu_ops map method is modified to pass back the type of flush > > necessary for the PTE that has been populated. When a higher order > mapping > > operation is done, the wrapper code performs the 'highest' level of > flush > > required by the individual iommu_ops method calls, where a 'modified > PTE' > > flush is deemed to be higher than a 'added PTE' flush. The ARM SMMU > > implementation currently performs no implicit flushing and therefore > > the modified map method always passes back a flush type of 'none'. > > > > NOTE: The per-cpu 'iommu_dont_flush_iotlb' is respected by the > iommu_map() > > wrapper function and therefore this now applies to all IOMMU > > implementations rather than just VT-d. Use of the flag has been > added > > to arch_iommu_hwdom_init() so that flushing is now fully elided > for > > the hardware domain's initial table population. > > iommu_dont_flush_iotlb is a misfeature. While it still exists, the > flushing API is fundamentally broken. > > Do you have a plan to remove it? 
I ask, because the only feasible > option I see is for iommu_{map,unmap}() to pass the flush accumulation > out to the caller, and have the caller use the appropriate flush > interfaces. > > [Edit - lunch happened around about this point, and there was a long > discussion] > > One idea with be to start with a prep patch renaming iommu_{,un}map() to > _legacy(), nothing beside them that they have implicit flushing > characteristics. Then, the nonflushing versions of iommu_{,un}map() can > be introduced, which return the accumulated flush flag, and the callers > can DTRT. Ok. I'll plan to do a new patch #2 that does the rename, and the move this code into a patch #3. > > This way, we can avoid introducing a further user of > iommu_dont_flush_iotlb in arch_iommu_hwdom_init(), and clean up the > remaining legacy callsites at a later point when more infrastructure is > in place. > > In particular, the P2M code cannot be fixed to behave in this way at the > moment because the point at which the IOMMU changes are hooked in lacks > an order parameter. The P2M code is also a mess and calls directly into VT-d and AMD IOMMU functions without going through the wrapper code in some cases. There is much cleanup to do... these patches are just a start. Paul > > > diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h > > index da8294bac8..289e0e2772 100644 > > --- a/xen/include/xen/iommu.h > > +++ b/xen/include/xen/iommu.h > > @@ -155,6 +155,13 @@ struct page_info; > > */ > > typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void > *ctxt); > > > > This wants at least a comment stating that some IOMMUs require that we > issue a flush when modifying a not-present/otherwise invalid entry. > > > +enum iommu_flush_type > > +{ > > +IOMMU_FLUSH_none, /* no flush required */ > > +IOMMU_FLUSH_added, /* no modified entries, just additional entries > */ > > IOMMU_FLUSH_invalid ? I think it is more descriptive of the scenario in > which it is used. 
> > > +IOMMU_FLUSH_modified, /* modified entries */ > > +}; > > + > > struct iommu_ops { > > int (*init)(struct domain *d); > > void (*hwdom_init)(struct domain *d); > > @@ -177,7 +184,8 @@ struct iommu_ops { > > * other by the caller in order to have meaningful results. > > */ > > int __must_check (*map_page)(struct domain *d, dfn_t dfn, mfn_t > mfn, > > - unsigned int flags); > > + unsigned int flags, > > + enum iommu_flush_type *flush_type); > > Maintaining the flush type by pointer is quite awkward. > > How about folding a positive flush type in with negative errors? i.e. > map_page() becomes < 0 on error, 0 for success/no flush and >0 for > success/w flush. > > I think the result would be rather cleaner to read. > > ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Linux 4.19.5 fails to boot as Xen dom0
On 11/30/18 2:26 PM, Kirill A. Shutemov wrote: > On Fri, Nov 30, 2018 at 01:11:56PM +, Hans van Kranenburg wrote: >> On 11/29/18 4:06 PM, Kirill A. Shutemov wrote: >>> On Thu, Nov 29, 2018 at 03:00:45PM +, Juergen Gross wrote: On 29/11/2018 15:32, Kirill A. Shutemov wrote: > On Thu, Nov 29, 2018 at 02:24:47PM +, Kirill A. Shutemov wrote: >> On Thu, Nov 29, 2018 at 01:35:17PM +, Juergen Gross wrote: >>> On 29/11/2018 14:26, Kirill A. Shutemov wrote: On Thu, Nov 29, 2018 at 09:41:25AM +, Juergen Gross wrote: > On 29/11/2018 02:22, Hans van Kranenburg wrote: >> Hi, >> >> As also seen at: >> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951 >> >> Attached there are two serial console output logs. One is starting >> with >> Xen 4.11 (from debian unstable) as dom0, and the other one without >> Xen. >> >> [2.085543] BUG: unable to handle kernel paging request at >> 888d9fffc000 >> [2.085610] PGD 200c067 P4D 200c067 PUD 0 >> [2.085674] Oops: [#1] SMP NOPTI >> [2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted >> 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1 >> [2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 >> 05/21/2018 >> [2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490 >> [...] > > The offending stable commit is > 4074ca7d8a1832921c865d250bbd08f3441b3657 > ("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), this > is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream. > > Current upstream kernel is booting fine under Xen, so in general the > patch should be fine. Using an upstream kernel built from above commit > (with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is fine, > too. > > Kirill, are you aware of any prerequisite patch from 4.20 which could > be > missing in 4.19.5? I'm not. Let me look into this. >>> >>> What is making me suspicious is the failure happening just after >>> releasing the init memory. Maybe there is an access to .init.data >>> segment or similar? 
The native kernel booting could be related to the >>> usage of 2M mappings not being available in a PV-domain. >> >> Ahh.. Could you test this: >> >> diff --git a/arch/x86/mm/dump_pagetables.c >> b/arch/x86/mm/dump_pagetables.c >> index a12afff146d1..7dec63ec7aab 100644 >> --- a/arch/x86/mm/dump_pagetables.c >> +++ b/arch/x86/mm/dump_pagetables.c >> @@ -496,7 +496,7 @@ static inline bool is_hypervisor_range(int idx) >> * 8000 - 87ff is reserved for >> * the hypervisor. >> */ >> -return (idx >= pgd_index(__PAGE_OFFSET) - 16) && >> +return (idx >= pgd_index(__PAGE_OFFSET) - 17) && >> (idx < pgd_index(__PAGE_OFFSET)); >> #else >> return false; > > Or, better, this: That makes it boot again! Any idea why upstream doesn't need it? >>> >>> Nope. >>> >>> I'll prepare a proper fix. >>> >> >> Thanks for looking into this. >> >> In the meantime, I applied the "Or, better, this" change, and my dom0 >> boots again. >> >> FYI, boot log now: (paste 90d valid) >> https://paste.debian.net/plainh/48940826 > > I forgot to CC you: > > https://lkml.kernel.org/r/20181130121131.g3xvlvixv7mvl...@black.fi.intel.com > > Please give it a try. I'm not in that thread, so my response here... You paste a v2-like patch into 'Re: [PATCH 1/2]'. Juergen says: s/LDT_PGD_ENTRY/GUARD_HOLE_PGD_ENTRY/, then you say Ughh.., change it to GUARD_HOLE_ENTRY, which does not exist, and then get a Reviewed-by from Juergen. I guess it has to be GUARD_HOLE_PGD_ENTRY after all... arch/x86/include/asm/pgtable_64_types.h:116:31: error: 'GUARD_HOLE_ENTRY' undeclared (first use in this function); did you mean 'GUARD_HOLE_PGD_ENTRY'? I'll test that instead. Hans ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [Xen EFI] Impossible to limit the dom0 memory
Hi Jan, hi Juergen, I'm trying again this week to install Xen on an OVH server (https://www.ovh.com/fr/serveurs_dedies/infra/1801eg02.xml). It is still impossible to boot Xen with the option "dom0_mem=1G,max:1G" (boot: EFI->xen). I have tried with Debian 9 stable/stretch: - grub2 (2.02~beta3-5+deb9u1): https://packages.debian.org/stretch/grub2 - xen-system-amd64 (4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10): https://packages.debian.org/stretch/xen-system-amd64 - linux-image-amd64 (4.9+80+deb9u6): https://packages.debian.org/stretch/linux-image-amd64 I have tried with Debian 10 testing/buster: - grub2 (2.02+dfsg1-8): https://packages.debian.org/buster/grub2 - xen-system-amd64 (4.11.1~pre.20180911.5acdd26fdc+dfsg-5): https://packages.debian.org/buster/xen-system-amd64 - linux-image-amd64 (4.18+99): https://packages.debian.org/buster/linux-image-amd64 # cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 158 model name : Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80GHz stepping: 9 microcode : 0x8e Microcode version seems to be up to date: https://www.intel.com/content/dam/www/public/us/en/documents/sa00115-microcode-update-guidance.pdf Here is my WORKING xen.cfg file: ``` [global] default=xen [xen] options=smt=on kernel=$VMLINUZ_NAME root=/dev/md3 ro rootdelay=10 noquiet nosplash ramdisk=$INITRD_NAME ``` Here is my FAILING xen.cfg file (with dom0_mem=1G,max:1G): ``` [global] default=xen [xen] options=smt=on dom0_mem=1G,max:1G kernel=vmlinuz root=/dev/md3 ro rootdelay=10 noquiet nosplash ramdisk=initrd.img ``` For information, if helpful: - Booting Linux with EFI->grub->Linux works fine. - Booting Xen with EFI->grub->Xen fails too (In January I only had 1/8 core available, now I cannot boot, but I will discuss this problem in another thread). Do you have more information on the dom0_mem argument problem? 
Best regards, Guillaume On 25/01/2018 at 17:07, msd+xen-de...@msd.im wrote: I have installed `linux-image-amd64-dbg` and `binutils`. I can now execute `addr2line -pfi -e vmlinux-4.14.0-0.bpo.3-amd64 `. I have generated a file "commands.txt" with all the addresses after "Guest stack trace from rsp=82003cb0:" in my log file "dom0_crash_with_dom0_memory.txt". I attached the result: "result.txt". We can see inside this file "xen/mmu_pv.c:1548" and "drivers/firmware/efi/efi.c:558", so I hope it will be helpful. Is that ok for you? Can I do something more? Regards, Guillaume
[Xen-devel] [PATCH OSSTEST] ts-xen-build: Enable ITS driver in Xen
The ITS driver was added in Xen 4.10 as a technical preview feature. However, it is required in order to boot Xen on Thunder-X because PCI devices don't support legacy interrupts. So enable CONFIG_ITS in our Xen build. Signed-off-by: Julien Grall --- ts-xen-build | 4 1 file changed, 4 insertions(+) diff --git a/ts-xen-build b/ts-xen-build index c5d2a1d..dedf474 100755 --- a/ts-xen-build +++ b/ts-xen-build @@ -126,6 +126,10 @@ END echo >>xen/.config CONFIG_EXPERT=y echo >>xen/.config CONFIG_HVM_FEP=y echo >>xen/.config CONFIG_VERBOSE_DEBUG=y + # The ITS driver is required to boot the Hardware Domain + # on Xen. For now (Xen 4.10/4.11 at least), it will + # not be built by default and is gated by expert mode + echo >>xen/.config CONFIG_HAS_ITS=y fi END ); -- 2.11.0
[Xen-devel] [qemu-mainline test] 130844: regressions - FAIL
flight 130844 qemu-mainline real [real] http://logs.test-lab.xenproject.org/osstest/logs/130844/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail REGR. vs. 129996 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail REGR. vs. 129996 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 129996 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 129996 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 129996 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 129996 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 129996 test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass 
test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass version targeted for testing: qemuu039d4e3df0049bdd8f93a2de735a816483b13954 baseline version: qemuucb968d275c145467c8b385a3618a207ec111eab1 Last test of basis 129996 2018-11-13 22:49:16 Z 16 days Failing since130168 2018-11-16 04:27:30 Z 14 days9 attempts Testing same since 130844 2018-11-28 04:15:43 Z2 days1 attempts People who touched revisions under test: Alberto Garcia Aleksandar Markovic Alex Bennée Alistair Francis baldu...@units.it Bandan Das Bastian Koppelmann Cornelia Huck Daniel P. Berrangé David Hildenbrand Dr. David Alan Gilbert Edgar E. Iglesias Eduardo Habkost Eric Auger Eric Blake Erik Skultety Fredrik Noring Gerd Hoffmann Greg Kurz Guenter Roeck Hervé Poussineau Igor Druzhinin Jason Wang John Snow Keith Busch Kevin Wolf Laurent Vivier Li Qiang linzhecheng Logan Gunthorpe Luc Michel Mao Zhongyi Marc-André Lureau Mark Cave-Ayland Markus Armbruster Max Filippov Max Reitz Michael Roth Palmer Dabbelt Paolo Bonzini Peter Maydell Philippe Mathieu-Daudé Philippe Mathieu-Daudé Prasad J Pandit Richard Henderson Richard W.M. 
Jones Roman Kagan Seth Kintigh Stefan Berger Stefan Berger Stefan Markovic Thomas Huth Wang Xin Zhang Chen Zhang Chen ZhiPeng Lu jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-armhf
Re: [Xen-devel] [PATCH 7/9] libxl: Make killing of device model asynchronous
> On Nov 28, 2018, at 4:43 PM, Ian Jackson wrote: > > George Dunlap writes ("[PATCH 7/9] libxl: Make killing of device model > asynchronous"): >> Or at least, give it an asynchronous interface so that we can make it >> actually asynchronous in subsequent patches. >> >> Create state structures and callback function signatures. Add the >> state structure to libxl__destroy_domid_state. Break >> libxl__destroy_domid down into two functions. > ... >> +/* Used to detroy the device model */ >> +_hidden void libxl__destroy_device_model(libxl__egc *egc, >> + libxl__destroy_devicemodel_state >> *ddms); > > I think that comment is rather pointless (but I won't object if you > really want to keep it). Yes; that comment looks rather vestigial. > Conversely it would be nice to say somewhere > that ddms->callback may be called reentrantly. What do you mean by reentrantly? That libxl__destroy_device_model() may end up calling it directly (on this execution stack), rather than arranging for it to be called by someone else after returning? -George
Re: [Xen-devel] [PATCH 7/9] libxl: Make killing of device model asynchronous
> On Nov 30, 2018, at 4:12 PM, George Dunlap wrote: > > > >> On Nov 28, 2018, at 4:43 PM, Ian Jackson wrote: >> >> George Dunlap writes ("[PATCH 7/9] libxl: Make killing of device model >> asynchronous"): >>> Or at least, give it an asynchronous interface so that we can make it >>> actually asynchronous in subsequent patches. >>> >>> Create state structures and callback function signatures. Add the >>> state structure to libxl__destroy_domid_state. Break >>> libxl__destroy_domid down into two functions. >> ... >>> +/* Used to detroy the device model */ >>> +_hidden void libxl__destroy_device_model(libxl__egc *egc, >>> + libxl__destroy_devicemodel_state >>> *ddms); >> >> I think that comment is rather pointless (but I won't object if you >> really want to keep it). > > Yes; that comment looks rather vestigal. Oh, I see; I was following suit with the code around it: /* Used to destroy a domain with the passed id (it doesn't check for stubs) */ _hidden void libxl__destroy_domid(libxl__egc *egc, libxl__destroy_domid_state *dis); /* Used to detroy the device model */ _hidden void libxl__destroy_device_model(libxl__egc *egc, libxl__destroy_devicemodel_state *ddms); /* Entry point for devices destruction */ _hidden void libxl__devices_destroy(libxl__egc *egc, libxl__devices_remove_state *drs); It looks cleaner to me to have *something* there than not, just to visually make it clear that it has nothing to do with the previous function. -George
Re: [Xen-devel] [PATCH 7/9] libxl: Make killing of device model asynchronous
George Dunlap writes ("Re: [PATCH 7/9] libxl: Make killing of device model asynchronous"): > On Nov 28, 2018, at 4:43 PM, Ian Jackson wrote: > > Conversely it would be nice to say somewhere > > that ddms->callback may be called reentrantly. > > What do you mean by reentrantly? That libxl__destroy_device_model() may end > up calling it directly (on this execution stack), rather than arranging for > it to be called by someone else after returning? Precisely. This kind of callback can, in some situations, be a hazard for the caller, because it might result in data structures that the caller is about to rely on vanishing. It is conventional in libxl to return after setting up callbacks so this generally doesn't arise, but adding a note to functions with this property is a good idea. Ian.
Re: [Xen-devel] Linux 4.19.5 fails to boot as Xen dom0
On Fri, Nov 30, 2018 at 02:53:50PM +, Hans van Kranenburg wrote: > On 11/30/18 2:26 PM, Kirill A. Shutemov wrote: > > On Fri, Nov 30, 2018 at 01:11:56PM +, Hans van Kranenburg wrote: > >> On 11/29/18 4:06 PM, Kirill A. Shutemov wrote: > >>> On Thu, Nov 29, 2018 at 03:00:45PM +, Juergen Gross wrote: > On 29/11/2018 15:32, Kirill A. Shutemov wrote: > > On Thu, Nov 29, 2018 at 02:24:47PM +, Kirill A. Shutemov wrote: > >> On Thu, Nov 29, 2018 at 01:35:17PM +, Juergen Gross wrote: > >>> On 29/11/2018 14:26, Kirill A. Shutemov wrote: > On Thu, Nov 29, 2018 at 09:41:25AM +, Juergen Gross wrote: > > On 29/11/2018 02:22, Hans van Kranenburg wrote: > >> Hi, > >> > >> As also seen at: > >> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951 > >> > >> Attached there are two serial console output logs. One is starting > >> with > >> Xen 4.11 (from debian unstable) as dom0, and the other one without > >> Xen. > >> > >> [2.085543] BUG: unable to handle kernel paging request at > >> 888d9fffc000 > >> [2.085610] PGD 200c067 P4D 200c067 PUD 0 > >> [2.085674] Oops: [#1] SMP NOPTI > >> [2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted > >> 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1 > >> [2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 > >> 05/21/2018 > >> [2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490 > >> [...] > > > > The offending stable commit is > > 4074ca7d8a1832921c865d250bbd08f3441b3657 > > ("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), > > this > > is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream. > > > > Current upstream kernel is booting fine under Xen, so in general the > > patch should be fine. Using an upstream kernel built from above > > commit > > (with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is > > fine, > > too. > > > > Kirill, are you aware of any prerequisite patch from 4.20 which > > could be > > missing in 4.19.5? > > I'm not. > > Let me look into this. 
> > >>> > >>> What is making me suspicious is the failure happening just after > >>> releasing the init memory. Maybe there is an access to .init.data > >>> segment or similar? The native kernel booting could be related to the > >>> usage of 2M mappings not being available in a PV-domain. > >> > >> Ahh.. Could you test this: > >> > >> diff --git a/arch/x86/mm/dump_pagetables.c > >> b/arch/x86/mm/dump_pagetables.c > >> index a12afff146d1..7dec63ec7aab 100644 > >> --- a/arch/x86/mm/dump_pagetables.c > >> +++ b/arch/x86/mm/dump_pagetables.c > >> @@ -496,7 +496,7 @@ static inline bool is_hypervisor_range(int idx) > >> * 8000 - 87ff is reserved for > >> * the hypervisor. > >> */ > >> - return (idx >= pgd_index(__PAGE_OFFSET) - 16) && > >> + return (idx >= pgd_index(__PAGE_OFFSET) - 17) && > >>(idx < pgd_index(__PAGE_OFFSET)); > >> #else > >>return false; > > > > Or, better, this: > > That makes it boot again! > > Any idea why upstream doesn't need it? > >>> > >>> Nope. > >>> > >>> I'll prepare a proper fix. > >>> > >> > >> Thanks for looking into this. > >> > >> In the meantime, I applied the "Or, better, this" change, and my dom0 > >> boots again. > >> > >> FYI, boot log now: (paste 90d valid) > >> https://paste.debian.net/plainh/48940826 > > > > I forgot to CC you: > > > > https://lkml.kernel.org/r/20181130121131.g3xvlvixv7mvl...@black.fi.intel.com > > > > Please give it a try. > > I'm not in that thread, so my response here... > > You paste a v2-like patch into 'Re: [PATCH 1/2]'. Juergen says: > s/LDT_PGD_ENTRY/GUARD_HOLE_PGD_ENTRY/, then you say Ughh.., change it to > GUARD_HOLE_ENTRY, which does not exist, and then get a Reviewed-by from > Juergen. > > I guess it has to be GUARD_HOLE_PGD_ENTRY after all... > > arch/x86/include/asm/pgtable_64_types.h:116:31: error: > 'GUARD_HOLE_ENTRY' undeclared (first use in this function); did you mean > 'GUARD_HOLE_PGD_ENTRY'? > > I'll test that instead. Yes, thank you. It was a long week... :/ Let me know if it works. 
I'll repost the fixed version with your Tested-by. -- Kirill A. Shutemov
Re: [Xen-devel] [PATCH 7/9] libxl: Make killing of device model asynchronous
George Dunlap writes ("Re: [PATCH 7/9] libxl: Make killing of device model asynchronous"): > It looks cleaner to me to have *something* there than not, just to visually > make it clear that it has nothing to do with the previous function. That's OK by me. Ian.
Re: [Xen-devel] [PATCH v2 2/2] ns16550: enable use of PCI MSI
On Mon, Oct 01, 2018 at 10:26:05AM -0600, Jan Beulich wrote: > Which, on x86, requires fiddling with the INTx bit in PCI config space, > since for internally used MSI we can't delegate this to Dom0. > > ns16550_init_postirq() also needs (benign) re-ordering of its > operations. > > Signed-off-by: Jan Beulich > --- > v2: Re-base. > > --- a/docs/misc/xen-command-line.markdown > +++ b/docs/misc/xen-command-line.markdown > @@ -307,7 +307,7 @@ Flag to indicate whether to probe for a > ACPI indicating none to be there. > > ### com1,com2 > -> `= > [/][,[DPS][,[|pci|amt][,[][,[][,[]]` > +> `= > [/][,[DPS][,[|pci|amt][,[|msi][,[][,[]]` > > Both option `com1` and `com2` follow the same format. > > @@ -328,7 +328,7 @@ Both option `com1` and `com2` follow the > * `` is an integer which specifies the IO base port for UART >registers. > * `` is the IRQ number to use, or `0` to use the UART in poll > - mode only. > + mode only, or `msi` to set up a Message Signaled Interrupt. > * `` is the PCI location of the UART, in >`:.` notation. > * `` is the PCI bridge behind which is the UART, in > --- a/xen/arch/x86/msi.c > +++ b/xen/arch/x86/msi.c > @@ -742,6 +742,16 @@ static int msi_capability_init(struct pc > > *desc = entry; > /* Restore the original MSI enabled bits */ > +if ( !hardware_domain ) > +{ > +/* > + * ..., except for internal requests (before Dom0 starts), in which > + * case we rather need to behave "normally", i.e. not follow the > split > + * brain model where Dom0 actually enables MSI (and disables INTx). > + */ > +pci_intx(dev, 0); > +control |= PCI_MSI_FLAGS_ENABLE; Sorry for the split reply, I've been wondering about the MSI enabling and INTX disabling done here. Xen already owns other PCI devices (AMD IOMMU for example, see set_iommu_interrupt_handler) that use MSI, yet they seem to manage to work without this by doing a manual MSI enable (and I cannot figure out where the INTX disable is done). 
Shouldn't Xen have a more uniform way of dealing with MSI interrupt setup for such devices? And doesn't your change here imply that some code from the current internal MSI users should be dropped? There's a call to __msi_set_enable in the AMD IOMMU code (amd_iommu_msi_enable) that I guess can be dropped? Thanks, Roger.
Re: [Xen-devel] [PATCH 8/9] libxl: Kill QEMU by uid when possible
> On Nov 28, 2018, at 4:56 PM, Ian Jackson wrote: > >> if (!xs_rm(CTX->xsh, XBT_NULL, path)) >> LOGD(ERROR, domid, "xs_rm failed for %s", path); >> >> -/* We should try to destroy the device model anyway. */ >> -rc = kill_device_model(gc, >> - GCSPRINTF("/local/domain/%d/image/device-model-pid", domid)); >> +/* >> + * We should try to destroy the device model anyway. Check to see >> + * if we can kill by UID >> + */ >> +ret = libxl__xs_read_checked(gc, XBT_NULL, >> + >> GCSPRINTF("/local/domain/%d/image/device-model-uid", >> + domid), >> + &dm_uid_str); > > I know this function is bad in its use of `rc' for syscall return but > please don't make it worse by introducing `ret' for what should be > `rc'. Would you mind adding a pre-patch to change `rc' to `r' and > then you can use `rc'? Actually, it looks like kill_device_model() returns a libxl error value. So we should use the existing rc for both kill_device_model() and xs_read_checked(), and introduce `r` to use for setresuid() &c -George
Re: [Xen-devel] [PATCH v4 6/6] x86/microcode: Synchronize late microcode loading
On 11/30/18 4:46 AM, Jan Beulich wrote: On 29.11.18 at 23:43, wrote: >> One other comment about this patch (which IIRC was raised by Andrew on >> an earlier version) is that it may be worth to stop timer calibration. I >> am pretty sure we've seen deadlocks, which is why we ended up disabling >> it during microcode updates. > I recall the claim, but I don't think I've ever seen proof. I can't provide proof at this point, only somewhat vague memory of seeing calibration code in the stack dump. > My point was > and still is that if there's a problem with ucode loading using the > stop-machine logic here, then there's a problem with the stop-machine > logic in general, which would make other uses, perhaps most notably > rcu_barrier(), unsafe too. Possibly. rcu_barrier() appears to be only used in cpu hotplug code and power management, and I don't know whether either has been tested under stress. In our case it would take multiple microcode updates on relatively large (~100 cpus) systems before we'd hit the deadlock. > Otoh from your reply it's not clear whether > the observed issue wasn't with our present way of loading ucode, > but then it would put under question the general correctness of > continue_hypercall_on_cpu(), which again we use for more than just > ucode loading. It was with a variation of this new code, not with what's currently in the tree. -boris
Re: [Xen-devel] [PATCH v3 00/21] QOM'ify SysBusDeviceClass->init
On Fri, 30 Nov 2018 at 09:39, Mao Zhongyi wrote: > > v3 -> v2: > > - rebase to the HEAD > - use SysBusDevice *sbd variable in patch15 > Mao Zhongyi (21): > musicpal: Convert sysbus init function to realize function > block/noenand: Convert sysbus init function to realize function > char/grlib_apbuart: Convert sysbus init function to realize function > core/empty_slot: Convert sysbus init function to realize function > display/g364fb: Convert sysbus init function to realize function > dma/puv3_dma: Convert sysbus init function to realize function > gpio/puv3_gpio: Convert sysbus init function to realize function > milkymist-softusb: Convert sysbus init function to realize function > input/pl050: Convert sysbus init function to realize function > intc/puv3_intc: Convert sysbus init function to realize function > milkymist-hpdmc: Convert sysbus init function to realize function > milkymist-pfpu: Convert sysbus init function to realize function > puv3_pm.c: Convert sysbus init function to realize function > nvram/ds1225y: Convert sysbus init function to realize function > pci-bridge/dec: Convert sysbus init function to realize function > timer/etraxfs_timer: Convert sysbus init function to realize function > timer/grlib_gptimer: Convert sysbus init function to realize function > timer/puv3_ost: Convert sysbus init function to realize function > usb/tusb6010: Convert sysbus init function to realize function > xen_backend: remove xen_sysdev_init() function > core/sysbus: remove the SysBusDeviceClass::init path > > hw/arm/musicpal.c| 9 - > hw/block/onenand.c | 16 +++- > hw/char/grlib_apbuart.c | 12 +--- > hw/core/empty_slot.c | 9 - > hw/core/sysbus.c | 15 +-- > hw/display/g364fb.c | 9 +++-- > hw/dma/puv3_dma.c| 10 -- > hw/gpio/puv3_gpio.c | 29 ++--- > hw/input/milkymist-softusb.c | 16 +++- > hw/input/pl050.c | 11 +-- > hw/intc/puv3_intc.c | 11 --- > hw/misc/milkymist-hpdmc.c| 9 +++-- > hw/misc/milkymist-pfpu.c | 12 +--- > hw/misc/puv3_pm.c| 10 -- > hw/nvram/ds1225y.c | 12 
+--- > hw/pci-bridge/dec.c | 12 ++-- > hw/timer/etraxfs_timer.c | 14 +++--- > hw/timer/grlib_gptimer.c | 11 +-- > hw/timer/puv3_ost.c | 13 ++--- > hw/usb/tusb6010.c| 8 +++- > hw/xen/xen_backend.c | 7 --- > include/hw/sysbus.h | 3 --- > 22 files changed, 106 insertions(+), 152 deletions(-) I think all these patches have now been reviewed. Does anybody have a preference for which tree it goes through, given that it touches lots of devices? I can take this via target-arm, unless anybody would particularly like to have some or all of it go some other route. thanks -- PMM
Re: [Xen-devel] [PATCH 3/9] libxl: Get rid of support for QEMU_USER_BASE (xen-qemuuser-domidNN)
On Fri, Nov 30, 2018 at 12:53:20PM +, George Dunlap wrote: > > > > On Nov 28, 2018, at 3:32 PM, Wei Liu wrote: > > > > On Fri, Nov 23, 2018 at 05:14:56PM +, George Dunlap wrote: > >> QEMU_USER_BASE allows a user to specify the UID to use when running > >> the devicemodel for a specific domain number. Unfortunately, this is > >> not really practical: It requires nearly 32,000 entries in > >> /etc/passwd. QEMU_USER_RANGE_BASE is much more practical. > >> > >> Remove support for QEMU_USER_BASE. > >> > >> Signed-off-by: George Dunlap > > > > I have no problem with you removing this, but you forgot to change > > xl.cfg.pod.5.in. > > Hmm… it seems what I *really* forgot to do was remove that entire > section and point to the feature doc instead. It’s quite out of date > now (e.g., cdrom insert should work I think, qemu is chrooted so can’t > read world-readable files, &c). With your permission I’ll put fixing > that up properly on a to-do list (to be done before 4.12 release at a > minimum). Sure. Wei.
[Xen-devel] [PATCH 2/2] x86/hvm: Corrections to RDTSCP intercept handling
For both VT-x and SVM, the RDTSCP intercept will trigger if the pipeline supports the instruction, but the guest may not have rdtscp in its featureset. Bring the vmexit handlers in line with the main emulator behaviour by optionally handing back #UD. Next on the AMD side, if RDTSCP actually ends up being intercepted on a debug build, we first update regs->rcx, then call __get_instruction_length() asking for RDTSC. As the two instructions are different (and indeed, different lengths!), __get_instruction_length_from_list() fails and hands back a #GP fault. This can be demonstrated by putting a guest into tsc_mode="always emulate" and executing an rdtscp instruction: (d1) --- Xen Test Framework --- (d1) Environment: HVM 64bit (Long mode 4 levels) (d1) Test rdtscp (d1) TSC mode 1 (XEN) emulate.c:159:d1v0 __get_instruction_length_from_list: Mismatch between expected and actual instruction: (XEN) emulate.c:163:d1v0 list[0] val 8, { opc 0xf0031, modrm 0 }, list entries: 1 (XEN) emulate.c:165:d1v0 rip 0x10475f, nextrip 0x104762, len 3 (XEN) Insn_len emulation failed (1): d1v0 64bit @ 0008:0010475f -> 0f 01 f9 5b 31 ff 31 c0 e9 c4 db ff ff 00 00 00 (d1) ** (d1) PANIC: Unhandled exception at 0008:0010475f (d1) Vec 13 #GP[] (d1) ** First, teach __get_instruction_length() to cope with RDTSCP, and improve svm_vmexit_do_rdtsc() to ask for the correct instruction. Move the regs->rcx adjustment into this function to ensure it gets done after we are done potentially raising faults. Reported-by: Paul Durrant Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Wei Liu CC: Roger Pau Monné CC: Paul Durrant CC: Jun Nakajima CC: Kevin Tian CC: Boris Ostrovsky CC: Suravee Suthikulpanit CC: Brian Woods CC: Juergen Gross There is a further 4.12 bug/regression here. For some reason, master and staging are now defaulting VMs into an emulated TSC mode. I have yet to figure out why. 
--- xen/arch/x86/hvm/svm/emulate.c| 1 + xen/arch/x86/hvm/svm/svm.c| 22 +- xen/arch/x86/hvm/vmx/vmx.c| 8 xen/include/asm-x86/hvm/svm/emulate.h | 1 + 4 files changed, 27 insertions(+), 5 deletions(-) diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c index 71a1b6e..0290264 100644 --- a/xen/arch/x86/hvm/svm/emulate.c +++ b/xen/arch/x86/hvm/svm/emulate.c @@ -78,6 +78,7 @@ static const struct { [INSTR_STGI]= { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 4) }, [INSTR_CLGI]= { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 5) }, [INSTR_INVLPGA] = { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 7) }, +[INSTR_RDTSCP] = { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 7, 1) }, [INSTR_INVD]= { X86EMUL_OPC(0x0f, 0x08) }, [INSTR_WBINVD] = { X86EMUL_OPC(0x0f, 0x09) }, [INSTR_WRMSR] = { X86EMUL_OPC(0x0f, 0x30) }, diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index b9a8900..d8d3813 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -2279,14 +2279,28 @@ static void svm_vmexit_do_hlt(struct vmcb_struct *vmcb, hvm_hlt(regs->eflags); } -static void svm_vmexit_do_rdtsc(struct cpu_user_regs *regs) +static void svm_vmexit_do_rdtsc(struct cpu_user_regs *regs, bool rdtscp) { +struct vcpu *curr = current; +const struct domain *currd = curr->domain; +enum instruction_index insn = rdtscp ? 
INSTR_RDTSCP : INSTR_RDTSC; unsigned int inst_len; -if ( (inst_len = __get_instruction_length(current, INSTR_RDTSC)) == 0 ) +if ( rdtscp && !currd->arch.cpuid->extd.rdtscp && + currd->arch.tsc_mode != TSC_MODE_PVRDTSCP ) +{ +hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC); return; +} + +if ( (inst_len = __get_instruction_length(curr, insn)) == 0 ) +return; + __update_guest_eip(regs, inst_len); +if ( rdtscp ) +regs->rcx = hvm_msr_tsc_aux(curr); + hvm_rdtsc_intercept(regs); } @@ -2968,10 +2982,8 @@ void svm_vmexit_handler(struct cpu_user_regs *regs) break; case VMEXIT_RDTSCP: -regs->rcx = hvm_msr_tsc_aux(v); -/* fall through */ case VMEXIT_RDTSC: -svm_vmexit_do_rdtsc(regs); +svm_vmexit_do_rdtsc(regs, exit_reason == VMEXIT_RDTSCP); break; case VMEXIT_MONITOR: diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 365eeb2..a9f9b9b 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -3589,6 +3589,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) unsigned long exit_qualification, exit_reason, idtv_info, intr_info = 0; unsigned int vector = 0, mode; struct vcpu *v = current; +struct domain *currd = v->domain; __vmread(GUEST_RIP,®s->rip); __vmread(GUEST_RSP,®s->rsp); @@ -3956,6 +3957,13 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) vmx_invlpg_
[Xen-devel] [PATCH 1/2] x86/svm: Improve diagnostics when __get_instruction_length_from_list() fails
Sadly, a lone: (XEN) emulate.c:156:d2v0 __get_instruction_length_from_list: Mismatch between expected and actual instruction: eip = f804564139c0 on the console is of no use trying to identify what went wrong. Dump as much state as we can to help identify what went wrong. Reported-by: Paul Durrant Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Wei Liu CC: Roger Pau Monné CC: Paul Durrant CC: Boris Ostrovsky CC: Suravee Suthikulpanit CC: Brian Woods RFC: __get_instruction_length_from_list() tries to cope with VMEXIT_IOIO, but IN/OUT instructions aren't in the decode list and I can't spot an entry point from the IOIO path. Am I missing something? Also, I'm not entirely convinced that making modrm an anonymous union is going to work with older CentOS compilers, and therefore am not sure whether that part of the change is worth it. The instruction in question can be obtained from the printed INSN_ constant alone. --- xen/arch/x86/hvm/svm/emulate.c | 26 +++--- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c index 3d04af0..71a1b6e 100644 --- a/xen/arch/x86/hvm/svm/emulate.c +++ b/xen/arch/x86/hvm/svm/emulate.c @@ -56,11 +56,14 @@ static unsigned long svm_nextrip_insn_length(struct vcpu *v) static const struct { unsigned int opcode; -struct { -unsigned int rm:3; -unsigned int reg:3; -unsigned int mod:2; -#define MODRM(mod, reg, rm) { rm, reg, mod } +union { +struct { +unsigned int rm:3; +unsigned int reg:3; +unsigned int mod:2; +}; +unsigned int raw; +#define MODRM(mod, reg, rm) {{ rm, reg, mod }} } modrm; } opc_tab[INSTR_MAX_COUNT] = { [INSTR_PAUSE] = { X86EMUL_OPC_F3(0, 0x90) }, @@ -152,8 +155,17 @@ int __get_instruction_length_from_list(struct vcpu *v, } gdprintk(XENLOG_WARNING, - "%s: Mismatch between expected and actual instruction: " - "eip = %lx\n", __func__, (unsigned long)vmcb->rip); + "%s: Mismatch between expected and actual instruction:\n", + __func__); 
+gdprintk(XENLOG_WARNING, + " list[0] val %d, { opc %#x, modrm %#x }, list entries: %u\n", + list[0], opc_tab[list[0]].opcode, opc_tab[list[0]].modrm.raw, + list_count); +gdprintk(XENLOG_WARNING, " rip 0x%lx, nextrip 0x%lx, len %lu\n", + vmcb->rip, vmcb->nextrip, vmcb->nextrip - vmcb->rip); +hvm_dump_emulation_state(XENLOG_G_WARNING, "Insn_len", + &ctxt, X86EMUL_UNHANDLEABLE); + hvm_inject_hw_exception(TRAP_gp_fault, 0); return 0; } -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 0/2] Fixes to RDTSCP interception
For some reason (on EPYC at least) we've gained a regression in master whereby VMs default to one of the emulated TSC modes. This may be Xen concluding that there isn't a reliable TSC. Combined with a debug Xen, this breaks RDTSCP completely.

With this series in place and RDTSCP functioning correctly again, VMs are still unwilling to boot; I haven't managed to figure out why yet.

Andrew Cooper (2):
  x86/svm: Improve diagnostics when __get_instruction_length_from_list() fails
  x86/hvm: Corrections to RDTSCP intercept handling

 xen/arch/x86/hvm/svm/emulate.c        | 27 ---
 xen/arch/x86/hvm/svm/svm.c            | 22 +-
 xen/arch/x86/hvm/vmx/vmx.c            |  8
 xen/include/asm-x86/hvm/svm/emulate.h |  1 +
 4 files changed, 46 insertions(+), 12 deletions(-)

-- 
2.1.4
Re: [Xen-devel] [PATCH v2 0/3] Remove tmem
Wei Liu writes ("[PATCH v2 0/3] Remove tmem"): > It is agreed that tmem can be removed from xen.git. See the thread starting > > > from . Those are notes from some phone call amongst industry stakeholders. None of the messages have a Subject line mentioning tmem. There is no explanation of the basis for the decision; just a confirmation from the current maintainers that they will ack the removal. I think this is not really an appropriate way to carry on! What if there is someone else who wants to step up to maintain this ? What about user communication ? Going straight from `Supported' to `Deleted' seems rather vigorous. In summary I think the claim that "It is agreed" in this cover letter is false (or, at least, if it is true, the cover letter provides no references to any basis for thinking that it is true). If it didn't happen on the mailing list it didn't happen. Unfortunately, therefore, on process grounds, Nacked-by: Ian Jackson I dare say the decision to remove it now might be right. Can we please start this again with a proper explanation of why this should be summarily deleted, rather than (say) made unmaintained and deprecated for a release ? Can someone explain why we don't feel the need to consult anyone by (say) posting to xen-announce ? etc. Then we can actually have an on-list discussion where the decision would be taken. Next time I suggest a good first step would be a patch which deletes the M: line from MAINTAINERS and changes the status to Orphan, since obviously the current maintainers don't want it. That patch should be uncontroversial. Also in general, depending who we think might be using a feature, a plan which gives some warning to users (by deprecating the feature, for example) would often be a good idea. Ian. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 1/3] tools: remove tmem code and commands
Wei Liu writes ("[PATCH v2 1/3] tools: remove tmem code and commands"):
> Remove all tmem related code in libxc.
>
> Leave some stubs in libxl in case anyone has linked to those functions
> before the removal.
>
> Remove all tmem related commands in xl, all tmem related code in other
> utilities we ship.

Amazingly, I see nothing in the libxl domain config about this. If there were, then we would have to decide what to do if the domain had tmem-related config. But AFAICT there could be no such thing.

On a technical level, therefore, there is nothing wrong with this patch. However, for the process reasons I have explained in my other message,

Nacked-by: Ian Jackson

Sorry,
Ian.
Re: [Xen-devel] [PATCH 1/2] x86/svm: Improve diagnostics when __get_instruction_length_from_list() fails
On Fri, Nov 30, 2018 at 05:07:19PM +, Andy Cooper wrote:
> Sadly, a lone:
>
> (XEN) emulate.c:156:d2v0 __get_instruction_length_from_list: Mismatch
> between expected and actual instruction: eip = f804564139c0
>
> on the console is of no use trying to identify what went wrong. Dump as much
> state as we can to help identify what went wrong.
>
> Reported-by: Paul Durrant
> Signed-off-by: Andrew Cooper

Acked-by: Brian Woods

-- 
Brian Woods
[Xen-devel] [PATCH for-4.12] xen/arm: gic: Make sure the number of interrupt lines is valid before using it
GICv2 and GICv3 support up to 1020 interrupts. However, the value computed from GICD_TYPER.ITLinesNumber can be up to 1024. On GICv3, we will end up writing to the reserved registers right after the IROUTERs, as the value is not capped early enough.

Cap the number of interrupts as soon as we compute it, so we know we can safely use it afterwards.

Signed-off-by: Julien Grall
Reported-by: Jan-Peter Larsson
---
This patch should be backported up to Xen 4.9.
---
 xen/arch/arm/gic-v2.c | 7 ---
 xen/arch/arm/gic-v3.c | 7 ---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 1a744c576f..e9fb8a01ab 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -353,6 +353,10 @@ static void __init gicv2_dist_init(void)
     type = readl_gicd(GICD_TYPER);
     nr_lines = 32 * ((type & GICD_TYPE_LINES) + 1);
 
+    /* Only 1020 interrupts are supported */
+    nr_lines = min(1020U, nr_lines);
+    gicv2_info.nr_lines = nr_lines;
+
     gic_cpus = 1 + ((type & GICD_TYPE_CPUS) >> 5);
     printk("GICv2: %d lines, %d cpu%s%s (IID %8.8x).\n",
            nr_lines, gic_cpus, (gic_cpus == 1) ? "" : "s",
@@ -377,9 +381,6 @@ static void __init gicv2_dist_init(void)
     for ( i = 32; i < nr_lines; i += 32 )
         writel_gicd(~0x0, GICD_ICENABLER + (i / 32) * 4);
 
-    /* Only 1020 interrupts are supported */
-    gicv2_info.nr_lines = min(1020U, nr_lines);
-
     /* Turn on the distributor */
     writel_gicd(GICD_CTL_ENABLE, GICD_CTLR);
 }
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index 6fbc106757..c9200d24e1 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -607,6 +607,10 @@ static void __init gicv3_dist_init(void)
     if ( type & GICD_TYPE_LPIS )
         gicv3_lpi_init_host_lpis(GICD_TYPE_ID_BITS(type));
 
+    /* Only 1020 interrupts are supported */
+    nr_lines = min(1020U, nr_lines);
+    gicv3_info.nr_lines = nr_lines;
+
     printk("GICv3: %d lines, (IID %8.8x).\n",
            nr_lines, readl_relaxed(GICD + GICD_IIDR));
@@ -646,9 +650,6 @@ static void __init gicv3_dist_init(void)
     for ( i = NR_GIC_LOCAL_IRQS; i < nr_lines; i++ )
         writeq_relaxed(affinity, GICD + GICD_IROUTER + i * 8);
-
-    /* Only 1020 interrupts are supported */
-    gicv3_info.nr_lines = min(1020U, nr_lines);
 }
 
 static int gicv3_enable_redist(void)
-- 
2.11.0
Re: [Xen-devel] [PATCH 2/2] x86/hvm: Corrections to RDTSCP intercept handling
On Fri, Nov 30, 2018 at 05:07:20PM +, Andy Cooper wrote:
> For both VT-x and SVM, the RDTSCP intercept will trigger if the pipeline
> supports the instruction, but the guest may not have rdtscp in its
> featureset. Bring the vmexit handlers in line with the main emulator
> behaviour by optionally handing back #UD.
>
> Next, on the AMD side, if RDTSCP actually ends up being intercepted on a debug
> build, we first update regs->rcx, then call __get_instruction_length() asking
> for RDTSC. As the two instructions are different (and indeed, different
> lengths!), __get_instruction_length_from_list() fails and hands back a #GP
> fault.
>
> This can be demonstrated by putting a guest into tsc_mode="always emulate" and
> executing an rdtscp instruction:
>
> (d1) --- Xen Test Framework ---
> (d1) Environment: HVM 64bit (Long mode 4 levels)
> (d1) Test rdtscp
> (d1) TSC mode 1
> (XEN) emulate.c:159:d1v0 __get_instruction_length_from_list: Mismatch between expected and actual instruction:
> (XEN) emulate.c:163:d1v0 list[0] val 8, { opc 0xf0031, modrm 0 }, list entries: 1
> (XEN) emulate.c:165:d1v0 rip 0x10475f, nextrip 0x104762, len 3
> (XEN) Insn_len emulation failed (1): d1v0 64bit @ 0008:0010475f -> 0f 01 f9 5b 31 ff 31 c0 e9 c4 db ff ff 00 00 00
> (d1) **
> (d1) PANIC: Unhandled exception at 0008:0010475f
> (d1) Vec 13 #GP[]
> (d1) **
>
> First, teach __get_instruction_length() to cope with RDTSCP, and improve
> svm_vmexit_do_rdtsc() to ask for the correct instruction. Move the regs->rcx
> adjustment into this function to ensure it gets done after we are done
> potentially raising faults.
>
> Reported-by: Paul Durrant
> Signed-off-by: Andrew Cooper

As far as the SVM part goes:

Reviewed-by: Brian Woods

-- 
Brian Woods
[Xen-devel] Backport candidate for Arm
Hi, Below a list of backport candidate for Arm. For Xen 4.10+ to handle correctly SMC call parameters and result 35fc608612 xen/arm: smccc-1.1: Make return values unsigned long fa7974f743 xen/arm: smccc-1.1: Handle function result as parameters For Xen 4.9+ to avoid Dom0 crash when using less vCPUs than pCPUs on GICv3 703d9d5ec1 xen/arm: vgic-v3: Delay the initialization of the domain information 54ec59f6b0 xen/arm: vgic-v3: Don't create empty re-distributor regions The following patch is required in Xen 4.11 to avoid breaking the new vGIC after applying the 2 previous patches. 62aa9e7f1b xen/arm: Don't build GICv3 with the new vGIC For 4.9+ to make the interrupt path safer by adding missing barriers 177afec455 xen/arm: gic: Ensure we have an ISB between ack and do_IRQ() 555e5f1bd2 xen/arm: gic: Ensure ordering between read of INTACK and shared data For 4.9+ to comply with the binding description 3689c54630 xen/arm: check for multiboot nodes only under /chosen Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-unstable-smoke test] 130877: tolerable all pass - PUSHED
flight 130877 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/130877/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen 82855aba5bf91e50c81526167c11d4aeaf665e66 baseline version: xen 2634b997afabfdc5a972e07e536dfbc6febb4385 Last test of basis 130873 2018-11-30 12:00:40 Z0 days Testing same since 130877 2018-11-30 15:00:26 Z0 days1 attempts People who touched revisions under test: Andrew Cooper Olaf Hering Wei Liu jobs: build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-amd64-amd64-xl-qemuu-debianhvm-i386 pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 2634b997af..82855aba5b 82855aba5bf91e50c81526167c11d4aeaf665e66 -> smoke ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Linux 4.19.5 fails to boot as Xen dom0
On 11/30/18 5:21 PM, Kirill A. Shutemov wrote: > On Fri, Nov 30, 2018 at 02:53:50PM +, Hans van Kranenburg wrote: >> On 11/30/18 2:26 PM, Kirill A. Shutemov wrote: >>> On Fri, Nov 30, 2018 at 01:11:56PM +, Hans van Kranenburg wrote: On 11/29/18 4:06 PM, Kirill A. Shutemov wrote: > On Thu, Nov 29, 2018 at 03:00:45PM +, Juergen Gross wrote: >> On 29/11/2018 15:32, Kirill A. Shutemov wrote: >>> On Thu, Nov 29, 2018 at 02:24:47PM +, Kirill A. Shutemov wrote: On Thu, Nov 29, 2018 at 01:35:17PM +, Juergen Gross wrote: > On 29/11/2018 14:26, Kirill A. Shutemov wrote: >> On Thu, Nov 29, 2018 at 09:41:25AM +, Juergen Gross wrote: >>> On 29/11/2018 02:22, Hans van Kranenburg wrote: Hi, As also seen at: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951 Attached there are two serial console output logs. One is starting with Xen 4.11 (from debian unstable) as dom0, and the other one without Xen. [2.085543] BUG: unable to handle kernel paging request at 888d9fffc000 [2.085610] PGD 200c067 P4D 200c067 PUD 0 [2.085674] Oops: [#1] SMP NOPTI [2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1 [2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 05/21/2018 [2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490 [...] >>> >>> The offending stable commit is >>> 4074ca7d8a1832921c865d250bbd08f3441b3657 >>> ("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), >>> this >>> is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream. >>> >>> Current upstream kernel is booting fine under Xen, so in general the >>> patch should be fine. Using an upstream kernel built from above >>> commit >>> (with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is >>> fine, >>> too. >>> >>> Kirill, are you aware of any prerequisite patch from 4.20 which >>> could be >>> missing in 4.19.5? >> >> I'm not. >> >> Let me look into this. 
>> > > What is making me suspicious is the failure happening just after > releasing the init memory. Maybe there is an access to .init.data > segment or similar? The native kernel booting could be related to the > usage of 2M mappings not being available in a PV-domain. Ahh.. Could you test this: diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index a12afff146d1..7dec63ec7aab 100644 --- a/arch/x86/mm/dump_pagetables.c +++ b/arch/x86/mm/dump_pagetables.c @@ -496,7 +496,7 @@ static inline bool is_hypervisor_range(int idx) * 8000 - 87ff is reserved for * the hypervisor. */ - return (idx >= pgd_index(__PAGE_OFFSET) - 16) && + return (idx >= pgd_index(__PAGE_OFFSET) - 17) && (idx < pgd_index(__PAGE_OFFSET)); #else return false; >>> >>> Or, better, this: >> >> That makes it boot again! >> >> Any idea why upstream doesn't need it? > > Nope. > > I'll prepare a proper fix. > Thanks for looking into this. In the meantime, I applied the "Or, better, this" change, and my dom0 boots again. FYI, boot log now: (paste 90d valid) https://paste.debian.net/plainh/48940826 >>> >>> I forgot to CC you: >>> >>> https://lkml.kernel.org/r/20181130121131.g3xvlvixv7mvl...@black.fi.intel.com >>> >>> Please give it a try. >> >> I'm not in that thread, so my response here... >> >> You paste a v2-like patch into 'Re: [PATCH 1/2]'. Juergen says: >> s/LDT_PGD_ENTRY/GUARD_HOLE_PGD_ENTRY/, then you say Ughh.., change it to >> GUARD_HOLE_ENTRY, which does not exist, and then get a Reviewed-by from >> Juergen. >> >> I guess it has to be GUARD_HOLE_PGD_ENTRY after all... >> >> arch/x86/include/asm/pgtable_64_types.h:116:31: error: >> 'GUARD_HOLE_ENTRY' undeclared (first use in this function); did you mean >> 'GUARD_HOLE_PGD_ENTRY'? >> >> I'll test that instead. > > Yes, thank you. It was a long week... :/ > > Let me know if it works. I'll repost the fixed version with your > Tested-by. Ok. It boots fine as Xen dom0. 
\o/ You can use "Hans van Kranenburg " (lowercase please) for reported/tested in the real v2. Hans
Re: [Xen-devel] [PATCH v3] pvcalls-front: Avoid __get_free_pages(GFP_KERNEL) under spinlock
On 11/30/18 6:01 AM, Wen Yang wrote: > The problem is that we call this with a spin lock held. > The call tree is: > pvcalls_front_accept() holds bedata->socket_lock. > -> create_active() > -> __get_free_pages() uses GFP_KERNEL > > The create_active() function is only called from pvcalls_front_accept() > with a spin_lock held, The allocation is not allowed to sleep and > GFP_KERNEL is not sufficient. > > This issue was detected by using the Coccinelle software. > > v2: Add a function doing the allocations which is called > outside the lock and passing the allocated data to > create_active(). > v3: Use the matching deallocators i.e., free_page() > and free_pages(), respectively. > > Suggested-by: Juergen Gross > Signed-off-by: Wen Yang > CC: Julia Lawall > CC: Boris Ostrovsky > CC: Juergen Gross > CC: Stefano Stabellini > CC: xen-devel@lists.xenproject.org > CC: linux-ker...@vger.kernel.org > --- > drivers/xen/pvcalls-front.c | 71 ++--- > 1 file changed, 59 insertions(+), 12 deletions(-) > > diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c > index 2f11ca72a281..a26f416daf46 100644 > --- a/drivers/xen/pvcalls-front.c > +++ b/drivers/xen/pvcalls-front.c > @@ -335,7 +335,43 @@ int pvcalls_front_socket(struct socket *sock) > return ret; > } > > -static int create_active(struct sock_mapping *map, int *evtchn) > +struct sock_mapping_active_ring { > + struct pvcalls_data_intf *ring; > + RING_IDX ring_order; > + void *bytes; > +}; > + > +static int alloc_active_ring(struct sock_mapping_active_ring *active_ring) > +{ > + active_ring->ring = NULL; This is not necessary. 
> + active_ring->bytes = NULL; > + > + active_ring->ring = (struct pvcalls_data_intf *) > + __get_free_page(GFP_KERNEL | __GFP_ZERO); > + if (active_ring->ring == NULL) > + goto out_error; > + active_ring->ring_order = PVCALLS_RING_ORDER; > + active_ring->bytes = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, > + PVCALLS_RING_ORDER); > + if (active_ring->bytes == NULL) > + goto out_error; > + > + return 0; > + > +out_error: > + free_pages((unsigned long)active_ring->bytes, active_ring->ring_order); > + free_page((unsigned long)active_ring->ring); > + return -ENOMEM; > +} > + > @@ -397,6 +427,7 @@ int pvcalls_front_connect(struct socket *sock, struct > sockaddr *addr, > struct sock_mapping *map = NULL; > struct xen_pvcalls_request *req; > int notify, req_id, ret, evtchn; > + struct sock_mapping_active_ring active_ring; > > if (addr->sa_family != AF_INET || sock->type != SOCK_STREAM) > return -EOPNOTSUPP; > @@ -406,15 +437,21 @@ int pvcalls_front_connect(struct socket *sock, struct > sockaddr *addr, > return PTR_ERR(map); > > bedata = dev_get_drvdata(&pvcalls_front_dev->dev); > + ret = alloc_active_ring(&active_ring); Why not just alloc_active_ring(map)? -boris ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH RFCv2 1/4] mm/memory_hotplug: Introduce memory block types
Memory onlining should always be handled by user space, because only user space knows which use cases it wants to satisfy. E.g. memory might be onlined to the MOVABLE zone even if it can never be removed from the system, e.g. to make usage of huge pages more reliable.

However, to implement such rules (especially default rules in distributions) we need more information about the memory that was added in user space. E.g. on x86 we want to online memory provided by balloon devices (e.g. XEN, Hyper-V) differently (-> will not be unplugged by offlining the whole block) than ordinary DIMMs (-> might eventually be unplugged by offlining the whole block). This might also become relevant for other architectures.

Also, udev rules right now check if running on s390x and treat all added memory blocks as standby memory (-> don't online automatically). As soon as we support other memory hotplug mechanisms (e.g. virtio-mem), the checks would have to get more involved (e.g. also check if under KVM) but would eventually also be wrong (e.g. if KVM ever supports standby memory we are doomed).

I decided to allow specifying the type of memory that is getting added to the system. Let's start with two types, BOOT and UNSPECIFIED, to get the basic infrastructure running. We'll introduce and use further types in follow-up patches. For now we classify any hotplugged memory temporarily as UNSPECIFIED (which will eventually be dropped later on).

Cc: Greg Kroah-Hartman Cc: "Rafael J.
Wysocki" Cc: Andrew Morton Cc: Ingo Molnar Cc: Pavel Tatashin Cc: Stephen Rothwell Cc: Andrew Banman Cc: "mike.tra...@hpe.com" Cc: Oscar Salvador Cc: Dave Hansen Cc: Michal Hocko Cc: Michal Suchánek Cc: Vitaly Kuznetsov Cc: Dan Williams Cc: Pavel Tatashin Cc: Martin Schwidefsky Cc: Heiko Carstens Signed-off-by: David Hildenbrand --- drivers/base/memory.c | 38 +++--- include/linux/memory.h | 27 +++ 2 files changed, 62 insertions(+), 3 deletions(-) diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 0c290f86ab20..17f2985c07c5 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -381,6 +381,29 @@ static ssize_t show_phys_device(struct device *dev, return sprintf(buf, "%d\n", mem->phys_device); } +static ssize_t type_show(struct device *dev, struct device_attribute *attr, +char *buf) +{ + struct memory_block *mem = to_memory_block(dev); + ssize_t len = 0; + + switch (mem->type) { + case MEMORY_BLOCK_UNSPECIFIED: + len = sprintf(buf, "unspecified\n"); + break; + case MEMORY_BLOCK_BOOT: + len = sprintf(buf, "boot\n"); + break; + default: + len = sprintf(buf, "ERROR-UNKNOWN-%ld\n", + mem->state); + WARN_ON(1); + break; + } + + return len; +} + #ifdef CONFIG_MEMORY_HOTREMOVE static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn, unsigned long nr_pages, int online_type, @@ -442,6 +465,7 @@ static DEVICE_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL); static DEVICE_ATTR(state, 0644, show_mem_state, store_mem_state); static DEVICE_ATTR(phys_device, 0444, show_phys_device, NULL); static DEVICE_ATTR(removable, 0444, show_mem_removable, NULL); +static DEVICE_ATTR_RO(type); /* * Block size attribute stuff @@ -620,6 +644,7 @@ static struct attribute *memory_memblk_attrs[] = { &dev_attr_state.attr, &dev_attr_phys_device.attr, &dev_attr_removable.attr, + &dev_attr_type.attr, #ifdef CONFIG_MEMORY_HOTREMOVE &dev_attr_valid_zones.attr, #endif @@ -657,13 +682,17 @@ int register_memory(struct memory_block *memory) } static int 
init_memory_block(struct memory_block **memory, -struct mem_section *section, unsigned long state) +struct mem_section *section, unsigned long state, +int type) { struct memory_block *mem; unsigned long start_pfn; int scn_nr; int ret = 0; + if (type == MEMORY_BLOCK_NONE) + return -EINVAL; + mem = kzalloc(sizeof(*mem), GFP_KERNEL); if (!mem) return -ENOMEM; @@ -675,6 +704,7 @@ static int init_memory_block(struct memory_block **memory, mem->state = state; start_pfn = section_nr_to_pfn(mem->start_section_nr); mem->phys_device = arch_get_memory_phys_device(start_pfn); + mem->type = type; ret = register_memory(mem); @@ -699,7 +729,8 @@ static int add_memory_block(int base_section_nr) if (section_count == 0) return 0; - ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE); + ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE, + MEMORY_BLOCK_BOOT);
[Xen-devel] [PATCH RFCv2 3/4] mm/memory_hotplug: Introduce and use more memory types
Let's introduce new types for different kinds of memory blocks and use them in existing code. As I don't see an easy way to split this up, do it in one hunk for now. acpi: Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel. Properly change the type when trying to add memory that was already detected and used during boot (so this memory will correctly end up as "acpi" in user space). pseries: Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel. As far as I see, handling like in the acpi case for existing blocks is not required. probed memory from user space: Use DIMM_UNREMOVABLE as there is no interface to get rid of this code again. hv_balloon,xen/balloon: Use BALLOON. As simple as that :) s390x/sclp: Use a dedicated type S390X_STANDBY as this type of memory and it's semantics are very s390x specific. powernv/memtrace: Only allow to use BOOT memory for memtrace. I consider this code in general dangerous, but we have to keep it working ... most probably just a debug feature. Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: "Rafael J. Wysocki" Cc: Len Brown Cc: Greg Kroah-Hartman Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Stephen Hemminger Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Stefano Stabellini Cc: Rashmica Gupta Cc: Andrew Morton Cc: Pavel Tatashin Cc: Balbir Singh Cc: Michael Neuling Cc: Nathan Fontenot Cc: YueHaibing Cc: Vasily Gorbik Cc: Ingo Molnar Cc: Stephen Rothwell Cc: "mike.tra...@hpe.com" Cc: Oscar Salvador Cc: Joonsoo Kim Cc: Mathieu Malaterre Cc: Michal Hocko Cc: Arun KS Cc: Andrew Banman Cc: Dave Hansen Cc: Michal Suchánek Cc: Vitaly Kuznetsov Cc: Dan Williams Signed-off-by: David Hildenbrand --- At first I tried to abstract the types quite a lot, but I think there are subtle differences that are worth differentiating. More details about the types can be found in the excessive documentation. 
It is wort noting that BALLOON_MOVABLE has no user yet, but I have something in mind that might want to make use of that (virtio-mem). Just included it to discuss the general approach. I can drop it from this patch. --- arch/powerpc/platforms/powernv/memtrace.c | 9 ++-- .../platforms/pseries/hotplug-memory.c| 7 ++- drivers/acpi/acpi_memhotplug.c| 16 ++- drivers/base/memory.c | 18 ++- drivers/hv/hv_balloon.c | 3 +- drivers/s390/char/sclp_cmd.c | 3 +- drivers/xen/balloon.c | 2 +- include/linux/memory.h| 47 ++- include/linux/memory_hotplug.h| 6 +-- mm/memory_hotplug.c | 15 +++--- 10 files changed, 104 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c index 248a38ad25c7..5d08db87091e 100644 --- a/arch/powerpc/platforms/powernv/memtrace.c +++ b/arch/powerpc/platforms/powernv/memtrace.c @@ -54,9 +54,9 @@ static const struct file_operations memtrace_fops = { .open = simple_open, }; -static int check_memblock_online(struct memory_block *mem, void *arg) +static int check_memblock_boot_and_online(struct memory_block *mem, void *arg) { - if (mem->state != MEM_ONLINE) + if (mem->type != MEM_BLOCK_BOOT || mem->state != MEM_ONLINE) return -1; return 0; @@ -77,7 +77,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages) u64 end_pfn = start_pfn + nr_pages - 1; if (walk_memory_range(start_pfn, end_pfn, NULL, - check_memblock_online)) + check_memblock_boot_and_online)) return false; walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE, @@ -233,7 +233,8 @@ static int memtrace_online(void) ent->mem = 0; } - if (add_memory(ent->nid, ent->start, ent->size)) { + if (add_memory(ent->nid, ent->start, ent->size, + MEMORY_BLOCK_BOOT)) { pr_err("Failed to add trace memory to node %d\n", ent->nid); ret += 1; diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index 2a983b5a52e1..5f91359c7993 100644 --- 
a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -651,7 +651,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index) static int dlpar_add_lmb(struct drmem_lmb *lmb) { unsigned long block_sz; - int nid, rc; + int nid, rc, type = MEMORY_BLOCK_DIMM; if (lmb->flags & DRCONF_MEM_ASSIGNED) return -EINVAL; @@ -667,8 +667,11 @@ static int dlpar_add_lmb(struct drmem_
[Xen-devel] [PATCH RFCv2 0/4] mm/memory_hotplug: Introduce memory block types
This is the second approach, introducing more meaningful memory block types and not changing online behavior in the kernel. It is based on latest linux-next.

As we found out during discussion, user space should always handle onlining of memory, in any case. However, in order to make smart decisions in user space about if and how to online memory, we have to export more information about memory blocks. This way, we can formulate rules in user space. One such piece of information is the type of memory block we are talking about. This helps to answer some questions like:
- Does this memory block belong to a DIMM?
- Can this DIMM theoretically ever be unplugged again?
- Was this memory added by a balloon driver that will rely on balloon inflation to remove chunks of that memory again? Which zone is advised?
- Is this special standby memory on s390x that is usually not automatically onlined?

And in short it helps to answer, to some extent (excluding zone imbalances):
- Should I online this memory block?
- To which zone should I online this memory block?
... of course special use cases will result in different answers. But that's why user space has control of onlining memory. More details can be found in Patch 1 and Patch 3.

Tested on x86 with hotplugged DIMMs. Cross-compiled for PPC and s390x.
Example:

$ udevadm info -q all -a /sys/devices/system/memory/memory0
  KERNEL=="memory0"
  SUBSYSTEM=="memory"
  DRIVER==""
  ATTR{online}=="1"
  ATTR{phys_device}=="0"
  ATTR{phys_index}==""
  ATTR{removable}=="0"
  ATTR{state}=="online"
  ATTR{type}=="boot"
  ATTR{valid_zones}=="none"

$ udevadm info -q all -a /sys/devices/system/memory/memory90
  KERNEL=="memory90"
  SUBSYSTEM=="memory"
  DRIVER==""
  ATTR{online}=="1"
  ATTR{phys_device}=="0"
  ATTR{phys_index}=="005a"
  ATTR{removable}=="1"
  ATTR{state}=="online"
  ATTR{type}=="dimm"
  ATTR{valid_zones}=="Normal"

RFC -> RFCv2:
- Now also taking care of PPC (somehow missed it :/ )
- Split the series up to some degree (some ideas on how to split up patch 3 would be very welcome)
- Introduce more memory block types. Turns out abstracting too much was rather confusing and not helpful. Properly document them.

Notes:
- I wanted to convert the enum of types into a named enum but this provoked all kinds of different errors. For now, I am doing it just like the other types (e.g. online_type) we are using in that context.
- The "removable" property should never have been named like that. It should have been "offlinable". Can we still rename that? E.g. boot memory is sometimes marked as removable ...
David Hildenbrand (4): mm/memory_hotplug: Introduce memory block types mm/memory_hotplug: Replace "bool want_memblock" by "int type" mm/memory_hotplug: Introduce and use more memory types mm/memory_hotplug: Drop MEMORY_TYPE_UNSPECIFIED arch/ia64/mm/init.c | 4 +- arch/powerpc/mm/mem.c | 4 +- arch/powerpc/platforms/powernv/memtrace.c | 9 +-- .../platforms/pseries/hotplug-memory.c| 7 +- arch/s390/mm/init.c | 4 +- arch/sh/mm/init.c | 4 +- arch/x86/mm/init_32.c | 4 +- arch/x86/mm/init_64.c | 8 +-- drivers/acpi/acpi_memhotplug.c| 16 - drivers/base/memory.c | 60 ++-- drivers/hv/hv_balloon.c | 3 +- drivers/s390/char/sclp_cmd.c | 3 +- drivers/xen/balloon.c | 2 +- include/linux/memory.h| 69 ++- include/linux/memory_hotplug.h| 18 ++--- kernel/memremap.c | 6 +- mm/memory_hotplug.c | 29 17 files changed, 194 insertions(+), 56 deletions(-) -- 2.17.2 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH RFCv2 2/4] mm/memory_hotplug: Replace "bool want_memblock" by "int type"
Let's pass a memory block type instead. Pass "MEMORY_BLOCK_NONE" for device
memory and for now "MEMORY_BLOCK_UNSPECIFIED" for anything else. No functional
change.

Cc: Tony Luck
Cc: Fenghua Yu
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Yoshinori Sato
Cc: Rich Felker
Cc: Dave Hansen
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: "H. Peter Anvin"
Cc: x...@kernel.org
Cc: Greg Kroah-Hartman
Cc: "Rafael J. Wysocki"
Cc: Andrew Morton
Cc: Mike Rapoport
Cc: Michal Hocko
Cc: Dan Williams
Cc: "Kirill A. Shutemov"
Cc: Oscar Salvador
Cc: Nicholas Piggin
Cc: Stephen Rothwell
Cc: Christophe Leroy
Cc: "Jonathan Neuschäfer"
Cc: Mauricio Faria de Oliveira
Cc: Vasily Gorbik
Cc: Arun KS
Cc: Rob Herring
Cc: Pavel Tatashin
Cc: "mike.tra...@hpe.com"
Cc: Joonsoo Kim
Cc: Wei Yang
Cc: Logan Gunthorpe
Cc: "Jérôme Glisse"
Cc: "Jan H. Schönherr"
Cc: Dave Jiang
Cc: Matthew Wilcox
Cc: Mathieu Malaterre
Signed-off-by: David Hildenbrand
---
 arch/ia64/mm/init.c            |  4 ++--
 arch/powerpc/mm/mem.c          |  4 ++--
 arch/s390/mm/init.c            |  4 ++--
 arch/sh/mm/init.c              |  4 ++--
 arch/x86/mm/init_32.c          |  4 ++--
 arch/x86/mm/init_64.c          |  8
 drivers/base/memory.c          | 11 +++
 include/linux/memory.h         |  2 +-
 include/linux/memory_hotplug.h | 12 ++--
 kernel/memremap.c              |  6 --
 mm/memory_hotplug.c            | 16
 11 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 904fe55e10fc..408635d2902f 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -646,13 +646,13 @@ mem_init (void)
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		int type)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
 	if (ret)
 		printk("%s: Problem encountered in __add_pages() as ret=%d\n",
 		       __func__, ret);
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index b3c9ee5c4f78..e394637da270 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -118,7 +118,7 @@ int __weak remove_section_mapping(unsigned long start, unsigned long end)
 }
 
 int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		int type)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -135,7 +135,7 @@ int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *
 	}
 	flush_inval_dcache_range(start, start + size);
 
-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, altmap, type);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 3e82f66d5c61..ba2c56328e6d 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -225,7 +225,7 @@ device_initcall(s390_cma_mem_init);
 #endif /* CONFIG_CMA */
 
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		int type)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long size_pages = PFN_DOWN(size);
@@ -235,7 +235,7 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
 	if (rc)
 		return rc;
 
-	rc = __add_pages(nid, start_pfn, size_pages, altmap, want_memblock);
+	rc = __add_pages(nid, start_pfn, size_pages, altmap, type);
 	if (rc)
 		vmem_remove_mapping(start, size);
 	return rc;
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 1a483a008872..5fbb8724e0f2 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -419,14 +419,14 @@ void free_initrd_mem(unsigned long start, unsigned long end)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		int type)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
 	/* We only have ZONE_NORMAL, so this is easy.. */
-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
 	if (unlikely(ret))
 		printk("%s: Failed, __add_pages() == %d\n", __func__, ret);
diff --git a/arch/x86/mm
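The interface change in this patch can be illustrated standalone. The sketch below is not kernel code: the enum values mirror the ones introduced by the series, while want_memblock() is a hypothetical helper showing how the old bool parameter maps onto the new type.

```c
#include <assert.h>
#include <stdbool.h>

/* Memory block types as introduced by the series; MEMORY_BLOCK_NONE
 * means "do not create a memory block device" (e.g. device memory). */
enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
	MEMORY_BLOCK_DIMM,
};

/* Hypothetical helper mirroring the old bool semantics: every type
 * except MEMORY_BLOCK_NONE results in a memory block device. */
static bool want_memblock(int type)
{
	return type != MEMORY_BLOCK_NONE;
}
```

With this mapping, callers of arch_add_memory(..., type) lose no information relative to the old want_memblock parameter, while gaining the ability to distinguish block types later.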
[Xen-devel] [PATCH RFCv2 4/4] mm/memory_hotplug: Drop MEMORY_TYPE_UNSPECIFIED
We now have proper types for all users, so we can drop this one.

Cc: Greg Kroah-Hartman
Cc: "Rafael J. Wysocki"
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Pavel Tatashin
Cc: Stephen Rothwell
Cc: Andrew Banman
Cc: "mike.tra...@hpe.com"
Cc: Oscar Salvador
Cc: Dave Hansen
Cc: Michal Hocko
Cc: Michal Suchánek
Cc: Vitaly Kuznetsov
Cc: Dan Williams
Cc: Pavel Tatashin
Signed-off-by: David Hildenbrand
---
 drivers/base/memory.c  | 3 ---
 include/linux/memory.h | 5 -
 2 files changed, 8 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c5fdca7a3009..a6e524f0ea38 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -388,9 +388,6 @@ static ssize_t type_show(struct device *dev, struct device_attribute *attr,
 	ssize_t len = 0;
 
 	switch (mem->type) {
-	case MEMORY_BLOCK_UNSPECIFIED:
-		len = sprintf(buf, "unspecified\n");
-		break;
 	case MEMORY_BLOCK_BOOT:
 		len = sprintf(buf, "boot\n");
 		break;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index a3a1e9764805..11679622f743 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -50,10 +50,6 @@ int set_memory_block_size_order(unsigned int order);
  *   No memory block is to be created (e.g. device memory). Not exposed to
  *   user space.
  *
- * MEMORY_BLOCK_UNSPECIFIED:
- *   The type of memory block was not further specified when adding the
- *   memory block.
- *
  * MEMORY_BLOCK_BOOT:
  *   This memory block was added during boot by the basic system. No
  *   specific device driver takes care of this memory block. This memory
@@ -103,7 +99,6 @@ int set_memory_block_size_order(unsigned int order);
  */
 enum {
 	MEMORY_BLOCK_NONE = 0,
-	MEMORY_BLOCK_UNSPECIFIED,
 	MEMORY_BLOCK_BOOT,
 	MEMORY_BLOCK_DIMM,
 	MEMORY_BLOCK_DIMM_UNREMOVABLE,
-- 
2.17.2

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 0/3] Remove tmem
On Fri, Nov 30, 2018 at 05:09:42PM +, Ian Jackson wrote:
> Wei Liu writes ("[PATCH v2 0/3] Remove tmem"):
> > It is agreed that tmem can be removed from xen.git. See the thread starting
> > from .
>
> Those are notes from some phone call amongst industry stakeholders.
> None of the messages have a Subject line mentioning tmem. There is no
> explanation of the basis for the decision; just a confirmation from
> the current maintainers that they will ack the removal.
>
> I think this is not really an appropriate way to carry on! What if
> there is someone else who wants to step up to maintain this? What
> about user communication? Going straight from `Supported' to
> `Deleted' seems rather vigorous.

Step up to maintain? I would rather say step up to develop. The status
in MAINTAINERS is wrong. According to SUPPORT.md, it is only
experimental. Our definition of "experimental" is:

    Functional completeness: No
    Functional stability: Here be dragons
    Interface stability: Not stable
    Security supported: No

(https://wiki.xenproject.org/wiki/Xen_Project_Release_Features/Definitions)

This means that without putting in significant effort, no-one would be
able to use TMEM. There is no stability guarantee at all for the TMEM
interface. Deleting something experimental doesn't seem controversial
to me. I dare say no-one cared because it has had zero development
effort since 4.6. Also, as you already noticed, no-one can possibly
have been using TMEM since the switch to xl (that's even earlier than
4.6).

> In summary I think the claim that "It is agreed" in this cover letter
> is false (or, at least, if it is true, the cover letter provides no
> references to any basis for thinking that it is true).
>
> If it didn't happen on the mailing list it didn't happen.
>
> Unfortunately, therefore, on process grounds,
>
> Nacked-by: Ian Jackson

Yet the removal of the ia64 port wasn't announced on xen-announce
either, so I disagree that this removal is wrong on process grounds --
there is already a precedent. If there were a policy file, I would be
happy to comply with it.

> I dare say the decision to remove it now might be right.
>
> Can we please start this again with a proper explanation of why this
> should be summarily deleted, rather than (say) made unmaintained and
> deprecated for a release? Can someone explain why we don't feel the
> need to consult anyone by (say) posting to xen-announce? etc.

See above.

> Then we can actually have an on-list discussion where the decision
> would be taken.
>
> Next time I suggest a good first step would be a patch which deletes
> the M: line from MAINTAINERS and changes the status to Orphan, since
> obviously the current maintainers don't want it. That patch should be
> uncontroversial. Also in general, depending who we think might be
> using a feature, a plan which gives some warning to users (by
> deprecating the feature, for example) would often be a good idea.

Can we not invent policy and ask for compliance on the fly?

Wei.

> Ian.
Re: [Xen-devel] remove the ->mapping_error method from dma_map_ops V3
On Fri, Nov 30, 2018 at 02:22:08PM +0100, Christoph Hellwig wrote:
> Error reporting for the dma_map_single and dma_map_page operations is
> currently a mess. Both APIs directly return the dma_addr_t to be used for
> the DMA, with a magic error escape that is specific to the instance and
> checked by another method provided. This has a few downsides:
>
>  - the error check is easily forgotten and a __must_check marker doesn't
>    help as the value always is consumed anyway
>  - the error checking requires another indirect call, which have gotten
>    incredibly expensive
>  - a lot of code is wasted on implementing these methods
>
> The historical reason for this is that people thought DMA mappings would
> not fail when the API was created, which sounds like a really bad
> assumption in retrospective, and then we tried to cram error handling
> onto it later on.
>
> There basically are two variants: the error code is 0 because the
> implementation will never return 0 as a valid DMA address, or the error
> code is all-F as the implementation won't ever return an address that
> high. The old AMD GART is the only one not falling into these two camps
> as it picks sort of a relative zero relative to where it is mapped.
>
> The 0 return doesn't work for direct mappings that have ram at address
> zero and a lot of IOMMUs that start allocating bus space from address
> zero, so we can't consolidate on that, but I think we can move everyone
> to all-Fs, which the patch here does. The reason for that is that
> there is only one way to ever get this address: by doing a 1-byte long,
> 1-byte aligned mapping, but all our mappings are not only longer but
> generally aligned, and the mappings have to keep at least the basic
> alignment.
>
> A git tree is also available here:
>
>     git://git.infradead.org/users/hch/misc.git dma-mapping-error.3

Hi Chris,

For patches 1, 3 and 23,

Acked-by: Russell King

Thanks.
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
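The consolidation Christoph describes — a single all-Fs sentinel instead of per-instance ->mapping_error hooks — can be sketched outside the kernel. In the standalone sketch below, dma_addr_t is typedef'd locally, and the one-argument check is a simplification: the real dma_mapping_error() also takes a struct device pointer.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* Single error sentinel: all-Fs, an address no implementation hands out
 * for a real mapping (it would require a 1-byte long, 1-byte aligned
 * mapping at the very top of the address space). */
#define DMA_MAPPING_ERROR (~(dma_addr_t)0)

/* Simplified stand-in for dma_mapping_error(): a plain compare, with no
 * indirect call through dma_map_ops. */
static int dma_mapping_error(dma_addr_t dma_addr)
{
	return dma_addr == DMA_MAPPING_ERROR;
}
```

This also shows why 0 could not be the common sentinel: 0 can be a valid bus address for direct mappings with RAM at address zero, while all-Fs cannot be returned by any properly aligned mapping.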
Re: [Xen-devel] remove the ->mapping_error method from dma_map_ops V3
On Fri, Nov 30, 2018 at 5:23 AM Christoph Hellwig wrote:
>
> Error reporting for the dma_map_single and dma_map_page operations is
> currently a mess. [..]

I don't see anything objectionable there, but it's not like I checked any
of the context of the patches. For all I know there are odd corner cases
where some zero vs DMA_MAPPING_ERROR confusion lurks, but I guess we'll
find out..

So you can add my acked-by, fwiw.

             Linus
Re: [Xen-devel] [PATCH 1/4] xen/arm: gic: Ensure we have an ISB between ack and do_IRQ()
Hello Andre,

Please see my comments below.

On 23.11.18 14:18, Andre Przywara wrote:
> Fundamentally there is a semantic difference between edge and level
> triggered IRQs: When the guest has handled an *edge* IRQ (EOIed so the
> LR's state goes to 0), this is done and dusted, and Xen doesn't need to
> care about this anymore until the next IRQ occurs.
>
> For level triggered IRQs, even though the guest has handled it, we need
> to resample the (potentially virtual) IRQ line, as it may come up or
> down at the *device*'s discretion: the interrupt reason might have gone
> away (GPIO condition no longer true), even before we were able to
> inject it, or there might be another interrupt reason not yet handled
> (incoming serial character while serving a transmit interrupt). Also
> typically it's up to the interrupt handler to confirm handling the
> interrupt, either explicitly by clearing an interrupt bit in some
> status register or implicitly, for instance by draining a FIFO, say on
> a serial device. So even though from the (V)GIC's point of view the
> interrupt has been processed (EOIed), it might still be pending.

So, as I understand it, the intended behavior of a vGIC for level
interrupts covers the following cases:

1. if the interrupt line is still active on the HW side but the VM's
   interrupt handler EOIs the interrupt, the vGIC should signal it to
   the vCPU again
2. if the peripheral deactivated the interrupt line but the VM has not
   taken it yet, the interrupt should be removed from the VM's pending
   state

IMO, case 1 is indirectly supported by the old vgic. For HW interrupts
it's pretty natural: deactivation by the VM in the VGIC leads to
deactivation in the GIC, so the interrupt priority is restored and the
GIC will trap the pCPU to reinsert it. This will be seen by the VM as an
immediate IRQ trap after EOI.

Case 2 is not implemented in the old vgic. It is somehow supported by
the new vgic, though it also relies on the trap to the hypervisor to
commit the update to the LRs. But that is rather a problem of the GIC
architecture/implementation, which neither signals the CPU nor updates
the associated LR on level interrupt deassertion.

> My intimate "old Xen VGIC" knowledge has been swapped out from my brain
> meanwhile, but IIRC Xen treats every IRQ as if it would be an edge IRQ.
> Which works if the guest's interrupt handler behaves correctly. Most
> IRQ handlers have a loop to iterate over all possible interrupt reasons
> and process them, so the line goes indeed down before they EOI the IRQ.

I've spent some time looking through the new vgic implementation, and I
have a note about it:

It's not clear why you are probing level interrupts on the guest->hyp
transition. While this targets case 2 described above, it seems more
appropriate to probe level interrupts right before the hyp->guest
transition: the vcpu might be descheduled, and while it sits on the
scheduler queues the interrupt line level has more chances to be changed
by the peripheral itself.

Also, I'm pretty scared of the new vgic locking scheme with per-IRQ
locks and the locking logic, e.g. in the vgic_queue_irq_unlock()
function. Sorting the list in vgic_flush_lr_state() with vgic_irq_cmp()
also looks very expensive. In any case, the performance of all that
should be properly evaluated and measured.

--
Sincerely,
Andrii Anisov.
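The edge/level semantics discussed in this thread can be condensed into a toy model. None of the names below are Xen's; this is just an illustration of why a level IRQ must be resampled after the guest's EOI while an edge IRQ is simply done.

```c
#include <assert.h>
#include <stdbool.h>

struct virq {
	bool level_triggered;
	bool line_asserted; /* current state at the (virtual) device */
	bool pending;       /* should be (re)injected into the guest */
};

/* On guest EOI: an edge IRQ is finished until the next edge arrives,
 * while a level IRQ stays pending for as long as the device keeps the
 * line asserted. */
static void on_guest_eoi(struct virq *irq)
{
	if (irq->level_triggered)
		irq->pending = irq->line_asserted; /* resample the line */
	else
		irq->pending = false;
}
```

Case 1 above is the level branch with the line still asserted (the IRQ is re-signaled); case 2 corresponds to the line having dropped before the resample, so the pending state is cleared.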
[Xen-devel] [xen-4.8-testing test] 130847: FAIL
flight 130847 xen-4.8-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/130847/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64                    broken in 130804
 build-arm64-pvops              broken in 130804
 build-arm64-xsm                broken in 130804

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-vhd        6 xen-install        fail pass in 130804
 test-amd64-i386-xl-raw        10 debian-di-install  fail pass in 130804

Regressions which are regarded as allowable (not blocking):
 build-arm64-xsm                2 hosts-allocate broken in 130804 REGR. vs. 129810
 build-arm64                    2 hosts-allocate broken in 130804 REGR. vs. 129810
 build-arm64-pvops              2 hosts-allocate broken in 130804 REGR. vs. 129810

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-xsm   1 build-check(1)  blocked in 130804 n/a
 test-arm64-arm64-xl-credit2    1 build-check(1)  blocked in 130804 n/a
 build-arm64-libvirt            1 build-check(1)  blocked in 130804 n/a
 test-arm64-arm64-xl-credit1    1 build-check(1)  blocked in 130804 n/a
 test-arm64-arm64-xl-xsm        1 build-check(1)  blocked in 130804 n/a
 test-arm64-arm64-xl            1 build-check(1)  blocked in 130804 n/a
 build-arm64-xsm                3 capture-logs    broken in 130804 blocked in 129810
 build-arm64-pvops              3 capture-logs    broken in 130804 blocked in 129810
 build-arm64                    3 capture-logs    broken in 130804 blocked in 129810
 test-xtf-amd64-amd64-4        69 xtf/test-hvm64-xsa-278 fail blocked in 129810
 test-xtf-amd64-amd64-1        69 xtf/test-hvm64-xsa-278 fail in 130804 blocked in 129810
 test-xtf-amd64-amd64-1        50 xtf/test-hvm64-lbr-tsx-vmentry fail in 130804 like 129729
 test-armhf-armhf-xl-vhd       12 migrate-support-check fail in 130804 never pass
 test-armhf-armhf-xl-vhd       13 saverestore-support-check fail in 130804 never pass
 test-xtf-amd64-amd64-4        50 xtf/test-hvm64-lbr-tsx-vmentry fail like 129570
 test-xtf-amd64-amd64-3        50 xtf/test-hvm64-lbr-tsx-vmentry fail like 129729
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 129810
 test-armhf-armhf-xl-rtds      16 guest-start/debian.repeat fail like 129810
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 129810
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 129810
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 129810
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 129810
 test-amd64-amd64-xl-rtds      10 debian-install fail like 129810
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail like 129810
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail like 129810
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail like 129810
 test-amd64-i386-libvirt       13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm  13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm   13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt      13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass
 test-amd64-amd64-libvirt-vhd  12 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds      13 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds      14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1   13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1   14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2   13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2   14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt      13 migrate-support-check fail never pass
 test-armhf-armhf-libvirt      14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw  12 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-raw  13 saverestore-support-check fail never pass
 test-armhf-armhf-xl           13 migrate-support-check fail never pass
 test-armhf-armhf-xl           14 saverestore-
[Xen-devel] [PATCHv2 1/2] x86/mm: Fix guard hole handling
There is a guard hole at the beginning of the kernel address space, also
used by hypervisors. It occupies 16 PGD entries.

We do not state the reserved range directly, but calculate it relative
to other entities: the direct mapping and user space ranges.

The calculation got broken by a recent change in the kernel memory
layout: the LDT remap range is now mapped before the direct mapping,
which makes the calculation invalid. The breakage leads to a crash on
Xen dom0 boot[1].

State the reserved range directly. It's part of the kernel ABI
(hypervisors expect it to be stable) and must not depend on changes in
the rest of the kernel memory layout.

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03313.html

Signed-off-by: Kirill A. Shutemov
Reported-by: Hans van Kranenburg
Tested-by: Hans van Kranenburg
Reviewed-by: Juergen Gross
Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging")
Signed-off-by: Kirill A. Shutemov
---
 arch/x86/include/asm/pgtable_64_types.h |  5 +
 arch/x86/mm/dump_pagetables.c           |  8
 arch/x86/xen/mmu_pv.c                   | 11 ++-
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 84bd9bdc1987..88bca456da99 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -111,6 +111,11 @@ extern unsigned int ptrs_per_p4d;
  */
 #define MAXMEM			(1UL << MAX_PHYSMEM_BITS)
 
+#define GUARD_HOLE_PGD_ENTRY	-256UL
+#define GUARD_HOLE_SIZE		(16UL << PGDIR_SHIFT)
+#define GUARD_HOLE_BASE_ADDR	(GUARD_HOLE_PGD_ENTRY << PGDIR_SHIFT)
+#define GUARD_HOLE_END_ADDR	(GUARD_HOLE_BASE_ADDR + GUARD_HOLE_SIZE)
+
 #define LDT_PGD_ENTRY		-240UL
 #define LDT_BASE_ADDR		(LDT_PGD_ENTRY << PGDIR_SHIFT)
 #define LDT_END_ADDR		(LDT_BASE_ADDR + PGDIR_SIZE)
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index fc37bbd23eb8..dad153e5a427 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -512,11 +512,11 @@ static inline bool is_hypervisor_range(int idx)
 {
 #ifdef CONFIG_X86_64
 	/*
-	 * ffff800000000000 - ffff87ffffffffff is reserved for
-	 * the hypervisor.
+	 * A hole in the beginning of kernel address space reserved
+	 * for a hypervisor.
 	 */
-	return	(idx >= pgd_index(__PAGE_OFFSET) - 16) &&
-		(idx <  pgd_index(__PAGE_OFFSET));
+	return	(idx >= pgd_index(GUARD_HOLE_BASE_ADDR)) &&
+		(idx <  pgd_index(GUARD_HOLE_END_ADDR));
 #else
 	return false;
 #endif
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index a5d7ed125337..0f4fe206dcc2 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -648,19 +648,20 @@ static int __xen_pgd_walk(struct mm_struct *mm, pgd_t *pgd,
 			  unsigned long limit)
 {
 	int i, nr, flush = 0;
-	unsigned hole_low, hole_high;
+	unsigned hole_low = 0, hole_high = 0;
 
 	/* The limit is the last byte to be touched */
 	limit--;
 	BUG_ON(limit >= FIXADDR_TOP);
 
+#ifdef CONFIG_X86_64
 	/*
 	 * 64-bit has a great big hole in the middle of the address
-	 * space, which contains the Xen mappings. On 32-bit these
-	 * will end up making a zero-sized hole and so is a no-op.
+	 * space, which contains the Xen mappings.
 	 */
-	hole_low = pgd_index(USER_LIMIT);
-	hole_high = pgd_index(PAGE_OFFSET);
+	hole_low = pgd_index(GUARD_HOLE_BASE_ADDR);
+	hole_high = pgd_index(GUARD_HOLE_END_ADDR);
+#endif
 
 	nr = pgd_index(limit) + 1;
 	for (i = 0; i < nr; i++) {
-- 
2.19.2
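The new constants can be sanity-checked with plain arithmetic. Assuming 4-level paging (PGDIR_SHIFT == 39), PGD entry -256 sign-extends to the first PGD slot of the kernel half of the address space, and 16 entries of 512 GiB each span exactly the traditional Xen-reserved range:

```c
#include <assert.h>
#include <stdint.h>

#define PGDIR_SHIFT		39
#define GUARD_HOLE_PGD_ENTRY	(-256LL)
#define GUARD_HOLE_SIZE		(16ULL << PGDIR_SHIFT)

static uint64_t guard_hole_base(void)
{
	/* cast to unsigned before shifting: left-shifting a negative
	 * value is undefined in C, while unsigned shifts wrap mod 2^64 */
	return (uint64_t)GUARD_HOLE_PGD_ENTRY << PGDIR_SHIFT;
}

static uint64_t guard_hole_end(void)
{
	return guard_hole_base() + GUARD_HOLE_SIZE;
}
```

This reproduces the ffff800000000000 - ffff880000000000 range (exclusive end), i.e. exactly the 16 PGD entries the changelog describes.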
[Xen-devel] [PATCHv2 0/2] Fixups for LDT remap placement change
There are a couple of fixes for the recent LDT remap placement change.

The first patch fixes a crash when the kernel is booted as Xen dom0.

The second patch fixes the address space markers in the dump_pagetables
output. It's a purely cosmetic change, so backporting to the stable tree
is optional.

v2:
 - Fix typo

Kirill A. Shutemov (2):
  x86/mm: Fix guard hole handling
  x86/dump_pagetables: Fix LDT remap address marker

 arch/x86/include/asm/pgtable_64_types.h |  5 +
 arch/x86/mm/dump_pagetables.c           | 15 ++-
 arch/x86/xen/mmu_pv.c                   | 11 ++-
 3 files changed, 17 insertions(+), 14 deletions(-)

-- 
2.19.2