Re: [Xen-devel] [PATCH] smpboot: Add smpboot state variables instead of reusing CPU hotplug states
Hi Paul,

I guess this patch got the summer conference period treatment. ACK, NACK, completely STUPID idea?

cheers,
daniel

On 10/15/2015 01:32 PM, Daniel Wagner wrote:
> The cpu hotplug state machine in smpboot.c is reusing the states from
> cpu.h. That is confusing when it comes to the CPU_DEAD_FROZEN usage.
> Paul explained to me that he was in need of an additional state
> for distinguishing between CPU error states. For this he just
> picked CPU_DEAD_FROZEN.
>
> 8038dad7e888581266c76df15d70ca457a3c5910 smpboot: Add common code for
> notification from dying CPU
> 2a442c9c6453d3d043dfd89f2e03a1deff8a6f06 x86: Use common
> outgoing-CPU-notification code
>
> Instead of reusing the states, let's add new definitions inside
> the smpboot.c file with an explanation of what those states
> mean. Thanks Paul for providing them.
>
> Signed-off-by: Daniel Wagner
> Cc: Thomas Gleixner
> Cc: "Paul E. McKenney"
> Cc: Peter Zijlstra
> Cc: xen-de...@lists.xenproject.org
> Cc: linux-ker...@vger.kernel.org
> ---
>  arch/x86/xen/smp.c  |  4 +--
>  include/linux/cpu.h |  3 +-
>  kernel/smpboot.c    | 82 -
>  3 files changed, 67 insertions(+), 22 deletions(-)
>
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 3f4ebf0..804bf5c 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -495,7 +495,7 @@ static int xen_cpu_up(unsigned int cpu, struct task_struct *idle)
>  	rc = HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL);
>  	BUG_ON(rc);
>
> -	while (cpu_report_state(cpu) != CPU_ONLINE)
> +	while (!cpu_check_online(cpu))
>  		HYPERVISOR_sched_op(SCHEDOP_yield, NULL);
>
>  	return 0;
> @@ -767,7 +767,7 @@ static int xen_hvm_cpu_up(unsigned int cpu, struct task_struct *tidle)
>  	 * This can happen if CPU was offlined earlier and
>  	 * offlining timed out in common_cpu_die().
>  	 */
> -	if (cpu_report_state(cpu) == CPU_DEAD_FROZEN) {
> +	if (cpu_check_timeout(cpu)) {
>  		xen_smp_intr_free(cpu);
>  		xen_uninit_lock_cpu(cpu);
>  	}
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index 23c30bd..f78ab46 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -284,7 +284,8 @@ void arch_cpu_idle_dead(void);
>
>  DECLARE_PER_CPU(bool, cpu_dead_idle);
>
> -int cpu_report_state(int cpu);
> +int cpu_check_online(int cpu);
> +int cpu_check_timeout(int cpu);
>  int cpu_check_up_prepare(int cpu);
>  void cpu_set_state_online(int cpu);
>  #ifdef CONFIG_HOTPLUG_CPU
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index a818cbc..75e5724 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -371,19 +371,63 @@ int smpboot_update_cpumask_percpu_thread(struct smp_hotplug_thread *plug_thread,
>  }
>  EXPORT_SYMBOL_GPL(smpboot_update_cpumask_percpu_thread);
>
> +/* The CPU is offline, and its last offline operation was
> + * successful and proceeded normally. (Or, alternatively, the
> + * CPU never has come online, as this is the initial state.)
> + */
> +#define CPUHP_POST_DEAD	0x01
> +
> +/* The CPU is in the process of coming online.
> + * Simple architectures can skip this state, and just invoke
> + * cpu_set_state_online() unconditionally instead.
> + */
> +#define CPUHP_UP_PREPARE	0x02
> +
> +/* The CPU is now online. Simple architectures can skip this
> + * state, and just invoke cpu_wait_death() and cpu_report_death()
> + * unconditionally instead.
> + */
> +#define CPUHP_ONLINE	0x03
> +
> +/* The CPU has gone offline, so that it may now be safely
> + * powered off (or whatever the architecture needs to do to it).
> + */
> +#define CPUHP_DEAD	0x04
> +
> +/* The CPU did not go offline in a timely fashion, if at all,
> + * so it might need special processing at the next online (for
> + * example, simply refusing to bring it online).
> + */
> +#define CPUHP_BROKEN	0x05
> +
> +/* The CPU eventually did go offline, but not in a timely
> + * fashion. If some sort of reset operation is required before it
> + * can be brought online, that reset operation needs to be carried
> + * out at online time. (Or, again, the architecture might simply
> + * refuse to bring it online.)
> + */
> +#define CPUHP_TIMEOUT	0x06
> +
>  static DEFINE_PER_CPU(atomic_t, cpu_hotplug_state) = ATOMIC_INIT(CPU_POST_DEAD);
>
>  /*
>   * Called to poll specified CPU's state, for example, when waiting for
>   * a CPU to come online.
>   */
> -int cpu_report_state(int cpu)
> +int cpu_check_online(int cpu)
> +{
> +	return atomic_read(&per_cpu(cpu_hotplug_state, cpu)) ==
> +		CPUHP_ONLINE;
> +}
> +
> +int cpu_check_timeout(int cpu)
>  {
> -	return atomic_read(&per_cpu(cpu_hotplug_state, cpu));
> +	return atomic_read(&per_cpu(cpu_hotplug_state, cpu)) ==
> +		CPUHP_TIMEOUT;
>  }
>
>  /*
> - * If CPU has died properly, set its state to CPU_UP_PREPARE an
Re: [Xen-devel] [PATCH 4/4] xen/public: arm: rework the macro set_xen_guest_handle_raw
>>> On 04.11.15 at 18:06, wrote:
> Jan Beulich writes ("Re: [PATCH 4/4] xen/public: arm: rework the macro
> set_xen_guest_handle_raw"):
>> On 04.11.15 at 17:50, wrote:
>> > If we don't provide a get_xen_guest_handle, a kernel developer will be
>> > sorely tempted to make one.
>>
>> What use would it be to them? Kernels only write handles, they
>> shouldn't have a need for reading them.
>
> I foresee situations where a kernel might like to update a proposed
> hypercall argument structure in place, which might involve reading the
> handles.

I guess you think of e.g. the privcmd filtering done in XenServer, but I think this is an odd thing for a kernel to do: Down to the final actual hypercall invocation, it should deal with pointers, not handles. Filtering should either be done prior to reaching that layer (obviously not an option for privcmd, but that layer is guarded against issues with the compiler doing the wrong thing afaict), or would better be left to the hypervisor (said filtering in XenServer could likely be moved into the hypervisor, with a flag added to the hypercall number indicating whether to invoke the filtering, which the privcmd layer then would set unconditionally).

Jan
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [V9 1/3] x86/xsaves: enable xsaves/xrstors/xsavec in xen
>>> On 05.11.15 at 02:34, wrote: > On Wed, Nov 04, 2015 at 10:04:33AM -0700, Jan Beulich wrote: >> >>> On 03.11.15 at 07:27, wrote: >> > @@ -158,6 +334,20 @@ void xsave(struct vcpu *v, uint64_t mask) >> > ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET] = word_size; >> > } >> > +#define XSTATE_FIXUP ".section .fixup,\"ax\" \n"\ >> > + "2: mov %5,%%ecx \n"\ >> > + " xor %1,%1\n"\ >> > + " rep stosb\n"\ >> > + " lea %2,%0\n"\ >> > + " mov %3,%1\n"\ >> > + " jmp 1b \n"\ >> > + ".previous \n"\ >> > + _ASM_EXTABLE(1b, 2b)\ >> > + : "+&D" (ptr), "+&a" (lmask)\ >> > + : "m" (*ptr), "g" (lmask), "d" (hmask), \ >> > + "m" (xsave_cntxt_size)\ >> > + : "ecx" >> > + >> > void xrstor(struct vcpu *v, uint64_t mask) >> > { >> > uint32_t hmask = mask >> 32; >> > @@ -187,39 +377,22 @@ void xrstor(struct vcpu *v, uint64_t mask) >> > switch ( __builtin_expect(ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) ) >> > { >> > default: >> > -asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n" >> > - ".section .fixup,\"ax\" \n" >> > - "2: mov %5,%%ecx \n" >> > - " xor %1,%1\n" >> > - " rep stosb\n" >> > - " lea %2,%0\n" >> > - " mov %3,%1\n" >> > - " jmp 1b \n" >> > - ".previous \n" >> > - _ASM_EXTABLE(1b, 2b) >> > - : "+&D" (ptr), "+&a" (lmask) >> > - : "m" (*ptr), "g" (lmask), "d" (hmask), >> > - "m" (xsave_cntxt_size) >> > - : "ecx" ); >> > +alternative_input("1: "".byte 0x48,0x0f,0xae,0x2f", >> > + ".byte 0x48,0x0f,0xc7,0x1f", >> > + X86_FEATURE_XSAVES, >> > + "D" (ptr), "m" (*ptr), "a" (lmask), "d" > (hmask)); >> > +asm volatile (XSTATE_FIXUP); >> > break; >> > case 4: case 2: >> > -asm volatile ( "1: .byte 0x0f,0xae,0x2f\n" >> > - ".section .fixup,\"ax\" \n" >> > - "2: mov %5,%%ecx\n" >> > - " xor %1,%1 \n" >> > - " rep stosb \n" >> > - " lea %2,%0 \n" >> > - " mov %3,%1 \n" >> > - " jmp 1b \n" >> > - ".previous \n" >> > - _ASM_EXTABLE(1b, 2b) >> > - : "+&D" (ptr), "+&a" (lmask) >> > - : "m" (*ptr), "g" (lmask), "d" (hmask), >> > - "m" (xsave_cntxt_size) >> > - : "ecx" ); >> > +alternative_input("1: "".byte 
0x0f,0xae,0x2f", >> > + ".byte 0x0f,0xc7,0x1f", >> > + X86_FEATURE_XSAVES, >> > + "D" (ptr), "m" (*ptr), "a" (lmask), "d" (hmask)); >> > +asm volatile (XSTATE_FIXUP); >> > break; >> > } >> > } >> > +#undef XSTATE_FIXUP
>>
>> Repeating my comment on v8: "I wonder whether at least for the
>> restore side alternative asm wouldn't result in better readable code
>> and at the same time in a smaller patch." Did you at least look into
>> that option?
>
> I may have misunderstood your meaning. I have addressed the comment by
> changing the restore side to use alternative_input. Is "alternative_input"
> not what you want? If it is not, please give me some suggestions on how
> to address this.

Oh, I'm sorry, I should have looked more closely. The fact that XSTATE_FIXUP survived made me draw the wrong conclusion. Now the bad news is: you can't split things like this, as the compiler doesn't make any guarantees as to register values between two asm()-s. The whole construct needs to end up as a single asm(), which is why XSTATE_FIXUP is unlikely to be of much use here (at least in the context of this patch; a separate cleanup patch might eliminate the redundancy).

Jan
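As an illustration of the point about register values not surviving between two asm() statements, here is a minimal sketch that is not taken from the patch. The helper name add_then_double is hypothetical, and the code assumes GCC or Clang extended asm on x86 (it falls back to plain C elsewhere). Both dependent instructions live in one asm(), so the operand constraints describe the whole sequence to the compiler.

```c
/* Hypothetical illustration only; not part of the xsaves patch. */
static inline unsigned long add_then_double(unsigned long a, unsigned long b)
{
    unsigned long out = a;

#if defined(__x86_64__) || defined(__i386__)
    /*
     * Both instructions sit in ONE asm statement. The "+r" constraint
     * tells the compiler that 'out' is read and written by the whole
     * sequence, so the register holding it stays reserved from the
     * first instruction to the last. Had the shift been placed in a
     * second asm(), nothing would guarantee that the same register
     * still held the intermediate sum.
     */
    __asm__ __volatile__ ( "add %1, %0\n\t"
                           "shl $1, %0"
                           : "+r" (out)
                           : "r" (b)
                           : "cc" );
#else
    /* Portable fallback with the same result. */
    out = (out + b) << 1;
#endif

    return out;
}
```

Calling add_then_double(3, 4) adds the two values and then doubles the sum inside a single asm statement; the same reasoning is why the fixup code and the instruction it guards need to share one asm().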
Re: [Xen-devel] Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core
On 2015-11-05 00:13, Boris Ostrovsky wrote: On 11/04/2015 03:02 PM, Sander Eikelenboom wrote: On 2015-11-04 19:47, Stephen Smalley wrote: On 11/04/2015 01:28 PM, Sander Eikelenboom wrote: On 2015-11-04 16:52, Stephen Smalley wrote: On 11/04/2015 06:55 AM, Sander Eikelenboom wrote: Hi All, I just tried to boot with the current linus mergewindow tree under Xen. It fails with a kernel panic at boot with the new "CONFIG_DEBUG_WX" option enabled. Disabling it makes the kernel boot fine. The splat: [ 18.424241] Freeing unused kernel memory: 1104K (822fc000 - 8241) [ 18.430314] Write protecting the kernel read-only data: 18432k [ 18.441054] Freeing unused kernel memory: 1144K (880001ae2000 - 880001c0) [ 18.447966] Freeing unused kernel memory: 1560K (88000207a000 - 88000220) [ 18.453947] BUG: unable to handle kernel paging request at 88055c883000 [ 18.459943] IP: [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.465847] PGD 2212067 PUD 0 [ 18.471564] Oops: [#1] SMP [ 18.477248] Modules linked in: [ 18.482918] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-mw-20151104-linus-doflr+ #1 [ 18.488804] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010 [ 18.494778] task: 880059b9 ti: 880059b98000 task.ti: 880059b98000 [ 18.500852] RIP: e030:[] [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.507102] RSP: e02b:880059b9be48 EFLAGS: 00010296 [ 18.513351] RAX: 88055c883000 RBX: 81ae2000 RCX: 8800 [ 18.519733] RDX: 0067 RSI: 880059b9be98 RDI: 88001000 [ 18.526129] RBP: 880059b9bf00 R08: R09: [ 18.532522] R10: 88005fd0e790 R11: 0001 R12: 88008000 [ 18.538891] R13: cfff R14: 880059b9be98 R15: [ 18.545247] FS: () GS:88005f68() knlGS: [ 18.551708] CS: e033 DS: ES: CR0: 8005003b [ 18.558153] CR2: 88055c883000 CR3: 02211000 CR4: 0660 [ 18.564686] Stack: [ 18.571106] 000159b9be50 82211000 88055c884000 0800 [ 18.577704] 8000 88055c883000 0007 88005fd0e790 [ 18.584291] 880059b9bed8 81156ace 0001 [ 18.590916] Call Trace: [ 18.597458] [] ? 
free_reserved_area+0x11e/0x120 [ 18.604180] [] ptdump_walk_pgd_level_checkwx+0x12/0x20 [ 18.611014] [] mark_rodata_ro+0xe9/0xf0 [ 18.617819] [] ? rest_init+0x80/0x80 [ 18.624512] [] kernel_init+0x18/0xe0 [ 18.631095] [] ret_from_fork+0x3f/0x70 [ 18.637650] [] ? rest_init+0x80/0x80 [ 18.644178] Code: 70 ff ff ff 48 3b 85 58 ff ff ff 0f 84 c0 fe ff ff 48 8b 85 68 ff ff ff 48 c1 e0 10 48 c1 f8 10 48 89 45 b0 48 8b 85 70 ff ff ff <48> 8b 38 48 85 ff 0f 85 4e ff ff ff b9 02 00 00 00 31 d2 4c 89 [ 18.658246] RIP [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.665211] RSP [ 18.672073] CR2: 88055c883000 [ 18.678852] ---[ end trace d84e34461c40637a ]--- [ 18.685641] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 [ 18.685641] [ 18.699520] Kernel Offset: disable What's your .config? Does cat /sys/kernel/debug/kernel_page_tables produce a similar fault even with CONFIG_DEBUG_WX=n? .config is attached Hmm that sysfs file doesn't seem to exist then: # cat /sys/kernel/debug/kernel_page_tables cat: /sys/kernel/debug/kernel_page_tables: No such file or directory Needs CONFIG_X86_PTDUMP=y. Also assumes you have debugfs mounted there. 
Recompiled, and the result is that it also blows up: Can you try this: diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index 1bf417e..b534216 100644 --- a/arch/x86/mm/dump_pagetables.c +++ b/arch/x86/mm/dump_pagetables.c @@ -362,8 +362,13 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd, bool checkwx) { #ifdef CONFIG_X86_64 +/* 8000 - 87ff is reserved for hypervisor */ +#define is_hypervisor_range(idx) (paravirt_enabled() && \ + ((idx >= pgd_index(__PAGE_OFFSET) - 16) && \ + (idx < pgd_index(__PAGE_OFFSET pgd_t *start = (pgd_t *) &init_level4_pgt; #else +#define is_hypervisor_range(idx) 0 pgd_t *start = swapper_pg_dir; #endif pgprotval_t prot; @@ -381,7 +386,7 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd, for (i = 0; i < PTRS_PER_PGD; i++) { st.current_address = normalize_addr(i * PGD_LEVEL_MULT); -if (!pgd_none(*start)) { +if (!pgd_none(*start) && !is_hypervisor_range(i)) { if (pgd_large(*start) || !pgd_present(*start)) { prot = pgd_flags(*start); note_page(m, &st, __pgprot(prot), 1); Hi Boris, Thank for your patch ! It makes "cat /
Re: [Xen-devel] [PATCH v3 2/4] arm64: Add xen_boot module file
Hi Ian,

On 3 November 2015 at 23:22, Ian Campbell wrote:
> On Tue, 2015-11-03 at 22:57 +0800, Fu Wei wrote:
>> Hi Vladimir,
>>
>> After discussing with Ian Campbell: since we can already load all
>> the necessary binaries for Xen boot on arm64, we don't really
>> need the "xen_module" command now.
>> But maybe someday Xen will need a new type of binary at boot time, and
>> then we will still need this support.
>
> You mean support for "--type" passed to the xen_module command, right? I
> thought the xen_module stuff had been applied. Or am I misunderstanding
> which bits have been applied?

Actually, I mean that the xen_module command exists for "--type" support. If we don't need "--type" now, we can delete the xen_module code (Vladimir has already dropped it from my patch, so upstream GRUB currently has no --type support). Vladimir has applied most of my patch, except the xen_module command code.

>> So I will submit a "xen_module" command patch soon, in case we need
>> it.
>
> Just to clarify, my suggestion was to repost the bits which were omitted
> from the prior patches just so that they are available in the ML archives
> etc should anyone ever want to resurrect them in the future.

Yes, that is what I am going to do.

> Ian.

--
Best regards,
Fu Wei
Software Engineer
Red Hat Software (Beijing) Co.,Ltd.Shanghai Branch
[Xen-devel] Getting the XSAVE size from userspace
Hello,

I need to get the XSAVE size from userspace. The easiest way seems to be to use the XEN_DOMCTL_getvcpuextstate hypercall, but that hypercall is not public / there's no xenctrl.h wrapper for it.

There's also struct hvm_hw_cpu_xsave, which I can get to, but it doesn't have a size member:

    /*
     * The save area of XSAVE/XRSTOR.
     */

    struct hvm_hw_cpu_xsave {
        uint64_t xfeature_mask;        /* Ignored */
        uint64_t xcr0;                 /* Updated by XSETBV */
        uint64_t xcr0_accum;           /* Updated by XSETBV */
        struct {
            struct { char x[512]; } fpu_sse;

            struct {
                uint64_t xstate_bv;    /* Updated by XRSTOR */
                uint64_t reserved[7];
            } xsave_hdr;               /* The 64-byte header */

            struct { char x[0]; } ymm; /* YMM */
        } save_area;
    };

I see that in the hypervisor code the length is computed by using the HVM_CPU_XSAVE_SIZE() macro:

    #define HVM_CPU_XSAVE_SIZE(xcr0) (offsetof(struct hvm_hw_cpu_xsave, \
                                               save_area) + \
                                      xstate_ctxt_size(xcr0))

where:

    static unsigned int _xstate_ctxt_size(u64 xcr0)
    {
        u64 act_xcr0 = get_xcr0();
        u32 eax, ebx = 0, ecx, edx;
        bool_t ok = set_xcr0(xcr0);

        ASSERT(ok);
        cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
        ASSERT(ebx <= ecx);
        ok = set_xcr0(act_xcr0);
        ASSERT(ok);

        return ebx;
    }

    /* Fastpath for common xstate size requests, avoiding reloads of xcr0. */
    unsigned int xstate_ctxt_size(u64 xcr0)
    {
        if ( xcr0 == xfeature_mask )
            return xsave_cntxt_size;

        if ( xcr0 == 0 )
            return 0;

        return _xstate_ctxt_size(xcr0);
    }

But that doesn't seem to translate cleanly to userspace code. I had hoped that I would be able to get this with no custom Xen patches. Is there a simpler way I'm not aware of to get to this information? And if there isn't, would you prefer a libxc patch that exposes XEN_DOMCTL_getvcpuextstate, or one that adds a size member to struct hvm_hw_cpu_xsave (I'd guess the latter)?

Thanks,
Razvan
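For readers following the HVM_CPU_XSAVE_SIZE() arithmetic, the sketch below reproduces only the offsetof() half of the macro in plain userspace C. It is an illustration under assumptions, not a libxc interface: the struct is a copy of the layout quoted in the message (with the zero-length ymm member left out, since it does not affect the offset), and hvm_cpu_xsave_size() is a hypothetical helper whose argument stands in for whatever xstate_ctxt_size(xcr0) would return.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace copy of the layout quoted in the message. The zero-length
 * 'ymm' member is omitted; it contributes nothing to the offsets used.
 */
struct hvm_hw_cpu_xsave {
    uint64_t xfeature_mask;        /* Ignored */
    uint64_t xcr0;                 /* Updated by XSETBV */
    uint64_t xcr0_accum;           /* Updated by XSETBV */
    struct {
        struct { char x[512]; } fpu_sse;
        struct {
            uint64_t xstate_bv;    /* Updated by XRSTOR */
            uint64_t reserved[7];
        } xsave_hdr;               /* The 64-byte header */
    } save_area;
};

/*
 * Hypothetical helper mirroring the shape of HVM_CPU_XSAVE_SIZE():
 * fixed header bytes in front of save_area plus the xstate size.
 * 'xstate_size' is an assumed input; obtaining it is the open
 * question of the thread.
 */
static size_t hvm_cpu_xsave_size(size_t xstate_size)
{
    return offsetof(struct hvm_hw_cpu_xsave, save_area) + xstate_size;
}
```

The fixed part is just the three 64-bit fields in front of save_area, so the only value userspace is really missing is the xstate size itself.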
[Xen-devel] [distros-debian-wheezy test] 38249: all pass
flight 38249 distros-debian-wheezy real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/38249/

Perfect :-)
All tests in this flight passed

baseline version:
 flight               38221

jobs:
 build-amd64                                     pass
 build-armhf                                     pass
 build-i386                                      pass
 build-amd64-pvops                               pass
 build-armhf-pvops                               pass
 build-i386-pvops                                pass
 test-amd64-amd64-amd64-wheezy-netboot-pvgrub    pass
 test-amd64-i386-i386-wheezy-netboot-pvgrub      pass
 test-amd64-i386-amd64-wheezy-netboot-pygrub     pass
 test-amd64-amd64-i386-wheezy-netboot-pygrub     pass

sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
    http://osstest.xs.citrite.net/~osstest/testlogs/logs

Test harness code can be found at
    http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary

Push not applicable.
Re: [Xen-devel] [V9 1/3] x86/xsaves: enable xsaves/xrstors/xsavec in xen
On Thu, Nov 05, 2015 at 02:06:25AM -0700, Jan Beulich wrote: > >>> On 05.11.15 at 02:34, wrote: > > On Wed, Nov 04, 2015 at 10:04:33AM -0700, Jan Beulich wrote: > >> >>> On 03.11.15 at 07:27, wrote: > >> > @@ -158,6 +334,20 @@ void xsave(struct vcpu *v, uint64_t mask) > >> > case 4: case 2: > >> > -asm volatile ( "1: .byte 0x0f,0xae,0x2f\n" > >> > - ".section .fixup,\"ax\" \n" > >> > - "2: mov %5,%%ecx\n" > >> > - " xor %1,%1 \n" > >> > - " rep stosb \n" > >> > - " lea %2,%0 \n" > >> > - " mov %3,%1 \n" > >> > - " jmp 1b \n" > >> > - ".previous \n" > >> > - _ASM_EXTABLE(1b, 2b) > >> > - : "+&D" (ptr), "+&a" (lmask) > >> > - : "m" (*ptr), "g" (lmask), "d" (hmask), > >> > - "m" (xsave_cntxt_size) > >> > - : "ecx" ); > >> > +alternative_input("1: "".byte 0x0f,0xae,0x2f", > >> > + ".byte 0x0f,0xc7,0x1f", > >> > + X86_FEATURE_XSAVES, > >> > + "D" (ptr), "m" (*ptr), "a" (lmask), "d" > > (hmask)); > >> > +asm volatile (XSTATE_FIXUP); > >> > break; > >> > } > >> > } > >> > +#undef XSTATE_FIXUP > >> > >> Repeating my comment on v8: "I wonder whether at least for the > >> restore side alternative asm wouldn't result in better readable code > >> and at the same time in a smaller patch." Did you at least look into > >> that option? > >> > > I may misunderstand your meaning. I have adressed the comment by changing > > the restor side using alternative_input. Does "alternative_input" not what > > you want ? > > if it is not what you want, please give me some suggestions how to > > address this ? > > Oh, I'm sorry, I should have looked more closely. The fact that > XSTATE_FIXUP survived made me draw wrong conclusions without > looking more closely. Now the bad news is - you can't split things > like this, as the compiler doesn't make any guarantees as to > register values between two asm()-s. 
The whole construct needs > to end up as a single asm(), which is why XSTATE_FIXUP is > unlikely to be of much use here (at least in the context of this > patch; a separate cleanup patch might eliminate the redundancy).

OK. So should alternative_input not be used here (that is, use the way xrstor is done in Patch 8)? Or should XSTATE_FIXUP be put into alternative_input? Which one is OK with you?

Thanks
Re: [Xen-devel] [V9 1/3] x86/xsaves: enable xsaves/xrstors/xsavec in xen
>>> On 05.11.15 at 10:57, wrote:
> OK. So should alternative_input not be used here (that is, use the way
> xrstor is done in Patch 8)? Or should XSTATE_FIXUP be put into
> alternative_input? Which one is OK with you?

The latter, if necessary by extending alternative_input() accordingly (or by providing a second, more flexible variant if need be; iirc Linux has gained a couple of variants over the years).

Jan
Re: [Xen-devel] [V9 2/3] x86/xsaves: enable xsaves/xrstors for hvm guest
>>> On 03.11.15 at 07:27, wrote:
> @@ -640,6 +640,14 @@ static void vmx_save_msr(struct vcpu *v, struct hvm_msr *ctxt)
>      }
>
>      vmx_vmcs_exit(v);
> +
> +    if ( cpu_has_xsaves )
> +    {
> +        ctxt->msr[ctxt->count].val = v->arch.hvm_vcpu.msr_xss;
> +        if ( ctxt->msr[ctxt->count].val )
> +            ctxt->msr[ctxt->count++].index = MSR_IA32_XSS;
> +    }
> +
>  }

Stray blank line (not the first time I have had to make this comment on this series). With it removed,
Reviewed-by: Jan Beulich
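The quoted hunk uses a compact idiom: write the value into the next free slot, and only claim the slot (by setting the index and bumping the count) when the value is non-zero. The sketch below shows that idiom in isolation. Everything in it is an assumption made for illustration: struct msr_entry, struct msr_ctxt, MSR_CTXT_MAX and save_msr_if_nonzero() are hypothetical stand-ins rather than Xen's struct hvm_msr, and the capacity check is an addition that the quoted hunk does not contain.

```c
#include <stdint.h>

#define MSR_IA32_XSS  0x00000da0u  /* architectural index of IA32_XSS */
#define MSR_CTXT_MAX  4            /* hypothetical capacity for this sketch */

/* Hypothetical stand-ins for the save-record types used in the hunk. */
struct msr_entry {
    uint32_t index;
    uint64_t val;
};

struct msr_ctxt {
    unsigned int count;
    struct msr_entry msr[MSR_CTXT_MAX];
};

/*
 * Store 'val' in the next free slot, but only commit the slot when the
 * value is non-zero, so that zero-valued MSRs take no space in the
 * record. Returns 1 if an entry was recorded, 0 otherwise.
 */
static int save_msr_if_nonzero(struct msr_ctxt *ctxt, uint32_t index,
                               uint64_t val)
{
    if ( ctxt->count >= MSR_CTXT_MAX )
        return 0;                       /* no room left (added check) */

    ctxt->msr[ctxt->count].val = val;
    if ( !ctxt->msr[ctxt->count].val )
        return 0;                       /* slot stays unclaimed */

    ctxt->msr[ctxt->count++].index = index;
    return 1;
}
```

A zero value leaves count untouched, so the next MSR to be saved simply overwrites the same slot.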
Re: [Xen-devel] [PATCH v1 07/11] xsplice: Implement payload loading
>>> On 04.11.15 at 23:21, wrote:
>> +int xsplice_perform_rela(struct xsplice_elf *elf,
>> +                         struct xsplice_elf_sec *base,
>> +                         struct xsplice_elf_sec *rela)
>> +{
>> +    Elf64_Rela *r;
>> +    int symndx, i;
>
> unsigned int
>
>> +    uint64_t val;
>> +    uint8_t *dest;
>> +
>
> Can you double check that rela->sec->sh_entsize is not zero first?

Perhaps not just non-zero, but at least a certain minimum? Or even equal to some sizeof()?

Jan
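One way to read the suggestion is to require the entry size to match the relocation record exactly before iterating. The sketch below is a guess at what such a check could look like, not the xSplice implementation: it assumes the Elf64 types from the system <elf.h> header (Xen has its own elfstructs.h), and rela_section_is_sane() is a hypothetical name.

```c
#include <elf.h>
#include <stdbool.h>

/*
 * Hypothetical sanity check for a RELA section header, to be run before
 * walking its entries. It rejects a zero or unexpected sh_entsize, which
 * would otherwise lead to a division by zero or misaligned entries.
 */
static bool rela_section_is_sane(const Elf64_Shdr *sh)
{
    if ( sh->sh_type != SHT_RELA )
        return false;

    /* Stricter than "non-zero": the size must equal the record size. */
    if ( sh->sh_entsize != sizeof(Elf64_Rela) )
        return false;

    /* The section must hold a whole number of entries. */
    return sh->sh_size % sh->sh_entsize == 0;
}
```

With this check passed, the entry count can be computed as sh_size / sh_entsize without further guarding.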
Re: [Xen-devel] [PATCH v1 05/11] elf: Add relocation types to elfstructs.h
>>> On 03.11.15 at 19:16, wrote:
> --- a/xen/include/xen/elfstructs.h
> +++ b/xen/include/xen/elfstructs.h
> @@ -348,6 +348,27 @@ typedef struct {
>  #define ELF64_R_TYPE(info)     ((info) & 0x)
>  #define ELF64_R_INFO(s,t)      (((s) << 32) + (u_int32_t)(t))
>
> +/* x86-64 relocation types */
> +#define R_X86_64_NONE          0   /* No reloc */
> +#define R_X86_64_64            1   /* Direct 64 bit */
> +#define R_X86_64_PC32          2   /* PC relative 32 bit signed */
> +#define R_X86_64_GOT32         3   /* 32 bit GOT entry */
> +#define R_X86_64_PLT32         4   /* 32 bit PLT address */
> +#define R_X86_64_COPY          5   /* Copy symbol at runtime */
> +#define R_X86_64_GLOB_DAT      6   /* Create GOT entry */
> +#define R_X86_64_JUMP_SLOT     7   /* Create PLT entry */
> +#define R_X86_64_RELATIVE      8   /* Adjust by program base */
> +#define R_X86_64_GOTPCREL      9   /* 32 bit signed pc relative
> +                                      offset to GOT */
> +#define R_X86_64_32            10  /* Direct 32 bit zero extended */
> +#define R_X86_64_32S           11  /* Direct 32 bit sign extended */
> +#define R_X86_64_16            12  /* Direct 16 bit zero extended */
> +#define R_X86_64_PC16          13  /* 16 bit sign extended pc relative */
> +#define R_X86_64_8             14  /* Direct 8 bit sign extended */
> +#define R_X86_64_PC8           15  /* 8 bit sign extended pc relative */
> +
> +#define R_X86_64_NUM           16

Since the set isn't complete anyway - any reason not to drop everything that's of no relevance to xSplice?

Jan
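To make the question concrete, the sketch below shows how a loader might accept only a small subset of relocation types and reject the rest. It is an illustration under assumptions: it uses the macros and constants from the system <elf.h> header, reloc_type_handled() is a hypothetical helper, and the three types it accepts are an arbitrary example subset, not a statement about which types xSplice actually needs.

```c
#include <elf.h>
#include <stdbool.h>

/*
 * Hypothetical filter: decode the relocation type from r_info and
 * accept only a handful of types. The chosen subset is an example.
 */
static bool reloc_type_handled(Elf64_Xword r_info)
{
    switch ( ELF64_R_TYPE(r_info) )
    {
    case R_X86_64_NONE:   /* nothing to do */
    case R_X86_64_64:     /* direct 64-bit */
    case R_X86_64_PC32:   /* PC-relative 32-bit signed */
        return true;

    default:
        return false;
    }
}
```

ELF64_R_INFO(sym, type) packs the symbol index into the upper 32 bits and the type into the lower 32 bits; ELF64_R_SYM() and ELF64_R_TYPE() recover the two halves.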
Re: [Xen-devel] Getting the XSAVE size from userspace
>>> On 05.11.15 at 10:52, wrote:
> I need to get the XSAVE size from userspace. The easiest way seems to be
> to use the XEN_DOMCTL_getvcpuextstate hypercall, but that hypercall is
> not public / there's no xenctrl.h wrapper for it.

Before going into any detail of the rest of your mail - any reason you can't just consult CPUID output?

Jan
Re: [Xen-devel] Getting the XSAVE size from userspace
On 11/05/2015 12:42 PM, Jan Beulich wrote:
> On 05.11.15 at 10:52, wrote:
>> I need to get the XSAVE size from userspace. The easiest way seems to be
>> to use the XEN_DOMCTL_getvcpuextstate hypercall, but that hypercall is
>> not public / there's no xenctrl.h wrapper for it.
>
> Before going into any detail of the rest of your mail - any reason you
> can't just consult CPUID output?

That's because the userspace application doesn't live in dom0, but in a dedicated privileged domain, and I'm unsure if a CPUID issued there yields the same results as a CPUID issued in dom0. So I thought the safest way is to get the information directly from the hypervisor. Is this assumption incorrect?

Thanks,
Razvan
Re: [Xen-devel] [PATCH v1 01/11] xsplice: Design document (v2).
On 11/04/2015 09:10 PM, Konrad Rzeszutek Wilk wrote:
snip
>> +The payload **MUST** contain enough data to allow us to apply the update
>> +and also safely reverse it. As such we **MUST** know:
>> +
>> + * The locations in memory to be patched. This can be determined dynamically
>> +   via symbols or via virtual addresses.
>> + * The new code that will be patched in.
>> + * Signature to verify the payload.
>
> Argh. We need to move the 'Signature to verify' in the 'v2' section as I
> don't think we can get that done in time.

No, not for V1.

>> +
>> +This binary format can be constructed using an custom binary format but
>> +there are severe disadvantages of it:
>> +
>> + * The format might need to be changed and we need an mechanism to accommodate
>> +   that.
>> + * It has to be platform agnostic.
>> + * Easily constructed using existing tools.
>> +
>> +As such having the payload in an ELF file is the sensible way. We would be
>> +carrying the various sets of structures (and data) in the ELF sections under
>> +different names and with definitions. The prefix for the ELF section name
>> +would always be: *.xsplice* to match up to the names of the structures.
>> +
>> +Note that every structure has padding. This is added so that the hypervisor
>> +can re-use those fields as it sees fit.
>> +
>> +Earlier design attempted to ineptly explain the relations of the ELF sections
>> +to each other without using proper ELF mechanism (sh_info, sh_link, data
>> +structures using Elf types, etc). This design will explain in detail
>> +the structures and how they are used together and not dig in the ELF
>> +format - except mention that the section names should match the
>> +structure names.
>> +
>> +The xSplice payload is a relocatable ELF binary. A typical binary would have:
>> +
>> + * One or more .text sections
>> + * Zero or more read-only data sections
>> + * Zero or more data sections
>> + * Relocations for each of these sections
>> +
>> +It may also have some architecture-specific sections. For example:
>> +
>> + * Alternatives instructions
>> + * Bug frames
>> + * Exception tables
>> + * Relocations for each of these sections
>> +
>> +The xSplice core code loads the payload as a standard ELF binary, relocates it
>> +and handles the architecture-specifc sections as needed. This process is much
>> +like what the Linux kernel module loader does. It contains no xSplice-specific
>> +details and thus will not be discussed further.
>
> What is 'it'? The 'process of what module loader does'?

'It' refers to the process of module loading in the previous sentence.

>> +
>> +Importantly, the payload also contains a section with an array of structures
>> +describing the functions to be patched:
>> +
>> +struct xsplice_patch_func {
>> +    unsigned long new_addr;
>> +    unsigned long new_size;
>> +    unsigned long old_addr;
>> +    unsigned long old_size;
>> +    char *name;
>> +    uint8_t pad[64];
>> +};
>> +
>
> Uh, so 104 bytes ? Or did you mean to s/64/24/ so the structure is nicely
> padded to 64-bytes? I think that is what you meant.

OK. I'm not too fussed about exact sizes for V1 anyway, it's likely to change at some point.

--
Ross Lagerwall
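The size arithmetic in that exchange can be checked mechanically. The sketch below is an illustration only: it models unsigned long and the char pointer as 64-bit fields (an assumption that holds on x86-64 with the LP64 model), and the two struct names are hypothetical labels for the pad[64] and pad[24] variants being discussed.

```c
#include <stdint.h>

/* Layout as written in the design document, with 64-bit fields assumed. */
struct patch_func_pad64 {
    uint64_t new_addr;
    uint64_t new_size;
    uint64_t old_addr;
    uint64_t old_size;
    uint64_t name;      /* stands in for 'char *name' on a 64-bit target */
    uint8_t  pad[64];
};

/* Variant suggested in the review: pad[24] instead of pad[64]. */
struct patch_func_pad24 {
    uint64_t new_addr;
    uint64_t new_size;
    uint64_t old_addr;
    uint64_t old_size;
    uint64_t name;
    uint8_t  pad[24];
};

/* Five 8-byte fields are 40 bytes; the padding decides the total. */
_Static_assert(sizeof(struct patch_func_pad64) == 104,
               "40 bytes of fields + 64 bytes of padding");
_Static_assert(sizeof(struct patch_func_pad24) == 64,
               "40 bytes of fields + 24 bytes of padding");
```

A compile-time assertion of this kind next to the real structure would catch an accidental size change, whichever padding is eventually chosen.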
Re: [Xen-devel] [xen-unstable test] 63540: regressions - FAIL
>>> On 05.11.15 at 04:01, wrote:
> flight 63540 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/63540/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-xl-qemut-winxpsp3  6 xen-boot  fail REGR. vs. 63475

Hmm, did something go wrong during install? The first boot after install appears to be a kernel booted natively, and then nothing else.

Jan
Re: [Xen-devel] Getting the XSAVE size from userspace
On 05/11/15 10:42, Jan Beulich wrote:
> On 05.11.15 at 10:52, wrote:
>> I need to get the XSAVE size from userspace. The easiest way seems to be
>> to use the XEN_DOMCTL_getvcpuextstate hypercall, but that hypercall is
>> not public / there's no xenctrl.h wrapper for it.
> Before going into any detail of the rest of your mail - any reason you
> can't just consult CPUID output?

It depends on precisely what you want.

CPUID.0xD[0].ecx gives you the maximum xsave area on this processor.
CPUID.0xD[0].ebx gives you the current size for the value in xcr0, but that is not very useful from userspace.

~Andrew
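For reference, reading those two CPUID fields from userspace could look roughly like the sketch below. It is not a Xen or libxc interface: struct xsave_sizes and query_xsave_sizes() are hypothetical names, and the code assumes a GCC or Clang toolchain whose <cpuid.h> provides __get_cpuid_count(). The values describe whatever the domain running the code is shown by CPUID, which, as discussed in this thread, need not match another domain's view.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical result type for this sketch. */
struct xsave_sizes {
    int valid;                  /* non-zero if leaf 0xD was readable */
    unsigned int current_size;  /* CPUID.0xD[0].ebx: size for current xcr0 */
    unsigned int max_size;      /* CPUID.0xD[0].ecx: maximum save area size */
};

static struct xsave_sizes query_xsave_sizes(void)
{
    struct xsave_sizes s = { 0, 0, 0 };

#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Returns 0 when leaf 0xD is above the highest supported leaf. */
    if ( __get_cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx) )
    {
        s.valid = 1;
        s.current_size = ebx;
        s.max_size = ecx;
    }
#endif

    return s;
}
```

A caller that only needs a buffer large enough for any state would use max_size; current_size depends on the xcr0 in effect where the instruction is executed.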
Re: [Xen-devel] Getting the XSAVE size from userspace
>>> On 05.11.15 at 11:49, wrote:
> On 05/11/15 10:42, Jan Beulich wrote:
>> On 05.11.15 at 10:52, wrote:
>>> I need to get the XSAVE size from userspace. The easiest way seems to be
>>> to use the XEN_DOMCTL_getvcpuextstate hypercall, but that hypercall is
>>> not public / there's no xenctrl.h wrapper for it.
>> Before going into any detail of the rest of your mail - any reason you
>> can't just consult CPUID output?
>
> It depends on precisely what you want.
>
> CPUID.0xD[0].ecx gives you the maximum xsave area on this processor
> CPUID.0xD[0].ebx gives you the current size for the value in xcr0, but
> that is not very useful from userspace.

Why would the maximum size not be sufficient for most (all?) user mode purposes?

Jan
Re: [Xen-devel] Getting the XSAVE size from userspace
>>> On 05.11.15 at 11:47, wrote:
> On 11/05/2015 12:42 PM, Jan Beulich wrote:
>> On 05.11.15 at 10:52, wrote:
>>> I need to get the XSAVE size from userspace. The easiest way seems to be
>>> to use the XEN_DOMCTL_getvcpuextstate hypercall, but that hypercall is
>>> not public / there's no xenctrl.h wrapper for it.
>>
>> Before going into any detail of the rest of your mail - any reason you
>> can't just consult CPUID output?
>
> That's because the userspace application doesn't live in dom0, but in a
> dedicated privileged domain, and I'm unsure if a CPUID issued there
> yields the same results as a CPUID issued in dom0. So I thought the
> safest way is to get the information directly from the hypervisor. Is
> this assumption incorrect?

See my other reply (to Andrew) - as long as there's no problem with using the maximum possible size, I don't see why you couldn't use just CPUID.

Jan
Re: [Xen-devel] Getting the XSAVE size from userspace
On 05/11/15 10:47, Razvan Cojocaru wrote:
> On 11/05/2015 12:42 PM, Jan Beulich wrote:
>> On 05.11.15 at 10:52, wrote:
>>> I need to get the XSAVE size from userspace. The easiest way seems to be
>>> to use the XEN_DOMCTL_getvcpuextstate hypercall, but that hypercall is
>>> not public / there's no xenctrl.h wrapper for it.
>> Before going into any detail of the rest of your mail - any reason you
>> can't just consult CPUID output?
> That's because the userspace application doesn't live in dom0, but in a
> dedicated privileged domain, and I'm unsure if a CPUID issued there
> yields the same results as a CPUID issued in dom0. So I thought the
> safest way is to get the information directly from the hypervisor. Is
> this assumption incorrect?

What purpose are you wanting the information for?

Using cpuid (should) get you the information concerning your domain, which is liable to be different to what another domain might see.

Currently, the information available through the domain cpuid policy is inaccurate, and *not* migration safe. I am working on fixing this as part 2 of my cpuid levelling fixes.

~Andrew
[Xen-devel] [xen-4.3-testing test] 63569: regressions - FAIL
flight 63569 xen-4.3-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/63569/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-migrupgrade  21 guest-migrate/src_host/dst_host  fail REGR. vs. 63212

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl  3 host-install(3)  broken in 63524 pass in 63569
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1  13 guest-localmigrate  fail pass in 63524

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-qemuu-win7-amd64  17 guest-stop  fail like 63212

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumpuserxen-amd64     1 build-check(1)                   blocked n/a
 test-amd64-i386-rumpuserxen-i386       1 build-check(1)                   blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64   9 debian-hvm-install               fail never pass
 build-amd64-rumpuserxen                6 xen-build                        fail never pass
 build-i386-rumpuserxen                 6 xen-build                        fail never pass
 test-amd64-i386-xl-qemuu-ovmf-amd64    9 debian-hvm-install               fail never pass
 test-amd64-i386-migrupgrade           21 guest-migrate/src_host/dst_host  fail never pass
 test-armhf-armhf-xl-vhd                6 xen-boot                         fail never pass
 test-amd64-i386-libvirt               12 migrate-support-check            fail never pass
 test-armhf-armhf-xl-arndale            6 xen-boot                         fail never pass
 test-armhf-armhf-libvirt-qcow2         6 xen-boot                         fail never pass
 test-armhf-armhf-libvirt               6 xen-boot                         fail never pass
 test-armhf-armhf-xl-multivcpu          6 xen-boot                         fail never pass
 test-armhf-armhf-xl-cubietruck         6 xen-boot                         fail never pass
 test-armhf-armhf-xl-credit2            6 xen-boot                         fail never pass
 test-armhf-armhf-libvirt-raw           6 xen-boot                         fail never pass
 test-amd64-i386-xl-qemut-win7-amd64   17 guest-stop                       fail never pass
 test-amd64-amd64-libvirt-vhd          11 migrate-support-check            fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64  17 guest-stop                       fail never pass
 test-amd64-amd64-libvirt              12 migrate-support-check            fail never pass
 test-armhf-armhf-xl                    6 xen-boot                         fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64   17 guest-stop                       fail never pass
 test-amd64-i386-xend-qemut-winxpsp3   21 leak-check/check                 fail never pass

version targeted for testing:
 xen                  e875e0e5fcc5912f71422b53674a97e5c0ae77be
baseline version:
 xen                  85ca813ec23c5a60680e4a13777dad530065902b

Last test of basis    63212  2015-10-22 10:03:01 Z  14 days
Failing since         63360  2015-10-29 13:39:04 Z   6 days  5 attempts
Testing same since    63381  2015-10-30 18:44:54 Z   5 days  4 attempts

People who touched revisions under test:
  Andrew Cooper
  Ian Campbell
  Ian Jackson
  Jan Beulich

jobs:
 build-amd64                                 pass
 build-armhf                                 pass
 build-i386                                  pass
 build-amd64-libvirt                         pass
 build-armhf-libvirt                         pass
 build-i386-libvirt                          pass
 build-amd64-prev                            pass
 build-i386-prev                             pass
 build-amd64-pvops                           pass
 build-armhf-pvops                           pass
 build-i386-pvops                            pass
 build-amd64-rumpuserxen                     fail
 build-i386-rumpuserxen                      fail
 test-amd64-amd64-xl                         pass
 test-armhf-armhf-xl                         fail
 test-amd64-i386-xl                          pass
 test-amd64-i386-qemut-rhel6hvm-amd          pass
 test-amd64-i386-qemuu-rhel6hvm-amd          pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64   pass
 test-amd64-i386-xl-qemut-debianhvm-amd64    pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64   pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64    pass
 test-amd64-i386-freebsd10-amd64             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64        fail
 test-amd64-i386-xl-qemuu-ovmf-amd64         fail
 test-amd64-amd64-rumpuserxen-amd64          blocked
 test-amd64-amd64-xl-qemut-win7-amd64        fail
 test-amd64-i386-xl-qemut-win7-amd64         fail
 test-amd64-amd64-xl
Re: [Xen-devel] [V9 2/3] x86/xsaves: enable xsaves/xrstors for hvm guest
On Thu, Nov 05, 2015 at 03:28:47AM -0700, Jan Beulich wrote:
> >>> On 03.11.15 at 07:27, wrote:
> > @@ -640,6 +640,14 @@ static void vmx_save_msr(struct vcpu *v, struct hvm_msr *ctxt)
> >      }
> >
> >      vmx_vmcs_exit(v);
> > +
> > +    if ( cpu_has_xsaves )
> > +    {
> > +        ctxt->msr[ctxt->count].val = v->arch.hvm_vcpu.msr_xss;
> > +        if ( ctxt->msr[ctxt->count].val )
> > +            ctxt->msr[ctxt->count++].index = MSR_IA32_XSS;
> > +    }
> > +
> >  }
>
> Stray blank line (not the first time I have to make this comment on
> this series).
Sorry for that.
>
> With it removed,
> Reviewed-by: Jan Beulich
>
Thanks.
Re: [Xen-devel] [PATCH v1 07/11] xsplice: Implement payload loading
On 11/04/2015 10:21 PM, Konrad Rzeszutek Wilk wrote: snip + +/* + * The following functions prepare an xSplice module to be executed by + * allocating space, loading the allocated sections, resolving symbols, + * performing relocations, etc. + */ +#ifdef CONFIG_X86 +static void *alloc_module(size_t size) s/module/payload/ My intention was that all the code which implements the "module loader" functionality (and is sort of independent from xSplice) uses the term "module" whereas the payload implies the loaded module + the other xSplice-specific bits. Your thoughts? +{ +mfn_t *mfn, *mfn_ptr; +size_t pages, i; +struct page_info *pg; +unsigned long hole_start, hole_end, cur; +struct payload *data, *data2; + +ASSERT(size); + +pages = PFN_UP(size); +mfn = xmalloc_array(mfn_t, pages); +if ( mfn == NULL ) +return NULL; + +for ( i = 0; i < pages; i++ ) +{ +pg = alloc_domheap_page(NULL, 0); +if ( pg == NULL ) +goto error; +mfn[i] = _mfn(page_to_mfn(pg)); +} This looks like 'vmalloc'. Why not use that? (That explanation should be part of the commit description probably) vmalloc allocates pages and then maps them to an arbitrary virtual address with PAGE_HYPERVISOR. I needed to use a specific virtual address with PAGE_HYPERVISOR_RWX. + +hole_start = (unsigned long)module_virt_start; +hole_end = hole_start + pages * PAGE_SIZE; +spin_lock(&payload_list_lock); +list_for_each_entry ( data, &payload_list, list ) +{ +list_for_each_entry ( data2, &payload_list, list ) +{ +unsigned long start, end; + +start = (unsigned long)data2->module_address; +end = start + data2->module_pages * PAGE_SIZE; +if ( hole_end > start && hole_start < end ) +{ +hole_start = end; +hole_end = hole_start + pages * PAGE_SIZE; +break; +} +} +if ( &data2->list == &payload_list ) +break; +} +spin_unlock(&payload_list_lock); This could be made in a nice function. 'find_hole' perhaps? 
+ +if ( hole_end >= module_virt_end ) +goto error; + +for ( cur = hole_start, mfn_ptr = mfn; pages--; ++mfn_ptr, cur += PAGE_SIZE ) +{ +if ( map_pages_to_xen(cur, mfn_x(*mfn_ptr), 1, PAGE_HYPERVISOR_RWX) ) +{ +if ( cur != hole_start ) +destroy_xen_mappings(hole_start, cur); I think 'destroy_xen_mappings' is OK handling hole_start == cur. +goto error; +} +} +xfree(mfn); +return (void *)hole_start; + + error: +while ( i-- ) +free_domheap_page(mfn_to_page(mfn_x(mfn[i]))); +xfree(mfn); +return NULL; +} +#else +static void *alloc_module(size_t size) s/module/payload/ +{ +return NULL; +} +#endif + +static void free_module(struct payload *payload) +{ +int i; unsigned int; +struct page_info *pg; +PAGE_LIST_HEAD(pg_list); +void *va = payload->module_address; +unsigned long addr = (unsigned long)va; + +if ( !payload->module_address ) +return; How about 'if ( !addr ) return; ? + +payload->module_address = NULL; + +for ( i = 0; i < payload->module_pages; i++ ) +page_list_add(vmap_to_page(va + i * PAGE_SIZE), &pg_list); + +destroy_xen_mappings(addr, addr + payload->module_pages * PAGE_SIZE); + +while ( (pg = page_list_remove_head(&pg_list)) != NULL ) +free_domheap_page(pg); + +payload->module_pages = 0; +} + +static void alloc_section(struct xsplice_elf_sec *sec, size_t *core_size) s/alloc/compute/? 
+{ +size_t align_size = ROUNDUP(*core_size, sec->sec->sh_addralign); +sec->sec->sh_entsize = align_size; +*core_size = sec->sec->sh_size + align_size; +} + +static int move_module(struct payload *payload, struct xsplice_elf *elf) +{ +uint8_t *buf; +int i; unsigned int i; +size_t core_size = 0; + +/* Allocate text regions */ s/Allocate/Compute/ +for ( i = 0; i < elf->hdr->e_shnum; i++ ) +{ +if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) == + (SHF_ALLOC|SHF_EXECINSTR) ) +alloc_section(&elf->sec[i], &core_size); +} + +/* Allocate rw data */ +for ( i = 0; i < elf->hdr->e_shnum; i++ ) +{ +if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) && + !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) && + (elf->sec[i].sec->sh_flags & SHF_WRITE) ) +alloc_section(&elf->sec[i], &core_size); +} + +/* Allocate ro data */ +for ( i = 0; i < elf->hdr->e_shnum; i++ ) +{ +if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) && + !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) && + !(elf->sec[i].sec->sh_flags & SHF_WRITE) ) +alloc_section(&elf->sec[i], &core_size); +} + +buf = alloc_module(core_si
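The size pass quoted above (executable sections first, then writable data, then read-only data) can be looked at in isolation. The following is only a minimal sketch, not the patch's code: `struct sec`, `round_up`, `place_section` and `layout_sections` are hypothetical stand-ins for `struct xsplice_elf_sec`, Xen's `ROUNDUP` and `alloc_section`, and the alignment is assumed to be a nonzero power of two. It also shows why "compute" is arguably a better name than "alloc": nothing is allocated, each section only gets an offset and the running total grows.

```c
#include <stddef.h>

/* Hypothetical stand-in for the section fields the size pass touches. */
struct sec {
    size_t size;    /* corresponds to sh_size */
    size_t align;   /* corresponds to sh_addralign; assumed a nonzero power of two */
    size_t offset;  /* computed placement (the quoted patch stores it in sh_entsize) */
};

/* Round x up to the next multiple of align (power of two only). */
static size_t round_up(size_t x, size_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Place one section after everything placed so far and grow the total. */
static void place_section(struct sec *s, size_t *core_size)
{
    s->offset = round_up(*core_size, s->align);
    *core_size = s->offset + s->size;
}

/* Lay out n sections in order; returns the total bytes to allocate. */
static size_t layout_sections(struct sec *secs, size_t n)
{
    size_t core_size = 0;
    size_t i;

    for (i = 0; i < n; i++)
        place_section(&secs[i], &core_size);

    return core_size;
}
```

The returned total is what would then be handed to the allocator once, after all three passes have run.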
Re: [Xen-devel] [RFC PATCH] x86/paravirt: Kill some unused patching functions
On 11/03/2015 10:18 AM, Borislav Petkov wrote: From: Borislav Petkov paravirt_patch_ignore() is completely unused and paravirt_patch_nop() doesn't do a whole lot. Remove them both. Signed-off-by: Borislav Petkov Reviewed-by: Juergen Gross Cc: Andrew Morton Cc: Andy Lutomirski Cc: Chris Wright Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jeremy Fitzhardinge Cc: Juergen Gross Cc: "Peter Zijlstra (Intel)" Cc: Rusty Russell Cc: Thomas Gleixner Cc: virtualizat...@lists.linux-foundation.org Cc: xen-de...@lists.xenproject.org --- arch/x86/include/asm/paravirt_types.h | 2 -- arch/x86/kernel/paravirt.c| 13 + 2 files changed, 1 insertion(+), 14 deletions(-) diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 31247b5bff7c..e1f31dfc3b31 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -402,10 +402,8 @@ extern struct pv_lock_ops pv_lock_ops; __visible extern const char start_##ops##_##name[], end_##ops##_##name[]; \ asm(NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, name)) -unsigned paravirt_patch_nop(void); unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len); unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len); -unsigned paravirt_patch_ignore(unsigned len); unsigned paravirt_patch_call(void *insnbuf, const void *target, u16 tgt_clobbers, unsigned long addr, u16 site_clobbers, diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index c2130aef3f9d..4f32a10979db 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -74,16 +74,6 @@ void __init default_banner(void) /* Undefined instruction for dealing with missing ops pointers. 
*/ static const unsigned char ud2a[] = { 0x0f, 0x0b }; -unsigned paravirt_patch_nop(void) -{ - return 0; -} - -unsigned paravirt_patch_ignore(unsigned len) -{ - return len; -} - struct branch { unsigned char opcode; u32 delta; @@ -152,8 +142,7 @@ unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf, /* If there's no function, patch it with a ud2a (BUG) */ ret = paravirt_patch_insns(insnbuf, len, ud2a, ud2a+sizeof(ud2a)); else if (opfunc == _paravirt_nop) - /* If the operation is a nop, then nop the callsite */ - ret = paravirt_patch_nop(); + ret = 0; /* identity functions just return their single argument */ else if (opfunc == _paravirt_ident_32) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [libvirt test] 63578: regressions - FAIL
flight 63578 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/63578/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt  5 libvirt-build  fail REGR. vs. 63340

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-qcow2   1 build-check(1)         blocked n/a
 test-armhf-armhf-libvirt-raw     1 build-check(1)         blocked n/a
 test-armhf-armhf-libvirt-xsm     1 build-check(1)         blocked n/a
 test-armhf-armhf-libvirt         1 build-check(1)         blocked n/a
 test-amd64-amd64-libvirt        12 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-xsm    12 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  10 migrate-support-check  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm   10 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-vhd    11 migrate-support-check  fail never pass
 test-amd64-i386-libvirt-xsm     12 migrate-support-check  fail never pass
 test-amd64-i386-libvirt         12 migrate-support-check  fail never pass

version targeted for testing:
 libvirt              ac339206bfe98e78925b183cba058d0e2e7f03e3
baseline version:
 libvirt              3c7590e0a435d833895fc7b5be489e53e223ad95

Last test of basis    63340  2015-10-28 04:19:47 Z  8 days
Failing since         63352  2015-10-29 04:20:29 Z  7 days  6 attempts
Testing same since    63373  2015-10-30 04:21:45 Z  6 days  5 attempts

People who touched revisions under test:
  Laine Stump
  Luyao Huang
  Maxim Perevedentsev
  Michal Privoznik
  Roman Bogorodskiy

jobs:
 build-amd64-xsm                                      pass
 build-armhf-xsm                                      pass
 build-i386-xsm                                       pass
 build-amd64                                          pass
 build-armhf                                          pass
 build-i386                                           pass
 build-amd64-libvirt                                  pass
 build-armhf-libvirt                                  fail
 build-i386-libvirt                                   pass
 build-amd64-pvops                                    pass
 build-armhf-pvops                                    pass
 build-i386-pvops                                     pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm    pass
 test-amd64-amd64-libvirt-xsm                         pass
 test-armhf-armhf-libvirt-xsm                         blocked
 test-amd64-i386-libvirt-xsm                          pass
 test-amd64-amd64-libvirt                             pass
 test-armhf-armhf-libvirt                             blocked
 test-amd64-i386-libvirt                              pass
 test-amd64-amd64-libvirt-pair                        pass
 test-amd64-i386-libvirt-pair                         pass
 test-armhf-armhf-libvirt-qcow2                       blocked
 test-armhf-armhf-libvirt-raw                         blocked
 test-amd64-amd64-libvirt-vhd                         pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Not pushing.

commit ac339206bfe98e78925b183cba058d0e2e7f03e3
Author: Laine Stump
Date:   Thu Oct 29 14:09:59 2015 -0400

    util: set max wait for IPv6 DAD to 20 seconds

    This was originally set to 5 seconds, but times of 5.5 to 7 seconds
    were experienced. Since it's an arbitrary number intended to prevent
    an infinite hang, having it a bit too high won't hurt anything, and 20
    seconds looks to be adequate (i.e. I think/hope we don't need to make
    it tunable in libvirtd.conf)

commit d41a64a1948c88ccec5b4cff34fd04d3aae7a71e
Author: Luyao Huang
Date:   Thu Oct 29 17:47:33 2015 +0800

    util: set error if DAD is not finished

    If DAD not finished in 5 seconds, user will get an unknown error
    like this:

    # virsh net-start ipv6
Re: [Xen-devel] Getting the XSAVE size from userspace
On 11/5/2015 12:51 PM, Jan Beulich wrote:
> On 05.11.15 at 11:49, wrote:
>> On 05/11/15 10:42, Jan Beulich wrote:
>>> On 05.11.15 at 10:52, wrote:
>>>> I need to get the XSAVE size from userspace. The easiest way seems to be
>>>> to use the XEN_DOMCTL_getvcpuextstate hypercall, but that hypercall is
>>>> not public / there's no xenctrl.h wrapper for it.
>>> Before going into any detail of the rest of your mail - any reason you
>>> can't just consult CPUID output?
>> It depends on precisely what you want.
>>
>> CPUID.0xD[0].ecx gives you the maximum xsave area on this processor
>> CPUID.0xD[0].ebx gives you the current size for the value in xcr0, but
>> that is not very useful from userspace.
> Why would the maximum size not be sufficient for most (all?) user
> mode purposes?
>
> Jan

Hello,

The use-case is the following: whenever an EPT violation is triggered
inside a monitored VM, the introspection logic needs to know how many
bytes were accessed (read/written). This is done by inspecting the
faulting instruction and directly inferring the size, which is not
straightforward for the XSAVE/XRSTOR family. Using the maximum possible
size is wrong, as at any given moment the OS may or may not desire to
XSAVE/XRSTOR the entire state (and thinking that the instruction tries
to access more than it actually does may yield undesired effects).
Therefore, the size needed for the currently enabled features of the
monitored guest is required instead. Normally, it could be done by
running CPUID with eax = 0xD and ecx = i, where i >= 2 and XCR0[i] is
1 (XCR0 belongs to the monitored guest), but I am unsure if using
CPUID this way would be safe/desired: will Xen expose the same CPUID
features, for XSAVE related functionality, on all VMs? (using CPUID
with eax = 0xD and ecx = 0 would give us the needed size for the SVA,
and like I said, using the maximum size would not be safe, even if
it's the same across all VMs on a given host). Also, I'm unsure how
this would get along with migration...

Thanks,
Andrei.
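The per-component enumeration described above (CPUID with eax = 0xD and ecx = i for every i >= 2 with XCR0[i] set) can be sketched as follows. This is a minimal illustration under stated assumptions, not code from Xen or from the introspection tool: `struct xsave_component` and `xsave_size_for_xcr0` are hypothetical names, the caller is assumed to have already collected the CPUID.0xD sub-leaf values that apply to the monitored guest (EAX = component size, EBX = component offset), and only the standard, non-compacted XSAVE layout is considered. The bytes a particular XSAVE actually touches also depend on the instruction's requested-feature mask, which this sketch ignores.

```c
#include <stdint.h>

/* One CPUID.(EAX=0xD, ECX=i) result for state component i >= 2. */
struct xsave_component {
    uint32_t size;    /* EAX: size of the component in bytes */
    uint32_t offset;  /* EBX: offset in the standard (non-compacted) area */
};

#define XSAVE_LEGACY_SIZE 512u  /* x87/SSE legacy region */
#define XSAVE_HEADER_SIZE 64u   /* XSAVE header following the legacy region */

/*
 * Size of the standard-format XSAVE area needed for the components
 * enabled in xcr0: the highest end offset of any enabled component,
 * and never less than legacy region plus header.
 */
static uint32_t xsave_size_for_xcr0(uint64_t xcr0,
                                    const struct xsave_component comp[64])
{
    uint32_t size = XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE;
    unsigned int i;

    for (i = 2; i < 64; i++) {
        uint32_t end;

        if (!(xcr0 & (UINT64_C(1) << i)))
            continue;           /* component not enabled in the guest */

        end = comp[i].offset + comp[i].size;
        if (end > size)
            size = end;
    }

    return size;
}
```

For example, with only x87 and SSE enabled (xcr0 = 0x3) the result is 576 bytes; with AVX also enabled and its component reported at offset 576 with size 256, the result is 832 bytes.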
[Xen-devel] [PATCH] ocaml/xc: correct shutdown_reason enumeration
As defined by the Xen public header the fifth value of
shutdown_reason is watchdog.

Signed-off-by: Simon Rowe
---
 tools/ocaml/libs/xc/xenctrl.ml  |    2 +-
 tools/ocaml/libs/xc/xenctrl.mli |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
index b7ba8b7..beb95b8 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
@@ -89,7 +89,7 @@ type compile_info =
     compile_date : string;
 }
 
-type shutdown_reason = Poweroff | Reboot | Suspend | Crash | Halt
+type shutdown_reason = Poweroff | Reboot | Suspend | Crash | Watchdog
 
 type domain_create_flag = CDF_HVM | CDF_HAP
 
diff --git a/tools/ocaml/libs/xc/xenctrl.mli b/tools/ocaml/libs/xc/xenctrl.mli
index bc4af56..8928a2e 100644
--- a/tools/ocaml/libs/xc/xenctrl.mli
+++ b/tools/ocaml/libs/xc/xenctrl.mli
@@ -61,7 +61,7 @@ type compile_info = {
   compile_domain : string;
   compile_date : string;
 }
-type shutdown_reason = Poweroff | Reboot | Suspend | Crash | Halt
+type shutdown_reason = Poweroff | Reboot | Suspend | Crash | Watchdog
 
 type domain_create_flag = CDF_HVM | CDF_HAP
 
-- 
1.7.10.4
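For reference, the ordering the patch relies on can be written out on the C side. The `SHUTDOWN_*` values below are the ones from Xen's public `sched.h`; the helper `shutdown_reason_name` is hypothetical and exists only to show the position-to-name correspondence, on the assumption that the bindings turn the integer code into the OCaml constructor at the same position.

```c
#include <stddef.h>

/* Shutdown reason codes as defined in Xen's public sched.h. */
#define SHUTDOWN_poweroff 0
#define SHUTDOWN_reboot   1
#define SHUTDOWN_suspend  2
#define SHUTDOWN_crash    3
#define SHUTDOWN_watchdog 4

/*
 * Hypothetical helper: name of the OCaml constructor expected at the
 * position given by the code, or NULL for a code outside the five
 * values covered by the patch.
 */
static const char *shutdown_reason_name(int code)
{
    static const char *const names[] = {
        "Poweroff", "Reboot", "Suspend", "Crash", "Watchdog"
    };

    if (code < 0 || (size_t)code >= sizeof(names) / sizeof(names[0]))
        return NULL;

    return names[code];
}
```

With the old type definition, code 4 would have been presented to OCaml callers as `Halt`, which has no counterpart among these codes.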
Re: [Xen-devel] Getting the XSAVE size from userspace
On 05/11/15 11:35, Andrei LUTAS wrote:
> On 11/5/2015 12:51 PM, Jan Beulich wrote:
>> On 05.11.15 at 11:49, wrote:
>>> On 05/11/15 10:42, Jan Beulich wrote:
>>>> On 05.11.15 at 10:52, wrote:
>>>>> I need to get the XSAVE size from userspace. The easiest way seems
>>>>> to be to use the XEN_DOMCTL_getvcpuextstate hypercall, but that
>>>>> hypercall is not public / there's no xenctrl.h wrapper for it.
>>>> Before going into any detail of the rest of your mail - any reason you
>>>> can't just consult CPUID output?
>>> It depends on precisely what you want.
>>>
>>> CPUID.0xD[0].ecx gives you the maximum xsave area on this processor
>>> CPUID.0xD[0].ebx gives you the current size for the value in xcr0, but
>>> that is not very useful from userspace.
>> Why would the maximum size not be sufficient for most (all?) user
>> mode purposes?
>>
>> Jan
>>
> Hello,
>
> The use-case is the following: whenever an EPT violation is triggered
> inside a monitored VM, the introspection logic needs to know how many
> bytes were accessed (read/written). This is done by inspecting the
> faulting instruction and directly inferring the size, which is not
> straight-forward for XSAVE/XRSTOR family. Using the maximum possible
> size is wrong, as in any given moment the OS may or may not desire to
> XSAVE/XRSTOR the entire state (and thinking that the instruction tries
> to access more than it actually does may yield undesired effects).
> Therefore, the size needed for the currently enabled features of the
> monitored guest is required instead. Normally, it could be done by
> running CPUID with eax = 0xD and ecx = i, where i >= 2 and XCR0[i] is
> 1 (XCR0 belongs to the monitored guest), but I am unsure if using
> CPUID this way would be safe/desired: will Xen expose the same CPUID
> features, for XSAVE related functionality, on all VMs? (using CPUID
> with eax = 0xD and ecx = 0 would give us the needed size for the SVA,
> and like I said, using the maximum size would not be safe, even if
> it's the same across all VMs on a given host). Also, I'm unsure how
> this would get along with migration...

Hmm yes - there is no way to do this currently.

Xen's CPUID handling for xsave related things is broken in levelling and
migration scenarios, which is why it is *still* disabled by default in
XenServer. I am working on fixing it, and will take this usecase into
account (although I think I had already included enough for this usecase
to work).

At the point of the xsave/xrstor trap, you need to know xcr0 and be
able to perform a cpuid instruction in the context of a target domain,
to make use of 0xD[0].ebx to get the "current size based on xcr0".

~Andrew
Re: [Xen-devel] [PATCH v1 08/11] xsplice: Implement support for applying patches
On 11/05/2015 03:17 AM, Konrad Rzeszutek Wilk wrote: snip diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c index dbff0d5..31e4124 100644 --- a/xen/arch/x86/xsplice.c +++ b/xen/arch/x86/xsplice.c @@ -3,6 +3,25 @@ #include #include +#define PATCH_INSN_SIZE 5 + +void xsplice_apply_jmp(struct xsplice_patch_func *func) Don't we want for it to be 'int' Only if an error is expected. +{ +uint32_t val; +uint8_t *old_ptr; + +old_ptr = (uint8_t *)func->old_addr; +memcpy(func->undo, old_ptr, PATCH_INSN_SIZE); And perhaps use something which can catch an exception (#GP) so that this can error out? Why would this fail? +*old_ptr++ = 0xe9; /* Relative jump */ +val = func->new_addr - func->old_addr - PATCH_INSN_SIZE; +memcpy(old_ptr, &val, sizeof val); +} + +void xsplice_revert_jmp(struct xsplice_patch_func *func) +{ +memcpy((void *)func->old_addr, func->undo, PATCH_INSN_SIZE); +} + int xsplice_verify_elf(uint8_t *data, ssize_t len) { diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c index 5e88c55..4476be5 100644 --- a/xen/common/xsplice.c +++ b/xen/common/xsplice.c @@ -11,16 +11,21 @@ #include #include #include +#include #include +#include #include #include #include #include +#include static DEFINE_SPINLOCK(payload_list_lock); static LIST_HEAD(payload_list); +static LIST_HEAD(applied_list); + static unsigned int payload_cnt; static unsigned int payload_version = 1; @@ -29,15 +34,34 @@ struct payload { int32_t rc; /* 0 or -EXX. */ struct list_head list; /* Linked to 'payload_list'. */ +struct list_head applied_list; /* Linked to 'applied_list'. */ +struct xsplice_patch_func *funcs; +int nfuncs; unsigned int; void *module_address; size_t module_pages; char id[XEN_XSPLICE_NAME_SIZE + 1]; /* Name of it. */ }; +/* Defines an outstanding patching action. 
*/ +struct xsplice_work +{ +atomic_t semaphore; /* Used for rendezvous */ +atomic_t irq_semaphore; /* Used to signal all IRQs disabled */ +struct payload *data;/* The payload on which to act */ +volatile bool_t do_work; /* Signals work to do */ +volatile bool_t ready; /* Signals all CPUs synchronized */ +uint32_t cmd;/* Action request. XSPLICE_ACTION_* */ Now since you have a pointer to 'data' can't you follow that for the cmd? Or at least the 'data->state'? I moved cmd out of the payload and into xsplice_work since cmd is only needed when there is work to do. data->state contains the current state of the payload (i.e. before the action has been performed) so it provides no indication of what command needs to be performed. Missing full stops. +}; + +static DEFINE_SPINLOCK(xsplice_work_lock); +/* There can be only one outstanding patching action. */ +static struct xsplice_work xsplice_work; + static int load_module(struct payload *payload, uint8_t *raw, ssize_t len); static void free_module(struct payload *payload); +static int schedule_work(struct payload *data, uint32_t cmd); static const char *state2str(int32_t state) { @@ -341,28 +365,22 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action) case XSPLICE_ACTION_REVERT: if ( data->state == XSPLICE_STATE_APPLIED ) { -/* No implementation yet. */ -data->state = XSPLICE_STATE_CHECKED; -data->rc = 0; -rc = 0; +data->rc = -EAGAIN; +rc = schedule_work(data, action->cmd); } break; case XSPLICE_ACTION_APPLY: if ( (data->state == XSPLICE_STATE_CHECKED) ) { -/* No implementation yet. */ -data->state = XSPLICE_STATE_APPLIED; -data->rc = 0; -rc = 0; +data->rc = -EAGAIN; +rc = schedule_work(data, action->cmd); } break; case XSPLICE_ACTION_REPLACE: if ( data->state == XSPLICE_STATE_CHECKED ) { -/* No implementation yet. 
*/ -data->state = XSPLICE_STATE_CHECKED; -data->rc = 0; -rc = 0; +data->rc = -EAGAIN; +rc = schedule_work(data, action->cmd); } break; default: @@ -637,6 +655,24 @@ static int perform_relocs(struct xsplice_elf *elf) return 0; } +static int find_special_sections(struct payload *payload, + struct xsplice_elf *elf) +{ +struct xsplice_elf_sec *sec; + +sec = xsplice_elf_sec_by_name(elf, ".xsplice.funcs"); +if ( !sec ) +{ +printk(XENLOG_ERR ".xsplice.funcs is missing\n"); +return -1; +} + +payload->funcs = (struct xsplice_patch_func *)sec->load_addr; +payload->nfuncs = sec->sec->sh_size / (sizeof *payload->funcs); + +return 0; +} That looks like it should belong to another patch? Why? The ar
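The encoding performed by `xsplice_apply_jmp` in the quoted patch (opcode 0xe9 followed by a 32-bit displacement of `new_addr - old_addr - PATCH_INSN_SIZE`) can be sketched on its own. This is a minimal sketch, not the patch's code: `encode_jmp` is a hypothetical helper that writes into a caller-supplied buffer instead of live text, and the range check is an addition made here for illustration, as one possible case in which returning an error would make sense. The quoted patch does not perform such a check.

```c
#include <stdint.h>

#define PATCH_INSN_SIZE 5   /* 1 opcode byte + 4 displacement bytes */

/*
 * Encode "jmp rel32" from old_addr to new_addr into insn.
 * The displacement is relative to the end of the 5-byte instruction.
 * Returns 0 on success, -1 if the target is out of rel32 range.
 */
static int encode_jmp(uint8_t insn[PATCH_INSN_SIZE],
                      uint64_t old_addr, uint64_t new_addr)
{
    int64_t disp = (int64_t)(new_addr - old_addr) - PATCH_INSN_SIZE;
    uint32_t rel32;
    unsigned int i;

    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    rel32 = (uint32_t)disp;

    insn[0] = 0xe9;                                  /* relative jump */
    for (i = 0; i < 4; i++)                          /* little-endian rel32 */
        insn[1 + i] = (uint8_t)(rel32 >> (8 * i));

    return 0;
}
```

Reverting then only needs the five original bytes saved beforehand, which is what the `undo` buffer in the quoted patch holds.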
Re: [Xen-devel] [PATCH v1 07/11] xsplice: Implement payload loading
On 11/05/2015 10:35 AM, Jan Beulich wrote:
> On 04.11.15 at 23:21, wrote:
>>> +int xsplice_perform_rela(struct xsplice_elf *elf,
>>> +                         struct xsplice_elf_sec *base,
>>> +                         struct xsplice_elf_sec *rela)
>>> +{
>>> +    Elf64_Rela *r;
>>> +    int symndx, i;
>>
>> unsigned int
>>
>>> +    uint64_t val;
>>> +    uint8_t *dest;
>>> +
>>
>> Can you double check that rela->sec->sh_entsize is not zero first?
>
> Perhaps not just not zero, but at least a certain minimum? Or even
> equaling some sizeof()?

Well it only makes sense if rela->sec->sh_entsize == sizeof(Elf64_Rela)
so that is what I shall check for.

-- 
Ross Lagerwall
Re: [Xen-devel] [PATCH v1 05/11] elf: Add relocation types to elfstructs.h
On 11/05/2015 10:38 AM, Jan Beulich wrote:
> On 03.11.15 at 19:16, wrote:
>> --- a/xen/include/xen/elfstructs.h
>> +++ b/xen/include/xen/elfstructs.h
>> @@ -348,6 +348,27 @@ typedef struct {
>>  #define ELF64_R_TYPE(info) ((info) & 0x)
>>  #define ELF64_R_INFO(s,t) (((s) << 32) + (u_int32_t)(t))
>>
>> +/* x86-64 relocation types */
>> +#define R_X86_64_NONE       0  /* No reloc */
>> +#define R_X86_64_64         1  /* Direct 64 bit */
>> +#define R_X86_64_PC32       2  /* PC relative 32 bit signed */
>> +#define R_X86_64_GOT32      3  /* 32 bit GOT entry */
>> +#define R_X86_64_PLT32      4  /* 32 bit PLT address */
>> +#define R_X86_64_COPY       5  /* Copy symbol at runtime */
>> +#define R_X86_64_GLOB_DAT   6  /* Create GOT entry */
>> +#define R_X86_64_JUMP_SLOT  7  /* Create PLT entry */
>> +#define R_X86_64_RELATIVE   8  /* Adjust by program base */
>> +#define R_X86_64_GOTPCREL   9  /* 32 bit signed pc relative
>> +                                  offset to GOT */
>> +#define R_X86_64_32         10 /* Direct 32 bit zero extended */
>> +#define R_X86_64_32S        11 /* Direct 32 bit sign extended */
>> +#define R_X86_64_16         12 /* Direct 16 bit zero extended */
>> +#define R_X86_64_PC16       13 /* 16 bit sign extended pc relative */
>> +#define R_X86_64_8          14 /* Direct 8 bit sign extended */
>> +#define R_X86_64_PC8        15 /* 8 bit sign extended pc relative */
>> +
>> +#define R_X86_64_NUM        16
>
> Since the set isn't complete anyway - any reason not to drop everything
> that's of no relevance to xSplice?

I copied these definitions from Linux (wrongly) assuming that they were
complete. I shall remove the unused ones.

-- 
Ross Lagerwall
Re: [Xen-devel] [PATCH] ocaml/xc: correct shutdown_reason enumeration
> On 5 Nov 2015, at 11:39, Simon Rowe wrote:
>
> As defined by the Xen public header the fifth value of
> shutdown_reason is watchdog.

I’ve always been a bit suspicious about having both “Poweroff” and “Halt”
there. Perhaps there was some confusion between what could be written to
‘control/shutdown’ in xenstore and legal arguments to `xc_domain_shutdown`
and `SCHEDOP_shutdown`?

Anyway you’re clearly right, `Watchdog` is the 5th value. So I think this
is fine.

Acked-by: David Scott

I happen to notice there’s a type with the same name in “xenopsd”[1], so
I’ve cc:d xen-api@lists as a heads-up.

Thanks,
Dave

[1] https://github.com/xapi-project/xenopsd/blob/7818ab896d9969c5f5462a2f0d0ae62703b104b6/xc/domain.ml#L268

>
> Signed-off-by: Simon Rowe
> ---
> tools/ocaml/libs/xc/xenctrl.ml  |    2 +-
> tools/ocaml/libs/xc/xenctrl.mli |    2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
> index b7ba8b7..beb95b8 100644
> --- a/tools/ocaml/libs/xc/xenctrl.ml
> +++ b/tools/ocaml/libs/xc/xenctrl.ml
> @@ -89,7 +89,7 @@ type compile_info =
>     compile_date : string;
> }
>
> -type shutdown_reason = Poweroff | Reboot | Suspend | Crash | Halt
> +type shutdown_reason = Poweroff | Reboot | Suspend | Crash | Watchdog
>
> type domain_create_flag = CDF_HVM | CDF_HAP
>
> diff --git a/tools/ocaml/libs/xc/xenctrl.mli b/tools/ocaml/libs/xc/xenctrl.mli
> index bc4af56..8928a2e 100644
> --- a/tools/ocaml/libs/xc/xenctrl.mli
> +++ b/tools/ocaml/libs/xc/xenctrl.mli
> @@ -61,7 +61,7 @@ type compile_info = {
>   compile_domain : string;
>   compile_date : string;
> }
> -type shutdown_reason = Poweroff | Reboot | Suspend | Crash | Halt
> +type shutdown_reason = Poweroff | Reboot | Suspend | Crash | Watchdog
>
> type domain_create_flag = CDF_HVM | CDF_HAP
>
> --
> 1.7.10.4
>
Re: [Xen-devel] [PATCH v7 27/32] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
On 19/10/15 at 17:48, Jan Beulich wrote:
> On 02.10.15 at 17:48, wrote:
>> @@ -1176,6 +1177,190 @@ int arch_set_info_guest(
>>  #undef c
>>  }
>>
>> +/* Called by VCPUOP_initialise for HVM guests. */
>> +int arch_set_info_hvm_guest(struct vcpu *v, vcpu_hvm_context_t *ctx)
>
> const ... *ctx

Sure.

>> +{
>> +    struct cpu_user_regs *uregs = &v->arch.user_regs;
>> +    struct segment_register cs, ds, ss, es, tr;
>> +
>> +    switch ( ctx->mode )
>> +    {
>> +    default:
>> +        return -EINVAL;
>> +
>> +    case VCPU_HVM_MODE_32B:
>> +    {
>> +        const struct vcpu_hvm_x86_32 *regs = &ctx->cpu_regs.x86_32;
>> +        uint32_t limit;
>> +
>> +#define SEG(s, r) \
>> +    (struct segment_register){ .sel = 0, .base = (r)->s ## _base, \
>> +    .limit = (r)->s ## _limit, .attr.bytes = (r)->s ## _ar }
>> +        cs = SEG(cs, regs);
>> +        ds = SEG(ds, regs);
>> +        ss = SEG(ss, regs);
>> +        es = SEG(es, regs);
>> +        tr = SEG(tr, regs);
>> +#undef SEG
>> +
>> +        /* Basic sanity checks. */
>> +        if ( cs.attr.fields.pad != 0 || ds.attr.fields.pad != 0 ||
>> +             ss.attr.fields.pad != 0 || es.attr.fields.pad != 0 ||
>> +             tr.attr.fields.pad != 0 )
>> +        {
>> +            gprintk(XENLOG_ERR, "Attribute bits 12-15 of the segments are
>> not null\n");
>> +            return -EINVAL;
>> +        }
>> +
>> +        limit = cs.limit * (cs.attr.fields.g ? PAGE_SIZE : 1);
>> +        if ( regs->eip > limit )
>> +        {
>> +            gprintk(XENLOG_ERR, "EIP address is outside of the CS limit\n");
>> +            return -EINVAL;
>> +        }
>> +
>> +        if ( ds.attr.fields.dpl > cs.attr.fields.dpl )
>
> Checks like this imo need to take into account cases where the effect
> of a null selector loaded into the register is intended (in which case I
> would assume DPL to not matter). Speaking of which - with all these
> DPL checks done, what about non-code segments loaded into CS or
> other illegal things? Question is whether the
> hvm_set_segment_register() calls below could be made take care of
> these instead of having to enumerate everything here.

hvm_set_segment_register is just an inline wrapper around
hvm_funcs.set_segment_register. I could turn that into a proper function
with checks, but it's a shame because hvm_load_segment_selector also
performs some of these checks; it requires a valid GDT to be loaded,
though, which we don't have.

I don't mind adding some more checks to the current ones:

 - Check that all segments that are not null selectors have the
   'present' bit set.
 - Check that CS.type matches a code segment.
 - Check that all segments except CS don't have the 'code' type.
 - Don't perform the DPL check if the segment is a null selector.

I'm adding a small inline stub to do these checks.

>> --- a/xen/common/compat/domain.c
>> +++ b/xen/common/compat/domain.c
>> @@ -10,6 +10,9 @@
>>  #include
>>  #include
>>  #include
>> +#ifdef CONFIG_X86
>> +#include
>> +#endif
>
> I'd avoid such #if-s in this file, since it's only x86 that uses compat
> code right now.

OK, knowing that the compat code is only used on x86 also helps to
simplify some of this code.

>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -1207,11 +1207,35 @@ void unmap_vcpu_info(struct vcpu *v)
>>      put_page_and_type(mfn_to_page(mfn));
>>  }
>>
>> +static int default_initialize_vcpu(struct vcpu *v,
>> +                                   XEN_GUEST_HANDLE_PARAM(void) arg)
>> +{
>> +    struct vcpu_guest_context *ctxt;
>> +    struct domain *d = v->domain;
>> +    int rc;
>> +
>> +    if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
>> +        return -ENOMEM;
>> +
>> +    if ( copy_from_guest(ctxt, arg, 1) )
>> +    {
>> +        free_vcpu_guest_context(ctxt);
>> +        return -EFAULT;
>> +    }
>> +
>> +    domain_lock(d);
>> +    rc = v->is_initialised ? -EEXIST : arch_set_info_guest(v, ctxt);
>> +    domain_unlock(d);
>> +
>> +    free_vcpu_guest_context(ctxt);
>> +
>> +    return rc;
>> +}
>> +
>>  long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void)
>> arg)
>>  {
>>      struct domain *d = current->domain;
>>      struct vcpu *v;
>> -    struct vcpu_guest_context *ctxt;
>>      long rc = 0;
>>
>>      if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
>> @@ -1223,20 +1247,28 @@ long do_vcpu_op(int cmd, unsigned int vcpuid,
>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>          if ( v->vcpu_info == &dummy_vcpu_info )
>>              return -EINVAL;
>>
>> -        if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
>> -            return -ENOMEM;
>> -
>> -        if ( copy_from_guest(ctxt, arg, 1) )
>> +#if defined(CONFIG_X86)
>
> Looks like you went from one extreme to the other: Now there's no
> per-arch function anymore, and hence you need this ugly #ifdef-ery.
> Why don't you add default_initialize_
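For illustration only, the four extra checks listed in the reply above could look roughly like the sketch below. This is not the stub from the series: struct seg_sketch, check_segment_sketch() and the is_null flag are hypothetical stand-ins for struct segment_register and for however the real code detects a null selector. Rejecting a null CS, and rejecting a segment whose DPL is greater than CS.DPL (the comparison shown in the quoted patch), are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a segment register (not Xen's struct). */
struct seg_sketch {
    bool    is_null;   /* assumption: caller already decided this is a null selector */
    bool    present;   /* descriptor 'P' bit */
    bool    s;         /* descriptor 'S' bit: 1 = code/data, 0 = system (e.g. TSS) */
    uint8_t type;      /* 4-bit descriptor type field */
    uint8_t dpl;       /* descriptor privilege level, 0-3 */
};

#define SEG_TYPE_CODE 0x8u  /* for S=1 descriptors, bit 3 of 'type' marks code */

/* Returns 0 if the segment passes the listed checks, -EINVAL otherwise. */
static int check_segment_sketch(const struct seg_sketch *seg, bool is_cs,
                                uint8_t cs_dpl)
{
    bool is_code;

    assert(seg != NULL);

    /* Null selectors skip the present/type/DPL checks (assumed invalid for CS). */
    if ( seg->is_null )
        return is_cs ? -EINVAL : 0;

    /* Every non-null segment must have the 'present' bit set. */
    if ( !seg->present )
        return -EINVAL;

    is_code = seg->s && (seg->type & SEG_TYPE_CODE);

    /* CS must be a code segment; nothing else may be one. */
    if ( is_cs )
        return is_code ? 0 : -EINVAL;
    if ( is_code )
        return -EINVAL;

    /* Same comparison as the quoted patch, assumed to reject the context. */
    if ( seg->dpl > cs_dpl )
        return -EINVAL;

    return 0;
}
```

A caller would run this once per register (CS, DS, SS, ES, TR), passing is_cs = true only for CS, before handing the values to hvm_set_segment_register().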
Re: [Xen-devel] [PATCH v1 07/11] xsplice: Implement payload loading
>>> On 05.11.15 at 12:51, wrote:
> On 11/05/2015 10:35 AM, Jan Beulich wrote:
>>>>> On 04.11.15 at 23:21, wrote:
>>>> +int xsplice_perform_rela(struct xsplice_elf *elf,
>>>> +                         struct xsplice_elf_sec *base,
>>>> +                         struct xsplice_elf_sec *rela)
>>>> +{
>>>> +    Elf64_Rela *r;
>>>> +    int symndx, i;
>>>
>>> unsigned int
>>>
>>>> +    uint64_t val;
>>>> +    uint8_t *dest;
>>>> +
>>>
>>> Can you double check that rela->sec-sh_entsize is not zero first?
>>
>> Perhaps not just not zero, but at least a certain minimum? Or even
>> equaling some sizeof()?
>>
>
> Well it only makes sense if rela->sec-sh_entsize == sizeof(Elf64_Rela)
> so that is what I shall check for.

The question whether to use == or >= really depends on whether we
expect (theoretical) additions to the structure to be backwards
compatible.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
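As a rough sketch of the strict (==) variant discussed above: the helper below validates the entry size before deriving an entry count. Elf64_Rela_sketch is a local stand-in with the standard ELF64 RELA field layout, rela_entry_count() is a hypothetical name, and the extra "section size is a whole number of entries" check is an assumption that the thread does not ask for.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local stand-in for the ELF64 RELA entry layout. */
typedef struct {
    uint64_t r_offset;  /* where to apply the relocation */
    uint64_t r_info;    /* symbol index and relocation type */
    int64_t  r_addend;  /* constant addend */
} Elf64_Rela_sketch;

/*
 * Hypothetical helper: returns the number of relocation entries in a
 * section, or -EINVAL if the section header does not describe an array
 * of Elf64_Rela-sized entries.
 */
static long rela_entry_count(uint64_t sh_size, uint64_t sh_entsize)
{
    /* Strict variant: rejects zero as well as any unexpected entry size. */
    if ( sh_entsize != sizeof(Elf64_Rela_sketch) )
        return -EINVAL;

    /* Assumption: a trailing partial entry is treated as malformed. */
    if ( sh_size % sh_entsize )
        return -EINVAL;

    return (long)(sh_size / sh_entsize);
}
```

With the >= variant the loop over entries would have to advance by sh_entsize rather than by sizeof(Elf64_Rela), which only works if any future additions to the structure are appended at the end.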
[Xen-devel] [PATCH] x86/hvm: make sure stdvga cache cannot be re-enabled
As soon as the cache is disabled, it will become out-of-sync with the VGA device model and since no mechanism exists to acquire current VRAM state from the device model, re-enabling it leads to stale data being seen by the guest. The problem can be seen by deliberately crashing a Windows guest; the BSOD output is corrupted. This patch changes the existing 'cache' boolean in hvm_hw_stdvga into a tri-state enum and only allows the state to move from 'uninitialized' to 'enabled'. Once the cache state becomes 'disabled' it will remain so for the lifetime of the VM. Signed-off-by: Paul Durrant Cc: Keir Fraser Cc: Jan Beulich Cc: Andrew Cooper --- xen/arch/x86/hvm/save.c | 2 +- xen/arch/x86/hvm/stdvga.c| 50 xen/include/asm-x86/hvm/io.h | 8 ++- 3 files changed, 45 insertions(+), 15 deletions(-) diff --git a/xen/arch/x86/hvm/save.c b/xen/arch/x86/hvm/save.c index 4660beb..f7d4999 100644 --- a/xen/arch/x86/hvm/save.c +++ b/xen/arch/x86/hvm/save.c @@ -73,7 +73,7 @@ int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr) d->arch.hvm_domain.sync_tsc = rdtsc(); /* VGA state is not saved/restored, so we nobble the cache. */ -d->arch.hvm_domain.stdvga.cache = 0; +d->arch.hvm_domain.stdvga.cache = STDVGA_CACHE_DISABLED; return 0; } diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c index 02a97f9..246c629 100644 --- a/xen/arch/x86/hvm/stdvga.c +++ b/xen/arch/x86/hvm/stdvga.c @@ -101,6 +101,37 @@ static void vram_put(struct hvm_hw_stdvga *s, void *p) unmap_domain_page(p); } +static void stdvga_try_cache_enable(struct hvm_hw_stdvga *s) +{ +/* + * Caching mode can only be enabled if the the cache has + * never been used before. As soon as it is disabled, it will + * become out-of-sync with the VGA device model and since no + * mechanism exists to acquire current VRAM state from the + * device model, re-enabling it would lead to stale data being + * seen by the guest. 
+ */ +if ( s->cache != STDVGA_CACHE_UNINITIALIZED ) +return; + +gdprintk(XENLOG_INFO, "entering caching mode\n"); +s->cache = STDVGA_CACHE_ENABLED; +} + +static void stdvga_cache_disable(struct hvm_hw_stdvga *s) +{ +if ( s->cache != STDVGA_CACHE_ENABLED ) +return; + +gdprintk(XENLOG_INFO, "leaving caching mode\n"); +s->cache = STDVGA_CACHE_DISABLED; +} + +static bool_t stdvga_cache_is_enabled(struct hvm_hw_stdvga *s) +{ +return s->cache == STDVGA_CACHE_ENABLED; +} + static int stdvga_outb(uint64_t addr, uint8_t val) { struct hvm_hw_stdvga *s = &current->domain->arch.hvm_domain.stdvga; @@ -139,12 +170,8 @@ static int stdvga_outb(uint64_t addr, uint8_t val) if ( !prev_stdvga && s->stdvga ) { -/* - * (Re)start caching of video buffer. - * XXX TODO: In case of a restart the cache could be unsynced. - */ -s->cache = 1; -gdprintk(XENLOG_INFO, "entering stdvga and caching modes\n"); +gdprintk(XENLOG_INFO, "entering stdvga mode\n"); +stdvga_try_cache_enable(s); } else if ( prev_stdvga && !s->stdvga ) { @@ -441,7 +468,7 @@ static int stdvga_mem_write(const struct hvm_io_handler *handler, }; struct hvm_ioreq_server *srv; -if ( !s->cache || !s->stdvga ) +if ( !stdvga_cache_is_enabled(s) || !s->stdvga ) goto done; /* Intercept mmio write */ @@ -515,15 +542,12 @@ static bool_t stdvga_mem_accept(const struct hvm_io_handler *handler, * not active since we can assert, when in stdvga mode, that writes * to VRAM have no side effect and thus we can try to buffer them. 
*/ -if ( s->cache ) -{ -gdprintk(XENLOG_INFO, "leaving caching mode\n"); -s->cache = 0; -} +stdvga_cache_disable(s); goto reject; } -else if ( p->dir == IOREQ_READ && (!s->cache || !s->stdvga) ) +else if ( p->dir == IOREQ_READ && + (!stdvga_cache_is_enabled(s) || !s->stdvga) ) goto reject; /* s->lock intentionally held */ diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h index 8585a1f..ceefa2e 100644 --- a/xen/include/asm-x86/hvm/io.h +++ b/xen/include/asm-x86/hvm/io.h @@ -128,13 +128,19 @@ void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq, void msix_write_completion(struct vcpu *); void msixtbl_init(struct domain *d); +enum stdvga_cache_state { +STDVGA_CACHE_UNINITIALIZED, +STDVGA_CACHE_ENABLED, +STDVGA_CACHE_DISABLED +}; + struct hvm_hw_stdvga { uint8_t sr_index; uint8_t sr[8]; uint8_t gr_index; uint8_t gr[9]; bool_t stdvga; -bool_t cache; +enum stdvga_cache_state cache; uint32_t latch; struct page_info *vram_page[64]; /* shadow of 0xa-0xa */ spinlock_t lock; -- 2.1.4 ___ Xen-devel mailing list
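To make the one-way nature of the state machine easier to see, here is a stand-alone sketch of the same three transitions outside of Xen. The names (cache_state, try_cache_enable and so on) are illustrative and only mirror the logic of the helpers in the patch; they are not the patch's identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the tri-state enum introduced by the patch. */
enum cache_state {
    CACHE_UNINITIALIZED,
    CACHE_ENABLED,
    CACHE_DISABLED
};

/* Only a never-used cache may be enabled. */
static void try_cache_enable(enum cache_state *c)
{
    assert(c != NULL);
    if ( *c == CACHE_UNINITIALIZED )
        *c = CACHE_ENABLED;
}

/* Only an enabled cache can be disabled; after that the state is final. */
static void cache_disable(enum cache_state *c)
{
    assert(c != NULL);
    if ( *c == CACHE_ENABLED )
        *c = CACHE_DISABLED;
}

static bool cache_is_enabled(const enum cache_state *c)
{
    assert(c != NULL);
    return *c == CACHE_ENABLED;
}
```

Once the state reaches CACHE_DISABLED, neither helper changes it again, which is the property the commit message relies on. Note that arch_hvm_load() in the patch assigns the disabled state directly rather than going through the disable helper, so a restored domain starts out in the final state.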
Re: [Xen-devel] Getting the XSAVE size from userspace
On 11/05/2015 01:44 PM, Andrew Cooper wrote: > On 05/11/15 11:35, Andrei LUTAS wrote: >> The use-case is the following: whenever an EPT violation is triggered >> inside a monitored VM, the introspection logic needs to know how many >> bytes were accessed (read/written). This is done by inspecting the >> faulting instruction and directly inferring the size, which is not >> straight-forward for XSAVE/XRSTOR family. Using the maximum possible >> size is wrong, as in any given moment the OS may or may not desire to >> XSAVE/XRSTOR the entire state (and thinking that the instruction tries >> to access more than it actually does may yield undesired effects). >> Therefore, the size needed for the currently enabled features of the >> monitored guest is required instead. Normally, it could be done by >> running CPUID with eax = 0xD and ecx = i, where i >= 2 and XCR0[i] is >> 1 (XCR0 belongs to the monitored guest), but I am unsure if using >> CPUID this way would be safe/desired: will Xen expose the same CPUID >> features, for XSAVE related functionality, on all VMs? (using XCPUID >> with eax = 0xD and ecx = 0 would give us the needed size for the SVA, >> and like I said, using the maximum size would not be safe, even if >> it's the same across all VMs on a given host). Also, I'm unsure how >> this would get along with migration... > > Hmm yes - there is no way to do this currently. > > Xen's CPUID handling for xsave related things is broken in levelling and > migration scenarios, which is why it is *still* disabled by default in > XenServer. > > I am working on fixing it, and will take this usecase into account > (although I think I had already included enough for this usecase to work). > > At the point of the xsave/xrestor trap, you need to know xcr0 and be > able to perfom a cpuid instruction in the context of a target domain, to > make use of 0xD[0].ebx to get the "current size based on xcr0". 
So then the closest thing to what we need would be to add a size field
to struct hvm_hw_cpu_xsave, and just assign the size variable to it in
hvm_save_cpu_xsave_states (migration aside)?

static int hvm_save_cpu_xsave_states(struct domain *d, hvm_domain_context_t *h)
{
    struct vcpu *v;
    struct hvm_hw_cpu_xsave *ctxt;

    if ( !cpu_has_xsave )
        return 0;   /* do nothing */

    for_each_vcpu ( d, v )
    {
        unsigned int size = HVM_CPU_XSAVE_SIZE(v->arch.xcr0_accum);

        if ( !xsave_enabled(v) )
            continue;
        if ( _hvm_init_entry(h, CPU_XSAVE_CODE, v->vcpu_id, size) )
            return 1;
        ctxt = (struct hvm_hw_cpu_xsave *)&h->data[h->cur];
        h->cur += size;

        ctxt->xfeature_mask = xfeature_mask;
        ctxt->xcr0 = v->arch.xcr0;
        ctxt->xcr0_accum = v->arch.xcr0_accum;
        memcpy(&ctxt->save_area, v->arch.xsave_area,
               size - offsetof(struct hvm_hw_cpu_xsave, save_area));
    }

    return 0;
}

Thanks,
Razvan
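For reference, a minimal sketch of how the "current size based on xcr0" can be derived for the standard (non-compacted) XSAVE format, given the per-component offsets and sizes that CPUID leaf 0xD, sub-leaf i reports (EAX = size, EBX = offset). The table is passed in by the caller here; struct xstate_component and xsave_size_for_xcr0() are hypothetical names, and how the values would be obtained in the context of the target domain is exactly the open question in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 512-byte legacy FXSAVE region plus the 64-byte XSAVE header. */
#define XSAVE_LEGACY_AND_HEADER 576u

/* Hypothetical per-component layout, as CPUID.(EAX=0xD,ECX=i) reports it. */
struct xstate_component {
    uint32_t offset;  /* EBX: offset from the start of the XSAVE area */
    uint32_t size;    /* EAX: size of the component in bytes */
};

/*
 * Size of the standard-format XSAVE area for the features enabled in
 * xcr0: the end of the highest enabled extended component, and never
 * less than the legacy region plus header.
 */
static uint32_t xsave_size_for_xcr0(uint64_t xcr0,
                                    const struct xstate_component comp[64])
{
    uint32_t size = XSAVE_LEGACY_AND_HEADER;
    unsigned int i;

    assert(comp != NULL);

    /* Components 0 (x87) and 1 (SSE) live in the legacy region. */
    for ( i = 2; i < 64; i++ )
    {
        uint32_t end;

        if ( !(xcr0 & (1ULL << i)) )
            continue;

        end = comp[i].offset + comp[i].size;
        if ( end > size )
            size = end;
    }

    return size;
}
```

The size in the quoted save function is derived from xcr0_accum rather than xcr0, so a field filled from that variable would not necessarily match what an XSAVE executed with the guest's current xcr0 touches.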
Re: [Xen-devel] [PATCH] x86/hvm: make sure stdvga cache cannot be re-enabled
On 05/11/15 12:17, Paul Durrant wrote: > As soon as the cache is disabled, it will become out-of-sync with the > VGA device model and since no mechanism exists to acquire current VRAM > state from the device model, re-enabling it leads to stale data > being seen by the guest. > > The problem can be seen by deliberately crashing a Windows guest; the > BSOD output is corrupted. > > This patch changes the existing 'cache' boolean in hvm_hw_stdvga into a > tri-state enum and only allows the state to move from 'uninitialized' to > 'enabled'. Once the cache state becomes 'disabled' it will remain so for > the lifetime of the VM. Should identify that this is a regression introduced by c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 > > Signed-off-by: Paul Durrant > Cc: Keir Fraser > Cc: Jan Beulich > Cc: Andrew Cooper Reviewed-by: Andrew Cooper , with one small issue which could be fixed on commit... > --- > xen/arch/x86/hvm/save.c | 2 +- > xen/arch/x86/hvm/stdvga.c| 50 > > xen/include/asm-x86/hvm/io.h | 8 ++- > 3 files changed, 45 insertions(+), 15 deletions(-) > > diff --git a/xen/arch/x86/hvm/save.c b/xen/arch/x86/hvm/save.c > index 4660beb..f7d4999 100644 > --- a/xen/arch/x86/hvm/save.c > +++ b/xen/arch/x86/hvm/save.c > @@ -73,7 +73,7 @@ int arch_hvm_load(struct domain *d, struct hvm_save_header > *hdr) > d->arch.hvm_domain.sync_tsc = rdtsc(); > > /* VGA state is not saved/restored, so we nobble the cache. */ > -d->arch.hvm_domain.stdvga.cache = 0; > +d->arch.hvm_domain.stdvga.cache = STDVGA_CACHE_DISABLED; > > return 0; > } > diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c > index 02a97f9..246c629 100644 > --- a/xen/arch/x86/hvm/stdvga.c > +++ b/xen/arch/x86/hvm/stdvga.c > @@ -101,6 +101,37 @@ static void vram_put(struct hvm_hw_stdvga *s, void *p) > unmap_domain_page(p); > } > > +static void stdvga_try_cache_enable(struct hvm_hw_stdvga *s) > +{ > +/* > + * Caching mode can only be enabled if the the cache has > + * never been used before. 
As soon as it is disabled, it will > + * become out-of-sync with the VGA device model and since no > + * mechanism exists to acquire current VRAM state from the > + * device model, re-enabling it would lead to stale data being > + * seen by the guest. > + */ > +if ( s->cache != STDVGA_CACHE_UNINITIALIZED ) > +return; > + > +gdprintk(XENLOG_INFO, "entering caching mode\n"); > +s->cache = STDVGA_CACHE_ENABLED; > +} > + > +static void stdvga_cache_disable(struct hvm_hw_stdvga *s) > +{ > +if ( s->cache != STDVGA_CACHE_ENABLED ) > +return; > + > +gdprintk(XENLOG_INFO, "leaving caching mode\n"); > +s->cache = STDVGA_CACHE_DISABLED; > +} > + > +static bool_t stdvga_cache_is_enabled(struct hvm_hw_stdvga *s) const struct hvm_hw_stdvga *s ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/hvm: make sure stdvga cache cannot be re-enabled
> -Original Message- > From: Andrew Cooper [mailto:andrew.coop...@citrix.com] > Sent: 05 November 2015 12:32 > To: Paul Durrant; xen-de...@lists.xenproject.org > Cc: Keir (Xen.org); Jan Beulich > Subject: Re: [PATCH] x86/hvm: make sure stdvga cache cannot be re- > enabled > > On 05/11/15 12:17, Paul Durrant wrote: > > As soon as the cache is disabled, it will become out-of-sync with the > > VGA device model and since no mechanism exists to acquire current VRAM > > state from the device model, re-enabling it leads to stale data > > being seen by the guest. > > > > The problem can be seen by deliberately crashing a Windows guest; the > > BSOD output is corrupted. > > > > This patch changes the existing 'cache' boolean in hvm_hw_stdvga into a > > tri-state enum and only allows the state to move from 'uninitialized' to > > 'enabled'. Once the cache state becomes 'disabled' it will remain so for > > the lifetime of the VM. > > Should identify that this is a regression introduced by c/s > 3bbaaec09b1b942f5624dee176da6e416d31f982 > > > > > Signed-off-by: Paul Durrant > > Cc: Keir Fraser > > Cc: Jan Beulich > > Cc: Andrew Cooper > > Reviewed-by: Andrew Cooper , with one > small > issue which could be fixed on commit... > > > --- > > xen/arch/x86/hvm/save.c | 2 +- > > xen/arch/x86/hvm/stdvga.c| 50 > > > xen/include/asm-x86/hvm/io.h | 8 ++- > > 3 files changed, 45 insertions(+), 15 deletions(-) > > > > diff --git a/xen/arch/x86/hvm/save.c b/xen/arch/x86/hvm/save.c > > index 4660beb..f7d4999 100644 > > --- a/xen/arch/x86/hvm/save.c > > +++ b/xen/arch/x86/hvm/save.c > > @@ -73,7 +73,7 @@ int arch_hvm_load(struct domain *d, struct > hvm_save_header *hdr) > > d->arch.hvm_domain.sync_tsc = rdtsc(); > > > > /* VGA state is not saved/restored, so we nobble the cache. 
*/ > > -d->arch.hvm_domain.stdvga.cache = 0; > > +d->arch.hvm_domain.stdvga.cache = STDVGA_CACHE_DISABLED; > > > > return 0; > > } > > diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c > > index 02a97f9..246c629 100644 > > --- a/xen/arch/x86/hvm/stdvga.c > > +++ b/xen/arch/x86/hvm/stdvga.c > > @@ -101,6 +101,37 @@ static void vram_put(struct hvm_hw_stdvga *s, > void *p) > > unmap_domain_page(p); > > } > > > > +static void stdvga_try_cache_enable(struct hvm_hw_stdvga *s) > > +{ > > +/* > > + * Caching mode can only be enabled if the the cache has > > + * never been used before. As soon as it is disabled, it will > > + * become out-of-sync with the VGA device model and since no > > + * mechanism exists to acquire current VRAM state from the > > + * device model, re-enabling it would lead to stale data being > > + * seen by the guest. > > + */ > > +if ( s->cache != STDVGA_CACHE_UNINITIALIZED ) > > +return; > > + > > +gdprintk(XENLOG_INFO, "entering caching mode\n"); > > +s->cache = STDVGA_CACHE_ENABLED; > > +} > > + > > +static void stdvga_cache_disable(struct hvm_hw_stdvga *s) > > +{ > > +if ( s->cache != STDVGA_CACHE_ENABLED ) > > +return; > > + > > +gdprintk(XENLOG_INFO, "leaving caching mode\n"); > > +s->cache = STDVGA_CACHE_DISABLED; > > +} > > + > > +static bool_t stdvga_cache_is_enabled(struct hvm_hw_stdvga *s) > > const struct hvm_hw_stdvga *s > I'll re-spin with this fixed and regression-introducing commit mentioned in the message. Paul > ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2] x86/hvm: make sure stdvga cache cannot be re-enabled
As soon as the cache is disabled, it will become out-of-sync with the VGA device model and since no mechanism exists to acquire current VRAM state from the device model, re-enabling it leads to stale data being seen by the guest. The problem was introduced by commit 3bbaaec0 ("x86/hvm: unify stdvga mmio intercept with standard mmio intercept") and can be seen by deliberately crashing a Windows guest; the BSOD output is corrupted. This patch changes the existing 'cache' boolean in hvm_hw_stdvga into a tri-state enum and only allows the state to move from 'uninitialized' to 'enabled'. Once the cache state becomes 'disabled' it will remain so for the lifetime of the VM. Signed-off-by: Paul Durrant Cc: Keir Fraser Cc: Jan Beulich Reviewed-by: Andrew Cooper --- xen/arch/x86/hvm/save.c | 2 +- xen/arch/x86/hvm/stdvga.c| 50 xen/include/asm-x86/hvm/io.h | 8 ++- 3 files changed, 45 insertions(+), 15 deletions(-) diff --git a/xen/arch/x86/hvm/save.c b/xen/arch/x86/hvm/save.c index 4660beb..f7d4999 100644 --- a/xen/arch/x86/hvm/save.c +++ b/xen/arch/x86/hvm/save.c @@ -73,7 +73,7 @@ int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr) d->arch.hvm_domain.sync_tsc = rdtsc(); /* VGA state is not saved/restored, so we nobble the cache. */ -d->arch.hvm_domain.stdvga.cache = 0; +d->arch.hvm_domain.stdvga.cache = STDVGA_CACHE_DISABLED; return 0; } diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c index 02a97f9..86c94d2 100644 --- a/xen/arch/x86/hvm/stdvga.c +++ b/xen/arch/x86/hvm/stdvga.c @@ -101,6 +101,37 @@ static void vram_put(struct hvm_hw_stdvga *s, void *p) unmap_domain_page(p); } +static void stdvga_try_cache_enable(struct hvm_hw_stdvga *s) +{ +/* + * Caching mode can only be enabled if the the cache has + * never been used before. 
As soon as it is disabled, it will + * become out-of-sync with the VGA device model and since no + * mechanism exists to acquire current VRAM state from the + * device model, re-enabling it would lead to stale data being + * seen by the guest. + */ +if ( s->cache != STDVGA_CACHE_UNINITIALIZED ) +return; + +gdprintk(XENLOG_INFO, "entering caching mode\n"); +s->cache = STDVGA_CACHE_ENABLED; +} + +static void stdvga_cache_disable(struct hvm_hw_stdvga *s) +{ +if ( s->cache != STDVGA_CACHE_ENABLED ) +return; + +gdprintk(XENLOG_INFO, "leaving caching mode\n"); +s->cache = STDVGA_CACHE_DISABLED; +} + +static bool_t stdvga_cache_is_enabled(const struct hvm_hw_stdvga *s) +{ +return s->cache == STDVGA_CACHE_ENABLED; +} + static int stdvga_outb(uint64_t addr, uint8_t val) { struct hvm_hw_stdvga *s = &current->domain->arch.hvm_domain.stdvga; @@ -139,12 +170,8 @@ static int stdvga_outb(uint64_t addr, uint8_t val) if ( !prev_stdvga && s->stdvga ) { -/* - * (Re)start caching of video buffer. - * XXX TODO: In case of a restart the cache could be unsynced. - */ -s->cache = 1; -gdprintk(XENLOG_INFO, "entering stdvga and caching modes\n"); +gdprintk(XENLOG_INFO, "entering stdvga mode\n"); +stdvga_try_cache_enable(s); } else if ( prev_stdvga && !s->stdvga ) { @@ -441,7 +468,7 @@ static int stdvga_mem_write(const struct hvm_io_handler *handler, }; struct hvm_ioreq_server *srv; -if ( !s->cache || !s->stdvga ) +if ( !stdvga_cache_is_enabled(s) || !s->stdvga ) goto done; /* Intercept mmio write */ @@ -515,15 +542,12 @@ static bool_t stdvga_mem_accept(const struct hvm_io_handler *handler, * not active since we can assert, when in stdvga mode, that writes * to VRAM have no side effect and thus we can try to buffer them. 
*/ -if ( s->cache ) -{ -gdprintk(XENLOG_INFO, "leaving caching mode\n"); -s->cache = 0; -} +stdvga_cache_disable(s); goto reject; } -else if ( p->dir == IOREQ_READ && (!s->cache || !s->stdvga) ) +else if ( p->dir == IOREQ_READ && + (!stdvga_cache_is_enabled(s) || !s->stdvga) ) goto reject; /* s->lock intentionally held */ diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h index 8585a1f..ceefa2e 100644 --- a/xen/include/asm-x86/hvm/io.h +++ b/xen/include/asm-x86/hvm/io.h @@ -128,13 +128,19 @@ void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq, void msix_write_completion(struct vcpu *); void msixtbl_init(struct domain *d); +enum stdvga_cache_state { +STDVGA_CACHE_UNINITIALIZED, +STDVGA_CACHE_ENABLED, +STDVGA_CACHE_DISABLED +}; + struct hvm_hw_stdvga { uint8_t sr_index; uint8_t sr[8]; uint8_t gr_index; uint8_t gr[9]; bool_t stdvga; -bool_t cache; +enum stdvga_cache_state cache; uint32_t latch; struct page_info *vram_page[64]; /* shadow of 0
Re: [Xen-devel] [PATCH v9] run QEMU as non-root
On Tue, 3 Nov 2015, Ian Campbell wrote: > On Tue, 2015-11-03 at 16:49 +, Ian Campbell wrote: > > On Mon, 2015-11-02 at 12:30 +, Stefano Stabellini wrote: > > > Try to use "xen-qemudepriv-domid$domid" first, then > > > "xen-qemudepriv-shared" and root if everything else fails. > > > > > > The uids need to be manually created by the user or, more likely, by > > > the > > > xen package maintainer. > > > > > > Expose a device_model_user setting in libxl_domain_build_info, so that > > > opinionated callers, such as libvirt, can set any user they like. Do > > > not > > > fall back to root if device_model_user is set. Users can also set > > > device_model_user by hand in the xl domain config file. > > > > > > QEMU is going to setuid and setgid to the user ID and the group ID of > > > the specified user, soon after initialization, before starting to deal > > > with any guest IO. > > > > > > To actually secure QEMU when running in Dom0, we need at least to > > > deprivilege the privcmd and xenstore interfaces, this is just the first > > > step in that direction. > > > > > > Signed-off-by: Stefano Stabellini > > > > Acked-by: Ian Campbell > > There were some minor conflicts against some patches committed at the start > of October. I had fixed them up (I think) but then I noticed > that docs/misc/qemu-deprivilege.txt in my working tree wasn't actually > committed. > > Since this patch refers to it, but didn't include it I checked before > acking that it was already in tree some how, but didn't realise it wasn't > actually committed (somehow, not sure how). Was it supposed to be in this > patch or was it supposed to be in some earlier patch? > > In any case given something odd is clearly going on I don't want to just > commit some random version of that doc which I just found in my working > directory along with this patch. Please can you resubmit with that file > included (or in a precursor patch). 
Done, see v10 > Also please check the coding style of the comment in libxl.h, the "/*" > should be by itself. Sorry I forgot this change! Feel free to fix it as you commit if that's OK for you. > Thanks, > Ian. > > > > > (based on previous plus eyeballing only the changes from: > > > > > > Changes in v9: > > > - add a device_model_user option to the xl domain config file > > > > Ian. > > > > ___ > > Xen-devel mailing list > > Xen-devel@lists.xen.org > > http://lists.xen.org/xen-devel > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v10] run QEMU as non-root
Try to use "xen-qemuuser-domid$domid" first, then "xen-qemuuser-shared" and root if everything else fails. The uids need to be manually created by the user or, more likely, by the xen package maintainer. Expose a device_model_user setting in libxl_domain_build_info, so that opinionated callers, such as libvirt, can set any user they like. Do not fall back to root if device_model_user is set. Users can also set device_model_user by hand in the xl domain config file. QEMU is going to setuid and setgid to the user ID and the group ID of the specified user, soon after initialization, before starting to deal with any guest IO. To actually secure QEMU when running in Dom0, we need at least to deprivilege the privcmd and xenstore interfaces, this is just the first step in that direction. Signed-off-by: Stefano Stabellini --- Changes in v10: - rebase - git add docs/misc/qemu-deprivilege.txt - fix commit message to reflect the names chosen (xen-qemudepriv -> xen-qemuuser) Changes in v9: - add a device_model_user option to the xl domain config file Changes in v8: - no need to pass the -runas option if the user requested for root - return ERROR_FAIL from libxl__dm_runas_helper in case of errors - return NULL from libxl__build_device_model_args_new if libxl__dm_runas_helper failed - fix line too long - remove setting errno - replace retry goto loop, with a while loop - const char * as argument to libxl__dm_runas_helper - fix comment Changes in v7: - do not fall back to root if the user explicitly set b_info->device_model_user. 
Changes in v6: - add device_model_user to libxl_domain_build_info - improve doc - improve wording in commit message Changes in v5: - improve wording in doc - fix wording in warning message - fix example in doc - drop xen-qemudepriv-$domname Changes in v4: - rename qemu-deprivilege to qemu-deprivilege.txt - add a note about qemu-deprivilege.txt to INSTALL - instead of xen-qemudepriv-base + $domid, try xen-qemudepriv-domid$domid - introduce libxl__dm_runas_helper to make the code nicer Changes in v3: - clarify doc - handle errno == ERANGE --- INSTALL|7 + docs/man/xl.cfg.pod.5 |5 +++ docs/misc/qemu-deprivilege.txt | 31 +++ tools/libxl/libxl.h|5 +++ tools/libxl/libxl_dm.c | 67 +++- tools/libxl/libxl_internal.h |5 +++ tools/libxl/libxl_types.idl|1 + tools/libxl/xl_cmdimpl.c |3 ++ 8 files changed, 123 insertions(+), 1 deletion(-) create mode 100644 docs/misc/qemu-deprivilege.txt diff --git a/INSTALL b/INSTALL index 56e2950..b7e426c 100644 --- a/INSTALL +++ b/INSTALL @@ -304,6 +304,13 @@ systemctl enable xendomains.service systemctl enable xen-watchdog.service +QEMU Deprivilege + +It is recommended to run QEMU as non-root. +See docs/misc/qemu-deprivilege.txt for an explanation on what you need +to do at installation time to run QEMU as a dedicated user. + + History of options == diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index b63846a..2aca8dd 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -1825,6 +1825,11 @@ Pass additional arbitrary options on the device-model command line for an HVM device model only. Each element in the list is passed as an option to the device-model. +=item B + +Run the device model as user "username", instead of +xen-qemudepriv-domid$domid or xen-qemudepriv-shared or root. 
+ =back =head2 Keymaps diff --git a/docs/misc/qemu-deprivilege.txt b/docs/misc/qemu-deprivilege.txt new file mode 100644 index 000..dde74ab --- /dev/null +++ b/docs/misc/qemu-deprivilege.txt @@ -0,0 +1,31 @@ +For security reasons, libxl tries to pass a non-root username to QEMU as +argument. During initialization QEMU calls setuid and setgid with the +user ID and the group ID of the user passed as argument. +Libxl looks for the following users in this order: + +1) a user named "xen-qemuuser-domid$domid", +Where $domid is the domid of the domain being created. +This requires the reservation of 65535 uids from xen-qemuuser-domid1 +to xen-qemuuser-domid65535. To use this mechanism, you might want to +create a large number of users at installation time. For example: + +for ((i=1; i<65536; i++)) +do +adduser --no-create-home --system xen-qemuuser-domid$i +done + +You might want to consider passing --group to adduser to create a new +group for each new user. + + +2) a user named "xen-qemuuser-shared" +As a fall back if both 1) fails, libxl will use a single user for +all QEMU instances. The user is named xen-qemuuser-shared. This is +less secure but still better than running QEMU as root. Using this is as +simple as creating just one more user on your host: + +adduser --no-create-home --system xen-qemuuser-shared + + +3) root +As a last resort, libxl will start QEMU as root. diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 168fedd..5edeb30
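The lookup order described in the commit message and in docs/misc/qemu-deprivilege.txt can be sketched as below. This is not the libxl code: pick_dm_user() and user_known() are hypothetical, and the list of existing users is passed in as a plain array so the example stays self-contained, whereas a real implementation would query the passwd database (for example with getpwnam_r).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a passwd database lookup. */
static bool user_known(const char *name, const char *const *known, size_t n)
{
    size_t i;

    for ( i = 0; i < n; i++ )
        if ( strcmp(known[i], name) == 0 )
            return true;
    return false;
}

/*
 * Writes the user the device model should run as into buf.
 * Returns 0 on success, -1 if an explicitly configured user is missing
 * (no fallback to root in that case, as the commit message requires).
 */
static int pick_dm_user(const char *configured, int domid,
                        const char *const *known, size_t n_known,
                        char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    /* An explicit device_model_user is used as-is or not at all. */
    if ( configured != NULL )
    {
        if ( !user_known(configured, known, n_known) )
            return -1;
        snprintf(buf, len, "%s", configured);
        return 0;
    }

    /* 1) per-domain user */
    snprintf(buf, len, "xen-qemuuser-domid%d", domid);
    if ( user_known(buf, known, n_known) )
        return 0;

    /* 2) shared user */
    snprintf(buf, len, "xen-qemuuser-shared");
    if ( user_known(buf, known, n_known) )
        return 0;

    /* 3) last resort */
    snprintf(buf, len, "root");
    return 0;
}
```

According to the v8 changelog, no -runas option needs to be passed when the chosen user is root, so a caller would presumably only add it for the first two outcomes.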
Re: [Xen-devel] [VOTE] Release cycle scheme
At 13:47 + on 02 Nov (1446472041), Wei Liu wrote:
> So I propose we use the following scheme:
>
> - 6 months release cycle from unstable branch.
> - 4 months development.
> - 2 months freeze.
> - Eat into next cycle if doesn't release on time.
> - Fixed cut-off date: the Fridays of the week in which the last day of
>   March and September falls.
> - No more freeze exception, but heads-up mails about freeze will be
>   sent a few weeks before hand.
> - Stable branch maintained for 18 months full support plus 18 months
>   security support. No mixed maintainership for stable trees.
>
> Please vote to ack or nack this proposal.

This seems like a reasonable plan. Since I'm not actively involved in
releases or large feature review, and I don't want to dictate things
that really only affect other people, I vote 0.

Cheers,

Tim.
[Xen-devel] Hackathon 2016 Location Preferences
Hi all,

I wanted to do a quick straw poll regarding Hackathon locations for next
year. Before I do this, though, I wanted to let you know that the 2016
Developer Summit will most likely be in Berlin in October (I am in the
process of finalising space, budget and contract details, which will need
to be approved by the Advisory Board).

We have two options for a Hackathon: China (either Shanghai, Hangzhou
or Beijing - details TBC) and Cambridge, UK. We are still in the early
planning phase and the budget for the Hackathon has not yet been
approved.

Do let me know your preference, and I will see whether I can work with
the vendor(s) who are willing to host the 2016 Hackathon and choose a
location that suits a majority of developers.

Best Regards
Lars
[Xen-devel] [PATCH v11 5/5] xen/arm: account for stolen ticks
Register the runstate_memory_area with the hypervisor. Use pv_time_ops.steal_clock to account for stolen ticks. Signed-off-by: Stefano Stabellini --- Changes in v4: - don't use paravirt_steal_rq_enabled: we do not support retrieving stolen ticks for vcpus other than one we are running on. Changes in v3: - use BUG_ON and smp_processor_id. --- arch/arm/xen/enlighten.c | 21 + 1 file changed, 21 insertions(+) diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c index fc7ea52..15621b1 100644 --- a/arch/arm/xen/enlighten.c +++ b/arch/arm/xen/enlighten.c @@ -14,7 +14,10 @@ #include #include #include +#include #include +#include +#include #include #include #include @@ -79,6 +82,19 @@ int xen_unmap_domain_gfn_range(struct vm_area_struct *vma, } EXPORT_SYMBOL_GPL(xen_unmap_domain_gfn_range); +static unsigned long long xen_stolen_accounting(int cpu) +{ + struct vcpu_runstate_info state; + + BUG_ON(cpu != smp_processor_id()); + + xen_get_runstate_snapshot(&state); + + WARN_ON(state.state != RUNSTATE_running); + + return state.time[RUNSTATE_runnable] + state.time[RUNSTATE_offline]; +} + static void xen_percpu_init(void) { struct vcpu_register_vcpu_info info; @@ -104,6 +120,8 @@ static void xen_percpu_init(void) BUG_ON(err); per_cpu(xen_vcpu, cpu) = vcpup; + xen_setup_runstate_info(cpu); + after_register_vcpu_info: enable_percpu_irq(xen_events_irq, 0); put_cpu(); @@ -271,6 +289,9 @@ static int __init xen_guest_init(void) register_cpu_notifier(&xen_cpu_notifier); + pv_time_ops.steal_clock = xen_stolen_accounting; + static_key_slow_inc(&paravirt_steal_enabled); + return 0; } early_initcall(xen_guest_init); -- 1.7.10.4
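As a small illustration of the accounting in xen_stolen_accounting(): time the vCPU spent runnable (waiting for a physical CPU) or offline counts as stolen, while running and blocked time does not. The sketch below reproduces that sum on a simplified snapshot. The struct and enum are stand-ins that follow the ordering of Xen's RUNSTATE_* constants, and steal_delta() is a hypothetical helper showing how two snapshots could be compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins following the order of Xen's RUNSTATE_* constants. */
enum runstate_sketch {
    RS_RUNNING,
    RS_RUNNABLE,
    RS_BLOCKED,
    RS_OFFLINE,
    RS_COUNT
};

/* Simplified snapshot: cumulative nanoseconds spent in each state. */
struct runstate_snapshot {
    int      state;
    uint64_t time[RS_COUNT];
};

/* Cumulative stolen time: runnable plus offline, as in the patch. */
static uint64_t stolen_ns(const struct runstate_snapshot *s)
{
    assert(s != NULL);
    return s->time[RS_RUNNABLE] + s->time[RS_OFFLINE];
}

/* Hypothetical helper: stolen time accumulated between two snapshots. */
static uint64_t steal_delta(const struct runstate_snapshot *prev,
                            const struct runstate_snapshot *now)
{
    return stolen_ns(now) - stolen_ns(prev);
}
```

The value returned through pv_time_ops.steal_clock is the cumulative figure; the scheduler side is expected to take differences between successive readings, which is what steal_delta() illustrates.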
[Xen-devel] [PATCH v11 2/5] missing include asm/paravirt.h in cputime.c
Add include asm/paravirt.h to cputime.c, as steal_account_process_tick calls paravirt_steal_clock, which is defined in asm/paravirt.h. The ifdef CONFIG_PARAVIRT is necessary because not all archs have an asm/paravirt.h to include. Signed-off-by: Stefano Stabellini CC: mi...@redhat.com CC: pet...@infradead.org --- Changes in v11: - add ifdef CONFIG_PARAVIRT to cputime.c, because not all architectures have an asm/paravirt.h header file to include - drop the removal of ifdef CONFIG_PARAVIRT from kernel/sched/core.c for the same reason --- kernel/sched/cputime.c |3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 8cbc3db..c7a27c4 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -5,6 +5,9 @@ #include #include #include "sched.h" +#ifdef CONFIG_PARAVIRT +#include +#endif #ifdef CONFIG_IRQ_TIME_ACCOUNTING -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v11 3/5] arm: introduce CONFIG_PARAVIRT, PARAVIRT_TIME_ACCOUNTING and pv_time_ops
Introduce CONFIG_PARAVIRT and PARAVIRT_TIME_ACCOUNTING on ARM. The only paravirt interface supported is pv_time_ops.steal_clock, so no runtime pvops patching needed. This allows us to make use of steal_account_process_tick for stolen ticks accounting. Signed-off-by: Stefano Stabellini Acked-by: Christopher Covington Acked-by: Ian Campbell CC: li...@arm.linux.org.uk CC: will.dea...@arm.com CC: n...@linaro.org CC: marc.zyng...@arm.com CC: c...@codeaurora.org CC: a...@arndb.de CC: o...@lixom.net --- Changes in v10: - replace "---help---" with "help" Changes in v7: - ifdef CONFIG_PARAVIRT the content of paravirt.h. Changes in v3: - improve commit description and Kconfig help text; - no need to initialize pv_time_ops; - add PARAVIRT_TIME_ACCOUNTING. --- arch/arm/Kconfig| 20 arch/arm/include/asm/paravirt.h | 20 arch/arm/kernel/Makefile|1 + arch/arm/kernel/paravirt.c | 25 + 4 files changed, 66 insertions(+) create mode 100644 arch/arm/include/asm/paravirt.h create mode 100644 arch/arm/kernel/paravirt.c diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index f1ed110..60be104 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1823,6 +1823,25 @@ config SWIOTLB config IOMMU_HELPER def_bool SWIOTLB +config PARAVIRT + bool "Enable paravirtualization code" + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full virtualization. + +config PARAVIRT_TIME_ACCOUNTING + bool "Paravirtual steal time accounting" + select PARAVIRT + default n + help + Select this option to enable fine granularity task steal time + accounting. Time spent executing other tasks in parallel with + the current vCPU is discounted from the vCPU power. To account for + that, there can be a small performance impact. + + If in doubt, say N here. 
+ config XEN_DOM0 def_bool y depends on XEN @@ -1836,6 +1855,7 @@ config XEN select ARCH_DMA_ADDR_T_64BIT select ARM_PSCI select SWIOTLB_XEN + select PARAVIRT help Say Y if you want to run Linux in a Virtual Machine on Xen on ARM. diff --git a/arch/arm/include/asm/paravirt.h b/arch/arm/include/asm/paravirt.h new file mode 100644 index 000..8435ff59 --- /dev/null +++ b/arch/arm/include/asm/paravirt.h @@ -0,0 +1,20 @@ +#ifndef _ASM_ARM_PARAVIRT_H +#define _ASM_ARM_PARAVIRT_H + +#ifdef CONFIG_PARAVIRT +struct static_key; +extern struct static_key paravirt_steal_enabled; +extern struct static_key paravirt_steal_rq_enabled; + +struct pv_time_ops { + unsigned long long (*steal_clock)(int cpu); +}; +extern struct pv_time_ops pv_time_ops; + +static inline u64 paravirt_steal_clock(int cpu) +{ + return pv_time_ops.steal_clock(cpu); +} +#endif + +#endif diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile index af9e59b..3e6e937 100644 --- a/arch/arm/kernel/Makefile +++ b/arch/arm/kernel/Makefile @@ -81,6 +81,7 @@ obj-$(CONFIG_VDSO)+= vdso.o ifneq ($(CONFIG_ARCH_EBSA110),y) obj-y+= io.o endif +obj-$(CONFIG_PARAVIRT) += paravirt.o head-y := head$(MMUEXT).o obj-$(CONFIG_DEBUG_LL) += debug.o diff --git a/arch/arm/kernel/paravirt.c b/arch/arm/kernel/paravirt.c new file mode 100644 index 000..53f371e --- /dev/null +++ b/arch/arm/kernel/paravirt.c @@ -0,0 +1,25 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * Copyright (C) 2013 Citrix Systems + * + * Author: Stefano Stabellini + */ + +#include +#include +#include +#include + +struct static_key paravirt_steal_enabled; +struct static_key paravirt_steal_rq_enabled; + +struct pv_time_ops pv_time_ops; +EXPORT_SYMBOL_GPL(pv_time_ops); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Getting the XSAVE size from userspace
On 11/05/2015 04:05 PM, Andrew Cooper wrote: > On 05/11/15 12:26, Razvan Cojocaru wrote: >> On 11/05/2015 01:44 PM, Andrew Cooper wrote: >>> On 05/11/15 11:35, Andrei LUTAS wrote: The use-case is the following: whenever an EPT violation is triggered inside a monitored VM, the introspection logic needs to know how many bytes were accessed (read/written). This is done by inspecting the faulting instruction and directly inferring the size, which is not straight-forward for XSAVE/XRSTOR family. Using the maximum possible size is wrong, as in any given moment the OS may or may not desire to XSAVE/XRSTOR the entire state (and thinking that the instruction tries to access more than it actually does may yield undesired effects). Therefore, the size needed for the currently enabled features of the monitored guest is required instead. Normally, it could be done by running CPUID with eax = 0xD and ecx = i, where i >= 2 and XCR0[i] is 1 (XCR0 belongs to the monitored guest), but I am unsure if using CPUID this way would be safe/desired: will Xen expose the same CPUID features, for XSAVE related functionality, on all VMs? (using XCPUID with eax = 0xD and ecx = 0 would give us the needed size for the SVA, and like I said, using the maximum size would not be safe, even if it's the same across all VMs on a given host). Also, I'm unsure how this would get along with migration... >>> Hmm yes - there is no way to do this currently. >>> >>> Xen's CPUID handling for xsave related things is broken in levelling and >>> migration scenarios, which is why it is *still* disabled by default in >>> XenServer. >>> >>> I am working on fixing it, and will take this usecase into account >>> (although I think I had already included enough for this usecase to work). >>> >>> At the point of the xsave/xrestor trap, you need to know xcr0 and be >>> able to perfom a cpuid instruction in the context of a target domain, to >>> make use of 0xD[0].ebx to get the "current size based on xcr0". 
>> So then the closest thing to what we need would be to add a size field >> to struct hvm_hw_cpu_xsave, and just assign the size variable to it in >> hvm_save_cpu_xsave_states (migration aside)? >> >> 2130 static int hvm_save_cpu_xsave_states(struct domain *d, >> hvm_domain_context_t *h) >> 2131 { >> 2132 struct vcpu *v; >> 2133 struct hvm_hw_cpu_xsave *ctxt; >> 2134 >> 2135 if ( !cpu_has_xsave ) >> 2136 return 0; /* do nothing */ >> 2137 >> 2138 for_each_vcpu ( d, v ) >> 2139 { >> 2140 unsigned int size = HVM_CPU_XSAVE_SIZE(v->arch.xcr0_accum); >> 2141 >> 2142 if ( !xsave_enabled(v) ) >> 2143 continue; >> 2144 if ( _hvm_init_entry(h, CPU_XSAVE_CODE, v->vcpu_id, size) ) >> 2145 return 1; >> 2146 ctxt = (struct hvm_hw_cpu_xsave *)&h->data[h->cur]; >> 2147 h->cur += size; >> 2148 >> 2149 ctxt->xfeature_mask = xfeature_mask; >> 2150 ctxt->xcr0 = v->arch.xcr0; >> 2151 ctxt->xcr0_accum = v->arch.xcr0_accum; >> 2152 memcpy(&ctxt->save_area, v->arch.xsave_area, >> 2153size - offsetof(struct hvm_hw_cpu_xsave, save_area)); >> 2154 } >> 2155 >> 2156 return 0; >> 2157 } > > I don't see any difference between this pasted code and the current > hvm_save_cpu_xsave_states(). What have you changed? I haven't changed anything, I was just pointing out what code I'm referring to (which size variable I'm talking about), sorry for not being as clear as possible. > You can't use this size value, and it is the accumulated xcr0 over the > life of the VM, not the xcr0 in use at the time of the intercepted > instruction. OK. > You also can't blindly modify the ctxt structure, or you will break > migration. Well, yes, not blindly, that assumes that something like a patch for mainline is agreed upon, or that migration is disabled for guests that need this, and so on. > The xcr0 -> size mapping is static, and won't change going forwards. > Your best bet is just to query each one and stash all the results. OK. 
Thanks, Razvan
[Xen-devel] [PATCH v11 4/5] arm64: introduce CONFIG_PARAVIRT, PARAVIRT_TIME_ACCOUNTING and pv_time_ops
Introduce CONFIG_PARAVIRT and PARAVIRT_TIME_ACCOUNTING on ARM64. Necessary duplication of paravirt.h and paravirt.c with ARM. The only paravirt interface supported is pv_time_ops.steal_clock, so no runtime pvops patching needed. This allows us to make use of steal_account_process_tick for stolen ticks accounting. Signed-off-by: Stefano Stabellini Acked-by: Marc Zyngier CC: will.dea...@arm.com CC: n...@linaro.org CC: marc.zyng...@arm.com CC: c...@codeaurora.org CC: a...@arndb.de CC: o...@lixom.net CC: catalin.mari...@arm.com --- Changes in v10: - replace "---help---" with "help" Changes in v7: - ifdef CONFIG_PARAVIRT the content of paravirt.h. --- arch/arm64/Kconfig| 20 arch/arm64/include/asm/paravirt.h | 20 arch/arm64/kernel/Makefile|1 + arch/arm64/kernel/paravirt.c | 25 + 4 files changed, 66 insertions(+) create mode 100644 arch/arm64/include/asm/paravirt.h create mode 100644 arch/arm64/kernel/paravirt.c diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 7b10647..659e286 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -533,6 +533,25 @@ config SECCOMP and the task is only allowed to execute a few safe syscalls defined by each seccomp mode. +config PARAVIRT + bool "Enable paravirtualization code" + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full virtualization. + +config PARAVIRT_TIME_ACCOUNTING + bool "Paravirtual steal time accounting" + select PARAVIRT + default n + help + Select this option to enable fine granularity task steal time + accounting. Time spent executing other tasks in parallel with + the current vCPU is discounted from the vCPU power. To account for + that, there can be a small performance impact. + + If in doubt, say N here. 
+ config XEN_DOM0 def_bool y depends on XEN @@ -541,6 +560,7 @@ config XEN bool "Xen guest support on ARM64" depends on ARM64 && OF select SWIOTLB_XEN + select PARAVIRT help Say Y if you want to run Linux in a Virtual Machine on Xen on ARM64. diff --git a/arch/arm64/include/asm/paravirt.h b/arch/arm64/include/asm/paravirt.h new file mode 100644 index 000..fd5f428 --- /dev/null +++ b/arch/arm64/include/asm/paravirt.h @@ -0,0 +1,20 @@ +#ifndef _ASM_ARM64_PARAVIRT_H +#define _ASM_ARM64_PARAVIRT_H + +#ifdef CONFIG_PARAVIRT +struct static_key; +extern struct static_key paravirt_steal_enabled; +extern struct static_key paravirt_steal_rq_enabled; + +struct pv_time_ops { + unsigned long long (*steal_clock)(int cpu); +}; +extern struct pv_time_ops pv_time_ops; + +static inline u64 paravirt_steal_clock(int cpu) +{ + return pv_time_ops.steal_clock(cpu); +} +#endif + +#endif diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index 474691f..ca9fbe1 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -41,6 +41,7 @@ arm64-obj-$(CONFIG_EFI) += efi.o efi-entry.stub.o arm64-obj-$(CONFIG_PCI)+= pci.o arm64-obj-$(CONFIG_ARMV8_DEPRECATED) += armv8_deprecated.o arm64-obj-$(CONFIG_ACPI) += acpi.o +arm64-obj-$(CONFIG_PARAVIRT) += paravirt.o obj-y += $(arm64-obj-y) vdso/ obj-m += $(arm64-obj-m) diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c new file mode 100644 index 000..53f371e --- /dev/null +++ b/arch/arm64/kernel/paravirt.c @@ -0,0 +1,25 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * Copyright (C) 2013 Citrix Systems + * + * Author: Stefano Stabellini + */ + +#include +#include +#include +#include + +struct static_key paravirt_steal_enabled; +struct static_key paravirt_steal_rq_enabled; + +struct pv_time_ops pv_time_ops; +EXPORT_SYMBOL_GPL(pv_time_ops); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v11 0/5] xen/arm/arm64: CONFIG_PARAVIRT and stolen ticks accounting
Hi all,

I dusted off this series from Jan 2014. Patch #2 and #3 still need an ack.

This patch series introduces stolen ticks accounting for Xen on ARM and
ARM64. Stolen ticks are clocksource ticks that have been "stolen" from
the cpu, typically because Linux is running in a virtual machine and the
vcpu has been descheduled.

To account for these ticks we introduce CONFIG_PARAVIRT and pv_time_ops
so that we can make use of:

kernel/sched/cputime.c:steal_account_process_tick

Changes in v11:
- add ifdef CONFIG_PARAVIRT to kernel/sched/cputime.c, because not all
  architectures have an asm/paravirt.h header file to include
- drop the removal of ifdef CONFIG_PARAVIRT from kernel/sched/core.c for
  the same reason

Stefano Stabellini (5):
      xen: move xen_setup_runstate_info and get_runstate_snapshot to drivers/xen/time.c
      missing include asm/paravirt.h in cputime.c
      arm: introduce CONFIG_PARAVIRT, PARAVIRT_TIME_ACCOUNTING and pv_time_ops
      arm64: introduce CONFIG_PARAVIRT, PARAVIRT_TIME_ACCOUNTING and pv_time_ops
      xen/arm: account for stolen ticks

 arch/arm/Kconfig                  |   20
 arch/arm/include/asm/paravirt.h   |   20
 arch/arm/kernel/Makefile          |    1
 arch/arm/kernel/paravirt.c        |   25
 arch/arm/xen/enlighten.c          |   21
 arch/arm64/Kconfig                |   20
 arch/arm64/include/asm/paravirt.h |   20
 arch/arm64/kernel/Makefile        |    1
 arch/arm64/kernel/paravirt.c      |   25
 arch/x86/xen/time.c               |   76
 drivers/xen/Makefile              |    2
 drivers/xen/time.c                |   91
 include/xen/xen-ops.h             |    5
 kernel/sched/cputime.c            |    3
 14 files changed, 254 insertions(+), 76 deletions(-)
 create mode 100644 arch/arm/include/asm/paravirt.h
 create mode 100644 arch/arm/kernel/paravirt.c
 create mode 100644 arch/arm64/include/asm/paravirt.h
 create mode 100644 arch/arm64/kernel/paravirt.c
 create mode 100644 drivers/xen/time.c

Cheers,

Stefano
[Xen-devel] [PATCH v11 1/5] xen: move xen_setup_runstate_info and get_runstate_snapshot to drivers/xen/time.c
Signed-off-by: Stefano Stabellini Acked-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk CC: konrad.w...@oracle.com --- Changes in v10: - rebase --- arch/x86/xen/time.c | 76 + drivers/xen/Makefile |2 +- drivers/xen/time.c| 91 + include/xen/xen-ops.h |5 +++ 4 files changed, 98 insertions(+), 76 deletions(-) create mode 100644 drivers/xen/time.c diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index f1ba6a0..041d4cd 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -32,86 +32,12 @@ #define TIMER_SLOP 10 #define NS_PER_TICK(10LL / HZ) -/* runstate info updated by Xen */ -static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate); - /* snapshots of runstate info */ static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate_snapshot); /* unused ns of stolen time */ static DEFINE_PER_CPU(u64, xen_residual_stolen); -/* return an consistent snapshot of 64-bit time/counter value */ -static u64 get64(const u64 *p) -{ - u64 ret; - - if (BITS_PER_LONG < 64) { - u32 *p32 = (u32 *)p; - u32 h, l; - - /* -* Read high then low, and then make sure high is -* still the same; this will only loop if low wraps -* and carries into high. -* XXX some clean way to make this endian-proof? -*/ - do { - h = p32[1]; - barrier(); - l = p32[0]; - barrier(); - } while (p32[1] != h); - - ret = (((u64)h) << 32) | l; - } else - ret = *p; - - return ret; -} - -/* - * Runstate accounting - */ -static void get_runstate_snapshot(struct vcpu_runstate_info *res) -{ - u64 state_time; - struct vcpu_runstate_info *state; - - BUG_ON(preemptible()); - - state = this_cpu_ptr(&xen_runstate); - - /* -* The runstate info is always updated by the hypervisor on -* the current CPU, so there's no need to use anything -* stronger than a compiler barrier when fetching it. 
-*/ - do { - state_time = get64(&state->state_entry_time); - barrier(); - *res = *state; - barrier(); - } while (get64(&state->state_entry_time) != state_time); -} - -/* return true when a vcpu could run but has no real cpu to run on */ -bool xen_vcpu_stolen(int vcpu) -{ - return per_cpu(xen_runstate, vcpu).state == RUNSTATE_runnable; -} - -void xen_setup_runstate_info(int cpu) -{ - struct vcpu_register_runstate_memory_area area; - - area.addr.v = &per_cpu(xen_runstate, cpu); - - if (HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area, - cpu, &area)) - BUG(); -} - static void do_stolen_accounting(void) { struct vcpu_runstate_info state; @@ -119,7 +45,7 @@ static void do_stolen_accounting(void) s64 runnable, offline, stolen; cputime_t ticks; - get_runstate_snapshot(&state); + xen_get_runstate_snapshot(&state); WARN_ON(state.state != RUNSTATE_running); diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile index aa8a7f7..9b7a35c 100644 --- a/drivers/xen/Makefile +++ b/drivers/xen/Makefile @@ -1,6 +1,6 @@ obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o obj-$(CONFIG_X86) += fallback.o -obj-y += grant-table.o features.o balloon.o manage.o preempt.o +obj-y += grant-table.o features.o balloon.o manage.o preempt.o time.o obj-y += events/ obj-y += xenbus/ diff --git a/drivers/xen/time.c b/drivers/xen/time.c new file mode 100644 index 000..433fe24 --- /dev/null +++ b/drivers/xen/time.c @@ -0,0 +1,91 @@ +/* + * Xen stolen ticks accounting. + */ +#include +#include +#include +#include + +#include +#include + +#include +#include +#include +#include +#include + +/* runstate info updated by Xen */ +static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate); + +/* return an consistent snapshot of 64-bit time/counter value */ +static u64 get64(const u64 *p) +{ + u64 ret; + + if (BITS_PER_LONG < 64) { + u32 *p32 = (u32 *)p; + u32 h, l; + + /* +* Read high then low, and then make sure high is +* still the same; this will only loop if low wraps +* and carries into high. 
+* XXX some clean way to make this endian-proof? +*/ + do { + h = p32[1]; + barrier(); + l = p32[0]; + barrier(); + } while (p32[1] != h); + + ret = (((u64)h) << 32) | l; + } else + ret = *p; + +
Re: [Xen-devel] [PATCH 1/2] rwlock: add per-cpu reader-writer locks
On 05/11/15 13:48, Marcos E. Matsunaga wrote: > Hi Malcolm, > > I tried your patches against staging yesterday and as soon as I started a > guest, it panic. I have > lock_profile enabled and applied your patches against: I tested with a non debug version of Xen (because I was analysing the performance of Xen) and thus those ASSERTS were never run. The ASSERTS can be safely removed, the rwlock behaviour is slightly different in that it's possible for a writer to hold the write lock whilst a reader is progressing through the read critical section, this is safe because the writer is waiting for the percpu variables to clear before actually progressing through it's own critical section. I have an updated version of the patch series which fixes this. Do you want me to post it or are you happy to remove the ASSERTS yourself ( or switch to non-debug build of Xen) Sorry for not catching this before it hit the list. Malcolm > > 6f04de658574833688c3f9eab310e7834d56a9c0 x86: cleanup of early cpuid handling > > > > (XEN) HVM1 save: CPU > (XEN) HVM1 save: PIC > (XEN) HVM1 save: IOAPIC > (XEN) HVM1 save: LAPIC > (XEN) HVM1 save: LAPIC_REGS > (XEN) HVM1 save: PCI_IRQ > (XEN) HVM1 save: ISA_IRQ > (XEN) HVM1 save: PCI_LINK > (XEN) HVM1 save: PIT > (XEN) HVM1 save: RTC > (XEN) HVM1 save: HPET > (XEN) HVM1 save: PMTIMER > (XEN) HVM1 save: MTRR > (XEN) HVM1 save: VIRIDIAN_DOMAIN > (XEN) HVM1 save: CPU_XSAVE > (XEN) HVM1 save: VIRIDIAN_VCPU > (XEN) HVM1 save: VMCE_VCPU > (XEN) HVM1 save: TSC_ADJUST > (XEN) HVM1 restore: CPU 0 > [ 394.163143] loop: module loaded > (XEN) Assertion 'rw_is_locked(&t->lock)' failed at grant_table.c:215 > (XEN) [ Xen-4.7-unstable x86_64 debug=y Tainted:C ] > (XEN) CPU:0 > (XEN) RIP:e008:[] do_grant_table_op+0x63f/0x2e04 > (XEN) RFLAGS: 00010246 CONTEXT: hypervisor (d0v0) > (XEN) rax: rbx: 83400f9dc9e0 rcx: > (XEN) rdx: 0001 rsi: 82d080342b10 rdi: 83400819b784 > (XEN) rbp: 8300774ffef8 rsp: 8300774ffdf8 r8: 0002 > (XEN) r9: 0002 r10: 0002 r11: > (XEN) r12: 
r13: r14: 83400819b780 > (XEN) r15: 83400f9d cr0: 80050033 cr4: 001526e0 > (XEN) cr3: 01007f613000 cr2: 8800746182b8 > (XEN) ds: es: fs: gs: ss: e010 cs: e008 > (XEN) Xen stack trace from rsp=8300774ffdf8: > (XEN)8300774ffe08 82d0 8300774ffef8 82d08017fc9b > (XEN)82d080342b28 83400f9d8600 82d080342b10 > (XEN)83400f9dca20 8321 834008188000 0001 > (XEN)0001772ee000 8801e98d03e0 8300774ffe88 > (XEN) 8300774fff18 0021d0269c10 0001001a > (XEN)0001 0246 7ff7de45a407 > (XEN)0100 7ff7de45a407 0033 8300772ee000 > (XEN)8801eb0e3c00 880004bf57e8 8801e98d03e0 8801eb0a5938 > (XEN)7cff88b000c7 82d08023d952 8100128a 0014 > (XEN) 0001 8801f6e18388 81d3d740 > (XEN)8801efb7bd40 88000542e780 0282 > (XEN)8801e98d03a0 8801efe07000 0014 8100128a > (XEN)0001 8801e98d03e0 00010100 > (XEN)8100128a e033 0282 8801efb7bce0 > (XEN)e02b > (XEN) 8300772ee000 > (XEN) > (XEN) Xen call trace: > (XEN)[] do_grant_table_op+0x63f/0x2e04 > (XEN)[] lstar_enter+0xe2/0x13c > (XEN) > (XEN) > (XEN) > (XEN) Panic on CPU 0: > (XEN) Assertion 'rw_is_locked(&t->lock)' failed at grant_table.c:215 > (XEN) > (XEN) > (XEN) Manual reset required ('noreboot' specified) > > > Thanks for your help. > > On 11/03/2015 12:58 PM, Malcolm Crossley wrote: >> Per-cpu read-write locks allow for the fast path read case to have low >> overhead >> by only setting/clearing a per-cpu variable for using the read lock. >> The per-cpu read fast path also avoids locked compare swap operations which >> can >> be particularly slow on coherent multi-socket systems, particularly if there >> is >> heavy usage of the read lock itself. >> >> The per-cpu reader-writer lock uses a global variable to control the read >> lock >> fast path. This allows a writer to disable the fast path and ensure the >> readers >> use the underlying read-write lock implementation. >> >> Once the writer has taken the write lock and disabled the fast path, it must >> poll the per-cpu variable for all CPU's which have ente
Re: [Xen-devel] [xen-unstable test] 63540: regressions - FAIL
On Thu, 2015-11-05 at 03:49 -0700, Jan Beulich wrote:
> > > > On 05.11.15 at 04:01, wrote:
> > flight 63540 xen-unstable real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/63540/
> >
> > Regressions :-(
> >
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  test-amd64-amd64-xl-qemut-winxpsp3  6 xen-boot  fail REGR. vs. 63475
>
> Hmm, did there something go wrong during install? The first boot
> after install appears to be a kernel booted natively, and then
> nothing else.

This is commonly a sign that the host has forgotten its boot order, the
merlot machines have some form for this (merlot0 is currently out of
rotation for this reason).

Ian, do we need to look into merlot1 again too?

BTW the osstest-admin@ alias goes to real people (Ian and myself) so it is
useful to keep it in the reply in cases like this.

Ian.
Re: [Xen-devel] [PATCH v7 16/32] xen/x86: allow disabling the pmtimer
On 04/11/15 16:17, Jan Beulich wrote: On 04.11.15 at 17:05, wrote: >> El 03/11/15 a les 13.41, Jan Beulich ha escrit: >> On 03.11.15 at 11:57, wrote: On 03/11/15 07:21, Jan Beulich wrote: On 30.10.15 at 16:36, wrote: >> On 30/10/15 13:16, Jan Beulich wrote: >> On 30.10.15 at 13:50, wrote: El 14/10/15 a les 16.37, Jan Beulich ha escrit: On 02.10.15 at 17:48, wrote: >> Signed-off-by: Roger Pau Monné >> Cc: Jan Beulich >> Cc: Andrew Cooper >> --- >> Changes since v6: >> - Return ENODEV in pmtimer_load if the timer is disabled. >> - hvm_acpi_power_button and hvm_acpi_sleep_button become noops if >> the >>pmtimer is disabled. > But how are those two features connected? I don't think you can > assume absence of a PM block just because there's no PM timer. > Or if you want to tie them together for now, the predicate needs > to be renamed. > >> - Return ENODEV if pmtimer_change_ioport is called with the pmtimer >>disabled. > Same here. What about changing XEN_X86_EMU_PMTIMER into XEN_X86_EMU_PM and this flags disables all PM stuff? >>> Ah, right, that's a reasonable option. >> It still might be a nice idea to split them in two, given future work. >> >> To support hotplug properly (cpu, ram and pci), Xen needs to inject >> GPEs, which comes from part of the PM infrastructure. To support PCI >> devices in the future without the whole PM infrastructure, it would be >> nice to keep the split. > Coming back to this - I'm not sure: The hotplug aspect as you > mention it should matter for Dom0 only. DomU could (and perhaps > should) use a PV interface instead. I disagree. All PVH guests should use the same mechanism; making a split between dom0 and domU will only make our lives harder. Where reasonable, we should follow what happens on native; one of the underlying points of PVH is to have less of an impact on the guest side. In some cases it is indeed nasty, but has the advantage of being well understood. >>> What meaning would ACPI have to a PVH DomU? 
>>> > So I'd like to suggest quite the opposite: Don't call the thing PM, > but make it more general and call it ACPI. And instead of > separating HPET, we might have this fall under ACPI as well, or > we might have a second TIMER flag, requiring both to be set > for there to be a HPET and PMTMR. This leaves open the option > of Dom0 getting ACPI enabled (despite this then being "real", > not emulated ACPI), but TIMER left off. An HPET can exist independently of other features such as ACPI. It should have its own option. >>> Without ACPI there's no defined way to discover it. Doing what >>> Linux does - applying chipset knowledge - won't work on PVH either, >>> because there's no emulated chipset. Which would leave scanning >>> physical memory, but if there is none, none can be found. >>> +1 to having an ACPI option, but as indicated above, I expect it to be used in the longterm even for domU. >>> Again - why and how? >> I think that at this point in the design it's not so important to have >> all the XEN_X86_EMU_* properly defined. This is not a public interface, >> so we can expand/reduce them whenever we want. Would it be fine, for the >> time being to just have a XEN_X86_EMU_PM and control both the PM and the >> PMTMR? > I think so, yes. Also +1 for now. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/2] rwlock: add per-cpu reader-writer locks
Hi Malcolm, If you can post the updated patches, that would be great. I think it would be better for me to test with your update. Thanks again. On 11/05/2015 10:20 AM, Malcolm Crossley wrote: On 05/11/15 13:48, Marcos E. Matsunaga wrote: Hi Malcolm, I tried your patches against staging yesterday and as soon as I started a guest, it panic. I have lock_profile enabled and applied your patches against: I tested with a non debug version of Xen (because I was analysing the performance of Xen) and thus those ASSERTS were never run. The ASSERTS can be safely removed, the rwlock behaviour is slightly different in that it's possible for a writer to hold the write lock whilst a reader is progressing through the read critical section, this is safe because the writer is waiting for the percpu variables to clear before actually progressing through it's own critical section. I have an updated version of the patch series which fixes this. Do you want me to post it or are you happy to remove the ASSERTS yourself ( or switch to non-debug build of Xen) Sorry for not catching this before it hit the list. 
Malcolm 6f04de658574833688c3f9eab310e7834d56a9c0 x86: cleanup of early cpuid handling (XEN) HVM1 save: CPU (XEN) HVM1 save: PIC (XEN) HVM1 save: IOAPIC (XEN) HVM1 save: LAPIC (XEN) HVM1 save: LAPIC_REGS (XEN) HVM1 save: PCI_IRQ (XEN) HVM1 save: ISA_IRQ (XEN) HVM1 save: PCI_LINK (XEN) HVM1 save: PIT (XEN) HVM1 save: RTC (XEN) HVM1 save: HPET (XEN) HVM1 save: PMTIMER (XEN) HVM1 save: MTRR (XEN) HVM1 save: VIRIDIAN_DOMAIN (XEN) HVM1 save: CPU_XSAVE (XEN) HVM1 save: VIRIDIAN_VCPU (XEN) HVM1 save: VMCE_VCPU (XEN) HVM1 save: TSC_ADJUST (XEN) HVM1 restore: CPU 0 [ 394.163143] loop: module loaded (XEN) Assertion 'rw_is_locked(&t->lock)' failed at grant_table.c:215 (XEN) [ Xen-4.7-unstable x86_64 debug=y Tainted:C ] (XEN) CPU:0 (XEN) RIP:e008:[] do_grant_table_op+0x63f/0x2e04 (XEN) RFLAGS: 00010246 CONTEXT: hypervisor (d0v0) (XEN) rax: rbx: 83400f9dc9e0 rcx: (XEN) rdx: 0001 rsi: 82d080342b10 rdi: 83400819b784 (XEN) rbp: 8300774ffef8 rsp: 8300774ffdf8 r8: 0002 (XEN) r9: 0002 r10: 0002 r11: (XEN) r12: r13: r14: 83400819b780 (XEN) r15: 83400f9d cr0: 80050033 cr4: 001526e0 (XEN) cr3: 01007f613000 cr2: 8800746182b8 (XEN) ds: es: fs: gs: ss: e010 cs: e008 (XEN) Xen stack trace from rsp=8300774ffdf8: (XEN)8300774ffe08 82d0 8300774ffef8 82d08017fc9b (XEN)82d080342b28 83400f9d8600 82d080342b10 (XEN)83400f9dca20 8321 834008188000 0001 (XEN)0001772ee000 8801e98d03e0 8300774ffe88 (XEN) 8300774fff18 0021d0269c10 0001001a (XEN)0001 0246 7ff7de45a407 (XEN)0100 7ff7de45a407 0033 8300772ee000 (XEN)8801eb0e3c00 880004bf57e8 8801e98d03e0 8801eb0a5938 (XEN)7cff88b000c7 82d08023d952 8100128a 0014 (XEN) 0001 8801f6e18388 81d3d740 (XEN)8801efb7bd40 88000542e780 0282 (XEN)8801e98d03a0 8801efe07000 0014 8100128a (XEN)0001 8801e98d03e0 00010100 (XEN)8100128a e033 0282 8801efb7bce0 (XEN)e02b (XEN) 8300772ee000 (XEN) (XEN) Xen call trace: (XEN)[] do_grant_table_op+0x63f/0x2e04 (XEN)[] lstar_enter+0xe2/0x13c (XEN) (XEN) (XEN) (XEN) Panic on CPU 0: (XEN) Assertion 'rw_is_locked(&t->lock)' failed at 
grant_table.c:215 (XEN) (XEN) (XEN) Manual reset required ('noreboot' specified) Thanks for your help. On 11/03/2015 12:58 PM, Malcolm Crossley wrote: Per-cpu read-write locks allow for the fast path read case to have low overhead by only setting/clearing a per-cpu variable for using the read lock. The per-cpu read fast path also avoids locked compare swap operations which can be particularly slow on coherent multi-socket systems, particularly if there is heavy usage of the read lock itself. The per-cpu reader-writer lock uses a global variable to control the read lock fast path. This allows a writer to disable the fast path and ensure the readers use the underlying read-write lock implementation. Once the writer has taken the write lock and disabled the fast path, it must poll the per-cpu variable for all CPU's which have entered the
Re: [Xen-devel] Getting the XSAVE size from userspace
On 05/11/15 12:26, Razvan Cojocaru wrote:
> On 11/05/2015 01:44 PM, Andrew Cooper wrote:
>> On 05/11/15 11:35, Andrei LUTAS wrote:
>>> The use-case is the following: whenever an EPT violation is triggered
>>> inside a monitored VM, the introspection logic needs to know how many
>>> bytes were accessed (read/written). This is done by inspecting the
>>> faulting instruction and directly inferring the size, which is not
>>> straight-forward for XSAVE/XRSTOR family. Using the maximum possible
>>> size is wrong, as in any given moment the OS may or may not desire to
>>> XSAVE/XRSTOR the entire state (and thinking that the instruction tries
>>> to access more than it actually does may yield undesired effects).
>>> Therefore, the size needed for the currently enabled features of the
>>> monitored guest is required instead. Normally, it could be done by
>>> running CPUID with eax = 0xD and ecx = i, where i >= 2 and XCR0[i] is
>>> 1 (XCR0 belongs to the monitored guest), but I am unsure if using
>>> CPUID this way would be safe/desired: will Xen expose the same CPUID
>>> features, for XSAVE related functionality, on all VMs? (using CPUID
>>> with eax = 0xD and ecx = 0 would give us the needed size for the SVA,
>>> and like I said, using the maximum size would not be safe, even if
>>> it's the same across all VMs on a given host). Also, I'm unsure how
>>> this would get along with migration...
>> Hmm yes - there is no way to do this currently.
>>
>> Xen's CPUID handling for xsave related things is broken in levelling and
>> migration scenarios, which is why it is *still* disabled by default in
>> XenServer.
>>
>> I am working on fixing it, and will take this usecase into account
>> (although I think I had already included enough for this usecase to work).
>>
>> At the point of the xsave/xrstor trap, you need to know xcr0 and be
>> able to perform a cpuid instruction in the context of a target domain, to
>> make use of 0xD[0].ebx to get the "current size based on xcr0".
> So then the closest thing to what we need would be to add a size field
> to struct hvm_hw_cpu_xsave, and just assign the size variable to it in
> hvm_save_cpu_xsave_states (migration aside)?
>
> static int hvm_save_cpu_xsave_states(struct domain *d,
>                                      hvm_domain_context_t *h)
> {
>     struct vcpu *v;
>     struct hvm_hw_cpu_xsave *ctxt;
>
>     if ( !cpu_has_xsave )
>         return 0;   /* do nothing */
>
>     for_each_vcpu ( d, v )
>     {
>         unsigned int size = HVM_CPU_XSAVE_SIZE(v->arch.xcr0_accum);
>
>         if ( !xsave_enabled(v) )
>             continue;
>         if ( _hvm_init_entry(h, CPU_XSAVE_CODE, v->vcpu_id, size) )
>             return 1;
>         ctxt = (struct hvm_hw_cpu_xsave *)&h->data[h->cur];
>         h->cur += size;
>
>         ctxt->xfeature_mask = xfeature_mask;
>         ctxt->xcr0 = v->arch.xcr0;
>         ctxt->xcr0_accum = v->arch.xcr0_accum;
>         memcpy(&ctxt->save_area, v->arch.xsave_area,
>                size - offsetof(struct hvm_hw_cpu_xsave, save_area));
>     }
>
>     return 0;
> }

I don't see any difference between this pasted code and the current
hvm_save_cpu_xsave_states(). What have you changed?

You can't use this size value, as it is based on the accumulated xcr0 over
the life of the VM, not the xcr0 in use at the time of the intercepted
instruction. You also can't blindly modify the ctxt structure, or you will
break migration.

The xcr0 -> size mapping is static, and won't change going forwards. Your
best bet is just to query each one and stash all the results.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
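The "query each one and stash all the results" suggestion above can be sketched roughly as follows. This is only an illustration, not code from Xen or from any introspection tool: the struct and function names are made up, and the table is assumed to have been filled once from CPUID leaf 0xD sub-leaves i >= 2 (EAX = component size, EBX = component offset) in the context of the target domain. It computes the standard (non-compacted) XSAVE area size for a given xcr0 as the furthest end of any enabled component, with the 512-byte legacy region plus 64-byte header as the minimum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XSAVE_LEGACY_AND_HEADER 576u /* 512-byte legacy area + 64-byte header */
#define XSAVE_MAX_COMPONENTS    63u  /* XCR0 bit 63 is reserved */

/* One stashed CPUID.0xD[i] result for i >= 2 (hypothetical container). */
struct xsave_component {
    uint32_t size;   /* CPUID.0xD[i].EAX */
    uint32_t offset; /* CPUID.0xD[i].EBX */
};

/*
 * Standard-format XSAVE area size for the given xcr0, using only the
 * stashed per-component data. comp[] is indexed by XCR0 bit number;
 * entries 0 and 1 (x87 and SSE) are ignored because they live in the
 * fixed legacy region.
 */
static uint32_t xsave_size_for_xcr0(uint64_t xcr0,
                                    const struct xsave_component *comp,
                                    size_t ncomp)
{
    uint32_t size = XSAVE_LEGACY_AND_HEADER;
    size_t i;

    assert(comp != NULL || ncomp <= 2);

    for (i = 2; i < ncomp && i < XSAVE_MAX_COMPONENTS; i++) {
        if (!(xcr0 & (UINT64_C(1) << i)))
            continue; /* component not enabled in this xcr0 */
        if (comp[i].offset + comp[i].size > size)
            size = comp[i].offset + comp[i].size;
    }

    return size;
}
```

With a table whose entry 2 describes a 256-byte component at offset 576 (example values only), an xcr0 of 0x3 yields 576 and an xcr0 of 0x7 yields 832. Because the lookup depends only on the xcr0 captured at the time of the intercepted instruction, it avoids the accumulated-xcr0 problem described in the reply.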
Re: [Xen-devel] ovmf fail to compile
> -----Original Message-----
> From: Wei Liu [mailto:wei.l...@citrix.com]
> Sent: Wednesday, November 4, 2015 6:19 PM
> To: Hao, Xudong
> Cc: Wei Liu ; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] ovmf fail to compile
>
> On Wed, Nov 04, 2015 at 08:27:56AM +, Hao, Xudong wrote:
> > "git clean -fdx" doesn't change the error result with gcc 4.4.7. Is the
> > gcc version "gcc-4.4.?" in your Debian Jessie?
> >
>
> Debian Jessie's gcc-4.4 has the same version 4.4.7.
>
> As the other sub-thread suggests, can you try passing more f's to git?
>
> A somewhat related question, are you only interested in xen-unstable branch?
> Have you tried latest OVMF from upstream? If that builds for you I can easily
> send another patch to update Config.mk again.
>

I'm busy with other urgent work today; I will try the two suggestions above
tomorrow and share the results.

-Xudong
Re: [Xen-devel] [PATCH V8 3/7] libxl: add pvusb API
On Wed, Nov 4, 2015 at 6:31 AM, Chun Yan Liu wrote:
> Ian & George, any comments?

Hey Chunyan,

I did actually spend a chunk of time looking at this last week.
Looking at the diff-of-diffs, it looks like you've addressed
everything I asked you to address. I still want to take a longer look
at it before giving it a reviewed-by. Unfortunately this will have to
wait until next week.

One thing that came up though in an offline discussion between IanJ
and me was that we would like you to actually address the
DEFINE_DEVICE_REMOVE_EXT code duplication issue before this is checked
in. Let me know if you understand the request clearly; I'd be willing
to send you a patch you can fold in if that would be helpful.

IanJ said he has some more comments on the AO stuff as well.

 -George

> On 10/21/2015 at 05:08 PM, in message
> <1445418510-19614-4-git-send-email-cy...@suse.com>, Chunyan Liu
> wrote:
>> Add pvusb APIs, including:
>> - attach/detach (create/destroy) virtual usb controller.
>> - attach/detach usb device
>> - list usb controller and usb devices
>> - some other helper functions
>>
>> Signed-off-by: Chunyan Liu
>> Signed-off-by: Simon Cao
>>
>> ---
>> changes:
>> - update COMPARE_USB to compare ctrl and port
>> - add check in usb_add/remove to disable non-Dom0 backend so that
>> not worrying about codes which are effective on Dom0 but not
>> compatible on non-Dom0 backend.
>> - define READ_SUBPATH macro within functions
>> - do not initialize rc but give it value in each return case
>> - libxl__strdup gc or NOGC update, internal function using gc,
>> external using NOGC.
>> - address other comments from George and Ian J.
>> >> tools/libxl/Makefile |2 +- >> tools/libxl/libxl.c | 53 ++ >> tools/libxl/libxl.h | 74 ++ >> tools/libxl/libxl_device.c |5 +- >> tools/libxl/libxl_internal.h | 18 + >> tools/libxl/libxl_osdeps.h | 13 + >> tools/libxl/libxl_pvusb.c| 1451 >> ++ >> tools/libxl/libxl_types.idl | 57 ++ >> tools/libxl/libxl_types_internal.idl |1 + >> tools/libxl/libxl_utils.c| 16 + >> tools/libxl/libxl_utils.h|5 + >> 11 files changed, 1693 insertions(+), 2 deletions(-) >> create mode 100644 tools/libxl/libxl_pvusb.c >> >> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile >> index c5ecec1..ef9ccd3 100644 >> --- a/tools/libxl/Makefile >> +++ b/tools/libxl/Makefile >> @@ -103,7 +103,7 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o >> libxl_dm.o libxl_pci.o \ >> libxl_stream_read.o libxl_stream_write.o \ >> libxl_save_callout.o _libxl_save_msgs_callout.o \ >> libxl_qmp.o libxl_event.o libxl_fork.o \ >> - libxl_dom_suspend.o $(LIBXL_OBJS-y) >> + libxl_dom_suspend.o libxl_pvusb.o $(LIBXL_OBJS-y) >> LIBXL_OBJS += libxl_genid.o >> LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o >> >> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c >> index dacfaae..a050e8b 100644 >> --- a/tools/libxl/libxl.c >> +++ b/tools/libxl/libxl.c >> @@ -4218,11 +4218,54 @@ DEFINE_DEVICE_REMOVE(vtpm, destroy, 1) >> >> >> / >> **/ >> >> +/* Macro for defining device remove/destroy functions for usbctrl */ >> +/* Follo:wing functions are defined: >> + * libxl_device_usbctrl_remove >> + * libxl_device_usbctrl_destroy >> + */ >> + >> +#define DEFINE_DEVICE_REMOVE_EXT(type, removedestroy, f)\ >> +int libxl_device_##type##_##removedestroy(libxl_ctx *ctx, \ >> +uint32_t domid, libxl_device_##type *type, \ >> +const libxl_asyncop_how *ao_how)\ >> +{ \ >> +AO_CREATE(ctx, domid, ao_how); \ >> +libxl__device *device; \ >> +libxl__ao_device *aodev;\ >> +int rc; \ >> +\ >> +GCNEW(device); \ >> +rc = libxl__device_from_##type(gc, domid, type, device);\ >> +if (rc != 0) goto out; \ >> +\ >> 
+GCNEW(aodev); \ >> +libxl__prepare_ao_device(ao, aodev);\ >> +aodev->action = LIBXL__DEVICE_ACTION_REMOVE;\ >> +aodev->dev = device;
Re: [Xen-devel] [PATCH v7 16/32] xen/x86: allow disabling the pmtimer
On 03/11/15 12:41, Jan Beulich wrote: On 03.11.15 at 11:57, wrote: >> On 03/11/15 07:21, Jan Beulich wrote: >> On 30.10.15 at 16:36, wrote: On 30/10/15 13:16, Jan Beulich wrote: On 30.10.15 at 13:50, wrote: >> El 14/10/15 a les 16.37, Jan Beulich ha escrit: >> On 02.10.15 at 17:48, wrote: Signed-off-by: Roger Pau Monné Cc: Jan Beulich Cc: Andrew Cooper --- Changes since v6: - Return ENODEV in pmtimer_load if the timer is disabled. - hvm_acpi_power_button and hvm_acpi_sleep_button become noops if the pmtimer is disabled. >>> But how are those two features connected? I don't think you can >>> assume absence of a PM block just because there's no PM timer. >>> Or if you want to tie them together for now, the predicate needs >>> to be renamed. >>> - Return ENODEV if pmtimer_change_ioport is called with the pmtimer disabled. >>> Same here. >> What about changing XEN_X86_EMU_PMTIMER into XEN_X86_EMU_PM and this >> flags disables all PM stuff? > Ah, right, that's a reasonable option. It still might be a nice idea to split them in two, given future work. To support hotplug properly (cpu, ram and pci), Xen needs to inject GPEs, which comes from part of the PM infrastructure. To support PCI devices in the future without the whole PM infrastructure, it would be nice to keep the split. >>> Coming back to this - I'm not sure: The hotplug aspect as you >>> mention it should matter for Dom0 only. DomU could (and perhaps >>> should) use a PV interface instead. >> I disagree. >> >> All PVH guests should use the same mechanism; making a split between >> dom0 and domU will only make our lives harder. >> >> Where reasonable, we should follow what happens on native; one of the >> underlying points of PVH is to have less of an impact on the guest >> side. In some cases it is indeed nasty, but has the advantage of being >> well understood. > What meaning would ACPI have to a PVH DomU? Whatever is covered in the tables provided. 
For hotplug, this is at minimum a PM block which can be used to inject GPEs.

>
>>> So I'd like to suggest quite the opposite: Don't call the thing PM,
>>> but make it more general and call it ACPI. And instead of
>>> separating HPET, we might have this fall under ACPI as well, or
>>> we might have a second TIMER flag, requiring both to be set
>>> for there to be a HPET and PMTMR. This leaves open the option
>>> of Dom0 getting ACPI enabled (despite this then being "real",
>>> not emulated ACPI), but TIMER left off.
>> An HPET can exist independently of other features such as ACPI. It
>> should have its own option.
> Without ACPI there's no defined way to discover it. Doing what
> Linux does - applying chipset knowledge - won't work on PVH either,
> because there's no emulated chipset. Which would leave scanning
> physical memory, but if there is none, none can be found.

In reality, the legacy HPET always lives at 0xfed0, so only a single
MMIO read is required to locate one.

As for the Linux chipset behaviour, that reminds me that I need to do
something similar in Xen to deny MMIO access. At the moment, if the
legacy HPET is not exposed in the ACPI tables, Xen doesn't find the HPET
but Linux does, and attempts to play with interrupts. It doesn't get
very far, but the kexec environment finds itself without a timesource,
as Linux disables legacy broadcast mode.

~Andrew
[Xen-devel] [PATCH v4 9/9] libxc: create p2m list outside of kernel mapping if supported
In case the kernel of a new pv-domU indicates it is supporting a p2m list outside the initial kernel mapping by specifying INIT_P2M, let the domain builder allocate the memory for the p2m list from physical guest memory only and map it to the address the kernel is expecting. This will enable loading pv-domUs larger than 512 GB. Signed-off-by: Juergen Gross --- tools/libxc/include/xc_dom.h | 1 + tools/libxc/xc_dom_core.c| 15 +++- tools/libxc/xc_dom_x86.c | 56 ++-- 3 files changed, 64 insertions(+), 8 deletions(-) diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h index 7c157c3..ad8e47e 100644 --- a/tools/libxc/include/xc_dom.h +++ b/tools/libxc/include/xc_dom.h @@ -238,6 +238,7 @@ struct xc_dom_arch { char *native_protocol; int page_shift; int sizeof_pfn; +int p2m_base_supported; int arch_private_size; struct xc_dom_arch *next; diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c index ad91b35..5d6c3ba 100644 --- a/tools/libxc/xc_dom_core.c +++ b/tools/libxc/xc_dom_core.c @@ -777,6 +777,7 @@ struct xc_dom_image *xc_dom_allocate(xc_interface *xch, dom->parms.virt_hypercall = UNSET_ADDR; dom->parms.virt_hv_start_low = UNSET_ADDR; dom->parms.elf_paddr_offset = UNSET_ADDR; +dom->parms.p2m_base = UNSET_ADDR; dom->alloc_malloc += sizeof(*dom); return dom; @@ -1096,7 +1097,11 @@ int xc_dom_build_image(struct xc_dom_image *dom) } /* allocate other pages */ -if ( dom->arch_hooks->alloc_p2m_list && +if ( !dom->arch_hooks->p2m_base_supported || + dom->parms.p2m_base >= dom->parms.virt_base || + (dom->parms.p2m_base & (XC_DOM_PAGE_SIZE(dom) - 1)) ) +dom->parms.p2m_base = UNSET_ADDR; +if ( dom->arch_hooks->alloc_p2m_list && dom->parms.p2m_base == UNSET_ADDR && dom->arch_hooks->alloc_p2m_list(dom) != 0 ) goto err; if ( dom->arch_hooks->alloc_magic_pages(dom) != 0 ) @@ -1124,6 +1129,14 @@ int xc_dom_build_image(struct xc_dom_image *dom) dom->initrd_len = page_size * dom->ramdisk_seg.pages; } +/* Allocate p2m list if outside of initial kernel 
mapping. */ +if ( dom->arch_hooks->alloc_p2m_list && dom->parms.p2m_base != UNSET_ADDR ) +{ +if ( dom->arch_hooks->alloc_p2m_list(dom) != 0 ) +goto err; +dom->p2m_seg.vstart = dom->parms.p2m_base; +} + return 0; err: diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index 497aa55..147468c 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -69,6 +69,7 @@ #define bits_to_mask(bits) (((xen_vaddr_t)1 << (bits))-1) #define round_down(addr, mask) ((addr) & ~(mask)) #define round_up(addr, mask) ((addr) | (mask)) +#define round_pg_up(addr) (((addr) + PAGE_SIZE_X86 - 1) & ~(PAGE_SIZE_X86 - 1)) struct xc_dom_params { unsigned levels; @@ -90,7 +91,7 @@ struct xc_dom_x86_mapping { struct xc_dom_image_x86 { unsigned n_mappings; -#define MAPPING_MAX 1 +#define MAPPING_MAX 2 struct xc_dom_x86_mapping maps[MAPPING_MAX]; struct xc_dom_params *params; }; @@ -483,11 +484,8 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom) /* */ -static int alloc_p2m_list(struct xc_dom_image *dom) +static int alloc_p2m_list(struct xc_dom_image *dom, size_t p2m_alloc_size) { -size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn; - -/* allocate phys2mach table */ if ( xc_dom_alloc_segment(dom, &dom->p2m_seg, "phys2mach", 0, p2m_alloc_size) ) return -1; @@ -498,6 +496,40 @@ static int alloc_p2m_list(struct xc_dom_image *dom) return 0; } +static int alloc_p2m_list_x86_32(struct xc_dom_image *dom) +{ +size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn; + +p2m_alloc_size = round_pg_up(p2m_alloc_size); +return alloc_p2m_list(dom, p2m_alloc_size); +} + +static int alloc_p2m_list_x86_64(struct xc_dom_image *dom) +{ +struct xc_dom_image_x86 *domx86 = dom->arch_private; +struct xc_dom_x86_mapping *map = domx86->maps + domx86->n_mappings; +size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn; +xen_vaddr_t from, to; +unsigned lvl; + +p2m_alloc_size = round_pg_up(p2m_alloc_size); +if ( dom->parms.p2m_base != UNSET_ADDR ) 
+{ +from = dom->parms.p2m_base; +to = from + p2m_alloc_size - 1; +if ( count_pgtables(dom, from, to, dom->pfn_alloc_end) ) +return -1; + +map->area.pfn = dom->pfn_alloc_end; +for ( lvl = 0; lvl < 4; lvl++ ) +map->lvls[lvl].pfn += p2m_alloc_size >> PAGE_SHIFT_X86; +domx86->n_mappings++; +p2m_alloc_size += map->area.pgtables << PAGE_SHIFT_X86; +} + +return alloc_p2m_list(dom, p2m_alloc_size); +} + /*
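As a rough illustration of the sizes involved in the patch above, the sketch below mirrors two pieces of its logic: the page rounding of the p2m list size (as in round_pg_up) and the check that decides whether a kernel-supplied p2m base is usable (supported by the arch hooks, below the kernel's virt_base, and page aligned). All names prefixed with ex_ or EX_ are invented for this example and are not libxc identifiers; a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12
#define EX_PAGE_SIZE  (UINT64_C(1) << EX_PAGE_SHIFT)
#define EX_UNSET_ADDR (~UINT64_C(0)) /* stand-in for "no address given" */

/* Round a byte count up to the next page boundary. */
static uint64_t ex_round_pg_up(uint64_t bytes)
{
    return (bytes + EX_PAGE_SIZE - 1) & ~(EX_PAGE_SIZE - 1);
}

/* Bytes needed for a p2m list with one entry of sizeof_pfn bytes per page. */
static uint64_t ex_p2m_list_bytes(uint64_t nr_pfns, unsigned int sizeof_pfn)
{
    return ex_round_pg_up(nr_pfns * sizeof_pfn);
}

/*
 * A requested p2m base is only honoured if the architecture supports it,
 * it lies below the initial kernel mapping, and it is page aligned.
 * Otherwise the builder falls back to the in-mapping p2m list.
 */
static bool ex_p2m_base_usable(bool arch_supported, uint64_t p2m_base,
                               uint64_t virt_base)
{
    if (!arch_supported || p2m_base == EX_UNSET_ADDR)
        return false;
    if (p2m_base >= virt_base)
        return false;
    return (p2m_base & (EX_PAGE_SIZE - 1)) == 0;
}
```

For a 512 GB guest with 4 KiB pages there are 2^27 page frames, so with 8-byte entries ex_p2m_list_bytes() returns 2^30 bytes, that is, a 1 GiB p2m list.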
[Xen-devel] [PATCH v4 7/9] libxc: split p2m allocation in domain builder from other magic pages
Carve out the p2m list allocation from the .alloc_magic_pages hook of the domain builder in order to prepare allocating the p2m list outside of the initial kernel mapping. This will be needed to support loading domains with huge memory (>512 GB). Signed-off-by: Juergen Gross Acked-by: Ian Campbell Acked-by: Wei Liu --- tools/libxc/include/xc_dom.h | 1 + tools/libxc/xc_dom_core.c| 3 +++ tools/libxc/xc_dom_x86.c | 11 ++- 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h index 2358012..7c157c3 100644 --- a/tools/libxc/include/xc_dom.h +++ b/tools/libxc/include/xc_dom.h @@ -221,6 +221,7 @@ struct xc_dom_arch { /* pagetable setup */ int (*alloc_magic_pages) (struct xc_dom_image * dom); int (*alloc_pgtables) (struct xc_dom_image * dom); +int (*alloc_p2m_list) (struct xc_dom_image * dom); int (*setup_pgtables) (struct xc_dom_image * dom); /* arch-specific data structs setup */ diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c index 7b48b1f..ad91b35 100644 --- a/tools/libxc/xc_dom_core.c +++ b/tools/libxc/xc_dom_core.c @@ -1096,6 +1096,9 @@ int xc_dom_build_image(struct xc_dom_image *dom) } /* allocate other pages */ +if ( dom->arch_hooks->alloc_p2m_list && + dom->arch_hooks->alloc_p2m_list(dom) != 0 ) +goto err; if ( dom->arch_hooks->alloc_magic_pages(dom) != 0 ) goto err; if ( dom->arch_hooks->alloc_pgtables(dom) != 0 ) diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index 3c6bb9c..dd448cb 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -475,7 +475,7 @@ pfn_error: /* */ -static int alloc_magic_pages(struct xc_dom_image *dom) +static int alloc_p2m_list(struct xc_dom_image *dom) { size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn; @@ -487,6 +487,13 @@ static int alloc_magic_pages(struct xc_dom_image *dom) if ( dom->p2m_guest == NULL ) return -1; +return 0; +} + +/* */ + +static int alloc_magic_pages(struct xc_dom_image *dom) +{ 
/* allocate special pages */ dom->start_info_pfn = xc_dom_alloc_page(dom, "start info"); dom->xenstore_pfn = xc_dom_alloc_page(dom, "xenstore"); @@ -1667,6 +1674,7 @@ static struct xc_dom_arch xc_dom_32_pae = { .arch_private_size = sizeof(struct xc_dom_image_x86), .alloc_magic_pages = alloc_magic_pages, .alloc_pgtables = alloc_pgtables_x86_32_pae, +.alloc_p2m_list = alloc_p2m_list, .setup_pgtables = setup_pgtables_x86_32_pae, .start_info = start_info_x86_32, .shared_info = shared_info_x86_32, @@ -1684,6 +1692,7 @@ static struct xc_dom_arch xc_dom_64 = { .arch_private_size = sizeof(struct xc_dom_image_x86), .alloc_magic_pages = alloc_magic_pages, .alloc_pgtables = alloc_pgtables_x86_64, +.alloc_p2m_list = alloc_p2m_list, .setup_pgtables = setup_pgtables_x86_64, .start_info = start_info_x86_64, .shared_info = shared_info_x86_64, -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 6/9] libxc: create unmapped initrd in domain builder if supported
In case the kernel of a new pv-domU indicates it is supporting an unmapped initrd, don't waste precious virtual space for the initrd, but allocate only guest physical memory for it. Signed-off-by: Juergen Gross Acked-by: Wei Liu --- tools/libxc/include/xc_dom.h | 5 + tools/libxc/xc_dom_core.c| 19 +-- tools/libxc/xc_dom_x86.c | 8 3 files changed, 26 insertions(+), 6 deletions(-) diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h index 0ba9821..2358012 100644 --- a/tools/libxc/include/xc_dom.h +++ b/tools/libxc/include/xc_dom.h @@ -94,6 +94,11 @@ struct xc_dom_image { xen_pfn_t pfn_alloc_end; xen_vaddr_t virt_alloc_end; xen_vaddr_t bsd_symtab_start; + +/* initrd parameters as specified in start_info page */ +unsigned long initrd_start; +unsigned long initrd_len; + unsigned int alloc_bootstack; xen_vaddr_t virt_pgtab_end; diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c index 3a31222..7b48b1f 100644 --- a/tools/libxc/xc_dom_core.c +++ b/tools/libxc/xc_dom_core.c @@ -1041,6 +1041,7 @@ static int xc_dom_build_ramdisk(struct xc_dom_image *dom) int xc_dom_build_image(struct xc_dom_image *dom) { unsigned int page_size; +bool unmapped_initrd; DOMPRINTF_CALLED(dom->xch); @@ -1064,11 +1065,15 @@ int xc_dom_build_image(struct xc_dom_image *dom) if ( dom->kernel_loader->loader(dom) != 0 ) goto err; -/* load ramdisk */ -if ( dom->ramdisk_blob ) +/* Don't load ramdisk now if no initial mapping required. */ +unmapped_initrd = dom->parms.unmapped_initrd && !dom->ramdisk_seg.vstart; + +if ( dom->ramdisk_blob && !unmapped_initrd ) { if ( xc_dom_build_ramdisk(dom) != 0 ) goto err; +dom->initrd_start = dom->ramdisk_seg.vstart; +dom->initrd_len = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart; } /* load devicetree */ @@ -1106,6 +,16 @@ int xc_dom_build_image(struct xc_dom_image *dom) if ( dom->virt_pgtab_end && xc_dom_alloc_pad(dom, dom->virt_pgtab_end) ) return -1; +/* Load ramdisk if no initial mapping required. 
*/ +if ( dom->ramdisk_blob && unmapped_initrd ) +{ +if ( xc_dom_build_ramdisk(dom) != 0 ) +goto err; +dom->flags |= SIF_MOD_START_PFN; +dom->initrd_start = dom->ramdisk_seg.pfn; +dom->initrd_len = page_size * dom->ramdisk_seg.pages; +} + return 0; err: diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index aba50df..3c6bb9c 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -663,8 +663,8 @@ static int start_info_x86_32(struct xc_dom_image *dom) if ( dom->ramdisk_blob ) { -start_info->mod_start = dom->ramdisk_seg.vstart; -start_info->mod_len = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart; +start_info->mod_start = dom->initrd_start; +start_info->mod_len = dom->initrd_len; } if ( dom->cmdline ) @@ -710,8 +710,8 @@ static int start_info_x86_64(struct xc_dom_image *dom) if ( dom->ramdisk_blob ) { -start_info->mod_start = dom->ramdisk_seg.vstart; -start_info->mod_len = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart; +start_info->mod_start = dom->initrd_start; +start_info->mod_len = dom->initrd_len; } if ( dom->cmdline ) -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 8/9] libxc: rework of domain builder's page table handler
In order to prepare a p2m list outside of the initial kernel mapping do a rework of the domain builder's page table handler. The goal is to be able to use common helpers for page table allocation and setup for initial kernel page tables and page tables mapping the p2m list. This is achieved by supporting multiple mapping areas. The mapped virtual addresses of the single areas must not overlap, while the page tables of a new area added might already be partially present. Especially the top level page table is existing only once, of course. Currently restrict the number of mappings to 1 because the only mapping now is the initial mapping created by toolstack. There should not be behaviour change and guest visible change introduced. Signed-off-by: Juergen Gross --- tools/libxc/xc_dom_x86.c | 478 --- tools/libxc/xg_private.h | 39 +--- 2 files changed, 251 insertions(+), 266 deletions(-) diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index dd448cb..497aa55 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -69,13 +70,29 @@ #define round_down(addr, mask) ((addr) & ~(mask)) #define round_up(addr, mask) ((addr) | (mask)) -struct xc_dom_image_x86 { -/* initial page tables */ +struct xc_dom_params { +unsigned levels; +xen_vaddr_t vaddr_mask; +x86_pgentry_t lvl_prot[4]; +}; + +struct xc_dom_x86_mapping_lvl { +xen_vaddr_t from; +xen_vaddr_t to; +xen_pfn_t pfn; unsigned int pgtables; -unsigned int pg_l4; -unsigned int pg_l3; -unsigned int pg_l2; -unsigned int pg_l1; +}; + +struct xc_dom_x86_mapping { +struct xc_dom_x86_mapping_lvl area; +struct xc_dom_x86_mapping_lvl lvls[4]; +}; + +struct xc_dom_image_x86 { +unsigned n_mappings; +#define MAPPING_MAX 1 +struct xc_dom_x86_mapping maps[MAPPING_MAX]; +struct xc_dom_params *params; }; /* get guest IO ABI protocol */ @@ -105,102 +122,159 @@ const char *xc_domain_get_native_protocol(xc_interface *xch, return protocol; } 
-static unsigned long -nr_page_tables(struct xc_dom_image *dom, - xen_vaddr_t start, xen_vaddr_t end, unsigned long bits) +static int count_pgtables(struct xc_dom_image *dom, xen_vaddr_t from, + xen_vaddr_t to, xen_pfn_t pfn) { -xen_vaddr_t mask = bits_to_mask(bits); -int tables; +struct xc_dom_image_x86 *domx86 = dom->arch_private; +struct xc_dom_x86_mapping *map, *map_cmp; +xen_pfn_t pfn_end; +xen_vaddr_t mask; +unsigned bits; +int l, m; -if ( bits == 0 ) -return 0; /* unused */ +if ( domx86->n_mappings == MAPPING_MAX ) +{ +xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY, + "%s: too many mappings\n", __FUNCTION__); +return -ENOMEM; +} +map = domx86->maps + domx86->n_mappings; -if ( bits == (8 * sizeof(unsigned long)) ) +pfn_end = pfn + ((to - from) >> PAGE_SHIFT_X86); +if ( pfn_end >= dom->p2m_size ) { -/* must be pgd, need one */ -start = 0; -end = -1; -tables = 1; +xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY, + "%s: not enough memory for initial mapping (%#"PRIpfn" > %#"PRIpfn")", + __FUNCTION__, pfn_end, dom->p2m_size); +return -ENOMEM; } -else +for ( m = 0; m < domx86->n_mappings; m++ ) { -start = round_down(start, mask); -end = round_up(end, mask); -tables = ((end - start) >> bits) + 1; +map_cmp = domx86->maps + m; +if ( from < map_cmp->area.to && to > map_cmp->area.from ) +{ +xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, + "%s: overlapping mappings\n", __FUNCTION__); +return -1; +} } -DOMPRINTF("%s: 0x%016" PRIx64 "/%ld: 0x%016" PRIx64 - " -> 0x%016" PRIx64 ", %d table(s)", - __FUNCTION__, mask, bits, start, end, tables); -return tables; +memset(map, 0, sizeof(*map)); +map->area.from = from & domx86->params->vaddr_mask; +map->area.to = to & domx86->params->vaddr_mask; + +for ( l = domx86->params->levels - 1; l >= 0; l-- ) +{ +map->lvls[l].pfn = pfn + map->area.pgtables; +if ( l == domx86->params->levels - 1 ) +{ +if ( domx86->n_mappings == 0 ) +{ +map->lvls[l].from = 0; +map->lvls[l].to = domx86->params->vaddr_mask; +map->lvls[l].pgtables = 1; +map->area.pgtables++; +} 
+continue; +} + +bits = PAGE_SHIFT_X86 + (l + 1) * PGTBL_LEVEL_SHIFT_X86; +mask = bits_to_mask(bits); +map->lvls[l].from = map->area.from & ~mask; +map->lvls[l].to = map->area.to | mask; + +if ( domx86->params->levels == 3 && domx86->n_
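The per-level rounding used by count_pgtables() above can be illustrated with a stand-alone sketch. It is a simplification under stated assumptions, not the libxc code: it handles only a single mapping on 4-level x86-64 paging (48-bit virtual addresses, 4 KiB pages, 9 bits per level), always counts one top-level table, and ignores the sharing of tables with earlier mappings that the patch has to deal with. The ex_ and EX_ names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT  12
#define EX_LEVEL_SHIFT 9
#define EX_LEVELS      4
#define EX_VADDR_MASK  ((UINT64_C(1) << 48) - 1)

/*
 * Number of page tables needed to map the inclusive virtual range
 * [from, to] when no tables exist yet.
 */
static unsigned int ex_count_pgtables(uint64_t from, uint64_t to)
{
    unsigned int total = 1; /* the single top-level table */
    int l;

    assert(from <= to);

    from &= EX_VADDR_MASK;
    to &= EX_VADDR_MASK;

    for (l = EX_LEVELS - 2; l >= 0; l--) {
        /* One table at level l covers 2^bits bytes of address space. */
        unsigned int bits = EX_PAGE_SHIFT + (l + 1) * EX_LEVEL_SHIFT;
        uint64_t mask = (UINT64_C(1) << bits) - 1;
        uint64_t lo = from & ~mask; /* round start down to table coverage */
        uint64_t hi = to | mask;    /* round end up to table coverage */

        total += (unsigned int)((hi - lo) >> bits) + 1;
    }

    return total;
}
```

For a 4 MiB range starting at 0xffffffff80000000 this gives one table each at the top three levels plus two lowest-level tables, five in total. A range that straddles a 1 GiB boundary needs an additional table at the next level up.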
[Xen-devel] [linux-3.4 test] 63567: regressions - FAIL
flight 63567 linux-3.4 real [real] http://logs.test-lab.xenproject.org/osstest/logs/63567/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-rumpuserxen-i386 6 xen-boot fail REGR. vs. 62277 test-amd64-i386-qemuu-rhel6hvm-intel 6 xen-boot fail REGR. vs. 62277 test-amd64-i386-xl-qemuu-debianhvm-amd64 6 xen-boot fail REGR. vs. 62277 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail REGR. vs. 62277 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 6 xen-boot fail REGR. vs. 62277 test-amd64-i386-xl-qemuu-ovmf-amd64 6 xen-boot fail REGR. vs. 62277 test-amd64-i386-xl6 xen-boot fail REGR. vs. 62277 test-amd64-i386-freebsd10-amd64 6 xen-boot fail REGR. vs. 62277 test-amd64-amd64-xl-xsm 6 xen-boot fail REGR. vs. 62277 test-amd64-i386-xl-qemut-debianhvm-amd64 6 xen-boot fail REGR. vs. 62277 test-amd64-amd64-xl-multivcpu 6 xen-boot fail REGR. vs. 62277 test-amd64-amd64-xl-qemuu-debianhvm-amd64 6 xen-boot fail REGR. vs. 62277 test-amd64-i386-qemut-rhel6hvm-intel 6 xen-boot fail REGR. vs. 62277 test-amd64-amd64-xl-qemuu-ovmf-amd64 6 xen-boot fail REGR. vs. 62277 test-amd64-amd64-xl-qemut-winxpsp3 6 xen-bootfail REGR. vs. 62277 test-amd64-i386-xl-qemuu-winxpsp3 6 xen-boot fail REGR. vs. 
62277 Tests which are failing intermittently (not blocking): test-amd64-amd64-amd64-pvgrub 3 host-install(3) broken in 63294 pass in 63567 test-amd64-i386-qemuu-rhel6hvm-amd 3 host-install(3) broken in 63294 pass in 63567 test-amd64-amd64-xl-qemuu-winxpsp3 3 host-install(3) broken in 63294 pass in 63567 test-amd64-i386-xl-xsm3 host-install(3) broken in 63310 pass in 63567 test-amd64-amd64-xl-qcow2 3 host-install(3) broken in 63310 pass in 63567 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 3 host-install(3) broken in 63310 pass in 63567 test-amd64-amd64-xl-qemut-winxpsp3 3 host-install(3) broken in 63310 pass in 63567 test-amd64-amd64-xl-credit2 3 host-install(3) broken in 63324 pass in 63567 test-amd64-i386-xl-raw3 host-install(3) broken in 63324 pass in 63567 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 3 host-install(3) broken in 63324 pass in 63567 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 3 host-install(3) broken in 63324 pass in 63567 test-amd64-i386-qemut-rhel6hvm-amd 3 host-install(3) broken in 63324 pass in 63567 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 13 guest-localmigrate fail in 63324 pass in 63567 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 9 windows-install fail in 63485 pass in 63567 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 6 xen-boot fail pass in 63228 test-amd64-amd64-xl-rtds 6 xen-bootfail pass in 63228 test-amd64-amd64-i386-pvgrub 6 xen-bootfail pass in 63294 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-localmigrate.2 fail pass in 63294 test-amd64-i386-pair 10 xen-boot/dst_host fail pass in 63310 test-amd64-i386-pair 9 xen-boot/src_host fail pass in 63310 test-amd64-amd64-libvirt-pair 10 xen-boot/dst_host fail pass in 63310 test-amd64-amd64-libvirt-pair 9 xen-boot/src_host fail pass in 63310 test-amd64-amd64-amd64-pvgrub 6 xen-boot fail pass in 63324 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail pass in 63324 test-amd64-amd64-xl-qcow2 6 xen-bootfail pass in 63338 
test-amd64-i386-libvirt-pair 10 xen-boot/dst_host fail pass in 63374 test-amd64-i386-libvirt-pair 9 xen-boot/src_host fail pass in 63374 test-amd64-amd64-pair10 xen-boot/dst_host fail pass in 63404 test-amd64-amd64-pair 9 xen-boot/src_host fail pass in 63404 test-amd64-i386-qemut-rhel6hvm-amd 6 xen-boot fail pass in 63485 Regressions which are regarded as allowable (not blocking): test-amd64-i386-libvirt-xsm 6 xen-boot fail REGR. vs. 62277 test-amd64-amd64-libvirt-xsm 6 xen-boot fail REGR. vs. 62277 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-localmigrate.2 fail in 63228 blocked in 62277 test-amd64-amd64-rumpuserxen-amd64 15 rumpuserxen-demo-xenstorels/xenstorels.repeat fail like 62277 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 62277 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 62277 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 62277 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail in 63228 never pass test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass test-amd64-amd64-xl-pvh-
[Xen-devel] [PATCH v4 5/9] libxc: use domain builder architecture private data for x86 pv domains
Move some data private to the x86 domain builder to the private data section. Remove extra_pages as they are used nowhere. Signed-off-by: Juergen Gross Acked-by: Wei Liu --- tools/libxc/include/xc_dom.h | 8 tools/libxc/xc_dom_x86.c | 48 +--- 2 files changed, 32 insertions(+), 24 deletions(-) diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h index 09f73cd..0ba9821 100644 --- a/tools/libxc/include/xc_dom.h +++ b/tools/libxc/include/xc_dom.h @@ -94,15 +94,7 @@ struct xc_dom_image { xen_pfn_t pfn_alloc_end; xen_vaddr_t virt_alloc_end; xen_vaddr_t bsd_symtab_start; - -/* initial page tables */ -unsigned int pgtables; -unsigned int pg_l4; -unsigned int pg_l3; -unsigned int pg_l2; -unsigned int pg_l1; unsigned int alloc_bootstack; -unsigned int extra_pages; xen_vaddr_t virt_pgtab_end; /* other state info */ diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index ea32b00..aba50df 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -69,6 +69,15 @@ #define round_down(addr, mask) ((addr) & ~(mask)) #define round_up(addr, mask) ((addr) | (mask)) +struct xc_dom_image_x86 { +/* initial page tables */ +unsigned int pgtables; +unsigned int pg_l4; +unsigned int pg_l3; +unsigned int pg_l2; +unsigned int pg_l1; +}; + /* get guest IO ABI protocol */ const char *xc_domain_get_native_protocol(xc_interface *xch, uint32_t domid) @@ -132,9 +141,9 @@ static int alloc_pgtables(struct xc_dom_image *dom, int pae, int pages, extra_pages; xen_vaddr_t try_virt_end; xen_pfn_t try_pfn_end; +struct xc_dom_image_x86 *domx86 = dom->arch_private; extra_pages = dom->alloc_bootstack ? 
1 : 0; -extra_pages += dom->extra_pages; extra_pages += 128; /* 512kB padding */ pages = extra_pages; for ( ; ; ) @@ -152,29 +161,30 @@ static int alloc_pgtables(struct xc_dom_image *dom, int pae, return -ENOMEM; } -dom->pg_l4 = +domx86->pg_l4 = nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l4_bits); -dom->pg_l3 = +domx86->pg_l3 = nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l3_bits); -dom->pg_l2 = +domx86->pg_l2 = nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l2_bits); -dom->pg_l1 = +domx86->pg_l1 = nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l1_bits); if (pae && try_virt_end < 0xc000) { DOMPRINTF("%s: PAE: extra l2 page table for l3#3", __FUNCTION__); -dom->pg_l2++; +domx86->pg_l2++; } -dom->pgtables = dom->pg_l4 + dom->pg_l3 + dom->pg_l2 + dom->pg_l1; -pages = dom->pgtables + extra_pages; +domx86->pgtables = domx86->pg_l4 + domx86->pg_l3 + + domx86->pg_l2 + domx86->pg_l1; +pages = domx86->pgtables + extra_pages; if ( dom->virt_alloc_end + pages * PAGE_SIZE_X86 <= try_virt_end + 1 ) break; } dom->virt_pgtab_end = try_virt_end + 1; return xc_dom_alloc_segment(dom, &dom->pgtables_seg, "page tables", 0, -dom->pgtables * PAGE_SIZE_X86); +domx86->pgtables * PAGE_SIZE_X86); } /* */ @@ -262,9 +272,10 @@ static xen_pfn_t move_l3_below_4G(struct xc_dom_image *dom, static int setup_pgtables_x86_32_pae(struct xc_dom_image *dom) { +struct xc_dom_image_x86 *domx86 = dom->arch_private; xen_pfn_t l3pfn = dom->pgtables_seg.pfn; -xen_pfn_t l2pfn = l3pfn + dom->pg_l3; -xen_pfn_t l1pfn = l2pfn + dom->pg_l2; +xen_pfn_t l2pfn = l3pfn + domx86->pg_l3; +xen_pfn_t l1pfn = l2pfn + domx86->pg_l2; l3_pgentry_64_t *l3tab; l2_pgentry_64_t *l2tab = NULL; l1_pgentry_64_t *l1tab = NULL; @@ -373,10 +384,11 @@ static int alloc_pgtables_x86_64(struct xc_dom_image *dom) static int setup_pgtables_x86_64(struct xc_dom_image *dom) { +struct xc_dom_image_x86 *domx86 = dom->arch_private; xen_pfn_t l4pfn = dom->pgtables_seg.pfn; -xen_pfn_t l3pfn = l4pfn + 
dom->pg_l4; -xen_pfn_t l2pfn = l3pfn + dom->pg_l3; -xen_pfn_t l1pfn = l2pfn + dom->pg_l2; +xen_pfn_t l3pfn = l4pfn + domx86->pg_l4; +xen_pfn_t l2pfn = l3pfn + domx86->pg_l3; +xen_pfn_t l1pfn = l2pfn + domx86->pg_l2; l4_pgentry_64_t *l4tab = xc_dom_pfn_to_ptr(dom, l4pfn, 1); l3_pgentry_64_t *l3tab = NULL; l2_pgentry_64_t *l2tab = NULL; @@ -619,6 +631,7 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom) static int start_info_x86_32(struct xc_dom_image *dom) { +struct xc_dom_image_x86 *domx86 = dom->arch_private; start_info_x86_32_t *star
[Xen-devel] [PATCH v4 4/9] libxc: introduce domain builder architecture specific data
Reorganize struct xc_dom_image to contain a pointer to domain builder architecture specific private data. This will abstract the architecture or domain type specific data from the general used data. The new area is allocated as soon as the domain type is known. Signed-off-by: Juergen Gross Acked-by: Wei Liu --- stubdom/grub/kexec.c | 6 +- tools/libxc/include/xc_dom.h | 6 +- tools/libxc/xc_dom_core.c| 27 +++ 3 files changed, 29 insertions(+), 10 deletions(-) diff --git a/stubdom/grub/kexec.c b/stubdom/grub/kexec.c index 2300318..8fd9ff9 100644 --- a/stubdom/grub/kexec.c +++ b/stubdom/grub/kexec.c @@ -272,7 +272,11 @@ void kexec(void *kernel, long kernel_size, void *module, long module_size, char #endif /* equivalent of xc_dom_mem_init */ -dom->arch_hooks = xc_dom_find_arch_hooks(xc_handle, dom->guest_type); +if (xc_dom_set_arch_hooks(dom)) { +grub_printf("xc_dom_set_arch_hooks failed\n"); +errnum = ERR_EXEC_FORMAT; +goto out; +} dom->total_pages = start_info.nr_pages; /* equivalent of arch_setup_meminit */ diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h index 19d45f4..09f73cd 100644 --- a/tools/libxc/include/xc_dom.h +++ b/tools/libxc/include/xc_dom.h @@ -175,6 +175,9 @@ struct xc_dom_image { unsigned int *vnode_to_pnode; unsigned int nr_vnodes; +/* domain type/architecture specific data */ +void *arch_private; + /* kernel loader */ struct xc_dom_arch *arch_hooks; /* allocate up to pfn_alloc_end */ @@ -237,6 +240,7 @@ struct xc_dom_arch { char *native_protocol; int page_shift; int sizeof_pfn; +int arch_private_size; struct xc_dom_arch *next; }; @@ -290,7 +294,7 @@ int xc_dom_devicetree_mem(struct xc_dom_image *dom, const void *mem, size_t memsize); int xc_dom_parse_image(struct xc_dom_image *dom); -struct xc_dom_arch *xc_dom_find_arch_hooks(xc_interface *xch, char *guest_type); +int xc_dom_set_arch_hooks(struct xc_dom_image *dom); int xc_dom_build_image(struct xc_dom_image *dom); int xc_dom_update_guest_p2m(struct xc_dom_image *dom); diff 
--git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c index 74de3c3..3a31222 100644 --- a/tools/libxc/xc_dom_core.c +++ b/tools/libxc/xc_dom_core.c @@ -710,19 +710,30 @@ void xc_dom_register_arch_hooks(struct xc_dom_arch *hooks) first_hook = hooks; } -struct xc_dom_arch *xc_dom_find_arch_hooks(xc_interface *xch, char *guest_type) +int xc_dom_set_arch_hooks(struct xc_dom_image *dom) { struct xc_dom_arch *hooks = first_hook; while ( hooks != NULL ) { -if ( !strcmp(hooks->guest_type, guest_type)) -return hooks; +if ( !strcmp(hooks->guest_type, dom->guest_type) ) +{ +if ( hooks->arch_private_size ) +{ +dom->arch_private = malloc(hooks->arch_private_size); +if ( dom->arch_private == NULL ) +return -1; +memset(dom->arch_private, 0, hooks->arch_private_size); +dom->alloc_malloc += hooks->arch_private_size; +} +dom->arch_hooks = hooks; +return 0; +} hooks = hooks->next; } -xc_dom_panic(xch, XC_INVALID_KERNEL, - "%s: not found (type %s)", __FUNCTION__, guest_type); -return NULL; +xc_dom_panic(dom->xch, XC_INVALID_KERNEL, + "%s: not found (type %s)", __FUNCTION__, dom->guest_type); +return -1; } /* */ @@ -734,6 +745,7 @@ void xc_dom_release(struct xc_dom_image *dom) if ( dom->phys_pages ) xc_dom_unmap_all(dom); xc_dom_free_all(dom); +free(dom->arch_private); free(dom); } @@ -924,8 +936,7 @@ int xc_dom_mem_init(struct xc_dom_image *dom, unsigned int mem_mb) unsigned int page_shift; xen_pfn_t nr_pages; -dom->arch_hooks = xc_dom_find_arch_hooks(dom->xch, dom->guest_type); -if ( dom->arch_hooks == NULL ) +if ( xc_dom_set_arch_hooks(dom) ) { xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, "%s: arch hooks not set", __FUNCTION__); -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
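For readers following the series, the pattern in this patch (look up the hooks by guest type, allocate a zeroed private area of the size the backend asks for, free it on release) can be sketched in isolation. This is a minimal sketch, not the libxc code: the names `struct arch_hooks`, `struct image`, `struct x86_private`, `register_hooks()`, `set_arch_hooks()` and `release_image()` are hypothetical stand-ins, and `calloc()` is used here in place of the patch's `malloc()` plus `memset()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for the libxc structures. */
struct arch_hooks {
    const char *guest_type;
    size_t private_size;          /* 0 means "no private data needed" */
    struct arch_hooks *next;
};

struct image {
    const char *guest_type;
    void *arch_private;           /* opaque to generic code */
    struct arch_hooks *hooks;
};

/* Example of a private area that only one backend knows about. */
struct x86_private {
    unsigned int pgtables;
    unsigned int pg_l1;
};

static struct arch_hooks *first_hook;

/* Push a backend onto the singly linked list of known backends. */
static void register_hooks(struct arch_hooks *h)
{
    assert(h != NULL);
    h->next = first_hook;
    first_hook = h;
}

/* Returns 0 on success, -1 if no backend matches or allocation fails. */
static int set_arch_hooks(struct image *img)
{
    for (struct arch_hooks *h = first_hook; h != NULL; h = h->next) {
        if (strcmp(h->guest_type, img->guest_type) != 0)
            continue;
        if (h->private_size) {
            /* calloc() hands back zeroed memory in one step. */
            img->arch_private = calloc(1, h->private_size);
            if (img->arch_private == NULL)
                return -1;
        }
        img->hooks = h;
        return 0;
    }
    return -1;
}

static void release_image(struct image *img)
{
    free(img->arch_private);      /* free(NULL) is a no-op */
    img->arch_private = NULL;
    img->hooks = NULL;
}
```

A backend would then recover its own view with something like `struct x86_private *p = img->arch_private;`, which mirrors how the follow-up x86 patch obtains `domx86` from `dom->arch_private`.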
[Xen-devel] [PATCH v4 0/9] libxc: support building large pv-domains
The Xen hypervisor supports starting a dom0 with large memory (up to
the TB range) by not including the initrd and p2m list in the initial
kernel mapping. Especially the p2m list can grow larger than the
available virtual space in the initial mapping.

The started kernel is indicating the support of each feature via
elf notes.

This series enables the domain builder in libxc to do the same as the
hypervisor. This enables starting of huge pv-domUs via xl.

Unmapped initrd is supported for 64 and 32 bit domains, omitting the
p2m from initial kernel mapping is possible for 64 bit domains only.

Tested with:
- 32 bit domU (kernel not supporting unmapped initrd)
- 32 bit domU (kernel supporting unmapped initrd)
- 1 GB 64 bit domU (kernel supporting unmapped initrd, not p2m)
- 1 GB 64 bit domU (kernel supporting unmapped initrd and p2m)
- 900GB 64 bit domU (kernel supporting unmapped initrd and p2m)
- HVM domU

Changes in v4:
- updated patch 1 as suggested by Wei Liu (comment and variable name)
- modify comment in patch 6 as suggested by Wei Liu
- rework of patch 8 reducing line count by nearly 100
- added some additional plausibility checks to patch 8 as suggested by
  Wei Liu
- renamed round_pg() to round_pg_up() in patch 9 as suggested by Wei Liu

Changes in v3:
- Rebased the complete series to new staging (hvm builder patches by
  Roger Pau Monne)
- Removed old patch 1 as it broke stubdom build
- Introduced new Patch 1 to make allocation of guest memory more clear
  regarding virtual/physical memory allocation (requested by Ian Campbell)
- Change name of flag to indicate support of unmapped initrd in patch 2
  (requested by Ian Campbell)
- Introduce new patches 3, 4, 5 ("rename domain builder count_pgtables to
  alloc_pgtables", "introduce domain builder architecture specific data",
  "use domain builder architecture private data for x86 pv domains") to
  assist later page table work
- don't fiddle with initrd virtual address in patch 6 (was patch 3 in v2),
  add explicit initrd parameters for start_info in struct xc_dom_image
  instead (requested by Ian Campbell)
- Introduce new patch 8 ("rework of domain builder's page table handler")
  to be able to use common helpers for unmapped p2m list (requested by
  Ian Campbell)
- use now known segment size in pages for p2m list in patch 9 (was patch 5
  in v2) instead of fiddling with segment end address (requested by
  Ian Campbell)
- split alloc_p2m_list() in patch 9 (was patch 5 in v2) to 32/64 bit
  variants (requested by Ian Campbell)

Changes in v2:
- patch 2 has been removed as it has been applied already
- introduced new patch 2 as suggested by Ian Campbell: add a flag
  indicating support of an unmapped initrd to the parsed elf data of
  the elf_dom_parms structure
- updated patch description of patch 3 as requested by Ian Campbell

Juergen Gross (9):
  libxc: reorganize domain builder guest memory allocator
  xen: add generic flag to elf_dom_parms indicating support of unmapped initrd
  libxc: rename domain builder count_pgtables to alloc_pgtables
  libxc: introduce domain builder architecture specific data
  libxc: use domain builder architecture private data for x86 pv domains
  libxc: create unmapped initrd in domain builder if supported
  libxc: split p2m allocation in domain builder from other magic pages
  libxc: rework of domain builder's page table handler
  libxc: create p2m list outside of kernel mapping if supported

 stubdom/grub/kexec.c               |  12 +-
 tools/libxc/include/xc_dom.h       |  34 +--
 tools/libxc/xc_dom_arm.c           |   6 +-
 tools/libxc/xc_dom_core.c          | 180
 tools/libxc/xc_dom_x86.c           | 563
 tools/libxc/xg_private.h           |  39 +--
 xen/arch/x86/domain_build.c        |   4 +-
 xen/common/libelf/libelf-dominfo.c |   3 +
 xen/include/xen/libelf.h           |   1 +
 9 files changed, 490 insertions(+), 352 deletions(-)

--
2.1.4
[Xen-devel] [PATCH v4 2/9] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd
Support of an unmapped initrd is indicated by the kernel of the domain
via elf notes. In order not to have to use raw elf data in the tools
for support of an unmapped initrd add a flag to the parsed data area
to indicate the kernel supporting this feature.

Switch using this flag in the hypervisor domain builder.

Cc: andrew.coop...@citrix.com
Cc: jbeul...@suse.com
Cc: k...@xen.org
Suggested-by: Ian Campbell
Signed-off-by: Juergen Gross
Acked-by: Jan Beulich
---
 xen/arch/x86/domain_build.c        | 4 ++--
 xen/common/libelf/libelf-dominfo.c | 3 +++
 xen/include/xen/libelf.h           | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index c2ef87a..d02dc4b 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -353,7 +353,7 @@ static unsigned long __init compute_dom0_nr_pages(
         vstart = parms->virt_base;
         vend = round_pgup(parms->virt_kend);
-        if ( !parms->elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
+        if ( !parms->unmapped_initrd )
             vend += round_pgup(initrd_len);
         end = vend + nr_pages * sizeof_long;
@@ -1037,7 +1037,7 @@ int __init construct_dom0(
     v_start = parms.virt_base;
     vkern_start = parms.virt_kstart;
     vkern_end = parms.virt_kend;
-    if ( parms.elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
+    if ( parms.unmapped_initrd )
     {
         vinitrd_start = vinitrd_end = 0;
         vphysmap_start = round_pgup(vkern_end);
diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
index 3de1c23..c9243e4 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -190,6 +190,9 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
     case XEN_ELFNOTE_INIT_P2M:
         parms->p2m_base = val;
         break;
+    case XEN_ELFNOTE_MOD_START_PFN:
+        parms->unmapped_initrd = !!val;
+        break;
     case XEN_ELFNOTE_PADDR_OFFSET:
         parms->elf_paddr_offset = val;
         break;
diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
index de788c7..6da4cc0 100644
--- a/xen/include/xen/libelf.h
+++ b/xen/include/xen/libelf.h
@@ -423,6 +423,7 @@ struct elf_dom_parms {
     char loader[16];
     enum xen_pae_type pae;
     bool bsd_symtab;
+    bool unmapped_initrd;
     uint64_t virt_base;
     uint64_t virt_entry;
     uint64_t virt_hypercall;
--
2.1.4
[Xen-devel] [PATCH] tools: pygrub: if partition table is empty, try treating as a whole disk
pygrub (in identify_disk_image()) detects a DOS style partition table via the presence of the 0xaa55 signature at the end of the first sector of the disk. However this signature is also present in whole-disk configurations when there is an MBR on the disk. Many filesystems (e.g. ext[234]) include leading padding in their on disk format specifically to enable this. So if we think we have a DOS partition table but do not find any actual partition table entries we may as well try looking at it as a whole disk image. Worst case is we probe and find there isn't anything there. This was reported by Sjors Gielen in Debian bug #745419. The fix was inspired by a patch by Adi Kriegisch in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=745419#27 Tested by genext2fs'ing my /boot into a new raw image (works) and then: dd if=/usr/lib/grub/i386-pc/g2ldr.mbr of=img conv=notrunc bs=512 count=1 to add an MBR (with 0xaa55 signature) to it, which after this patch also works. Signed-off-by: Ian Campbell Cc: 745419-forwar...@bugs.debian.org --- tools/pygrub/src/pygrub | 5 + 1 file changed, 5 insertions(+) diff --git a/tools/pygrub/src/pygrub b/tools/pygrub/src/pygrub index e4aedda..40f9584 100755 --- a/tools/pygrub/src/pygrub +++ b/tools/pygrub/src/pygrub @@ -156,6 +156,11 @@ def get_partition_offsets(file): else: part_offs.append(offset) +# We thought we had a DOS partition table, but didn't find any +# actual valid partition entries. This can happen because an MBR +# (e.g. grubs) may contain the same signature. +if not part_offs: part_offs = [0] + return part_offs class GrubLineEditor(curses.textpad.Textbox): -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
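The fallback described above can be sketched outside of pygrub. The following is an illustration in C rather than pygrub's Python, and it is deliberately simplified: `partition_offsets()` is a hypothetical helper, it ignores partition types (GPT, extended partitions and so on), and it only relies on the standard MBR layout (four 16-byte entries starting at byte 446, little-endian start LBA at entry offset 8, signature bytes 0x55 0xaa at offsets 510 and 511).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u
#define MAX_PARTS   4

/*
 * Fill offs[] with the byte offsets worth probing for a filesystem and
 * return how many were written (always between 1 and MAX_PARTS).
 */
static size_t partition_offsets(const uint8_t sector[SECTOR_SIZE],
                                uint64_t offs[MAX_PARTS])
{
    size_t n = 0;

    assert(sector != NULL && offs != NULL);

    /* 0xaa55 stored little-endian at the end of the first sector. */
    if (sector[510] == 0x55 && sector[511] == 0xaa) {
        for (size_t i = 0; i < MAX_PARTS; i++) {
            const uint8_t *e = sector + 446 + 16 * i;
            uint32_t lba = (uint32_t)e[8] | (uint32_t)e[9] << 8 |
                           (uint32_t)e[10] << 16 | (uint32_t)e[11] << 24;

            /* A start LBA of 0 means the slot is unused. */
            if (lba != 0)
                offs[n++] = (uint64_t)lba * SECTOR_SIZE;
        }
    }

    /* No signature, or a signature without usable entries: whole disk. */
    if (n == 0)
        offs[n++] = 0;
    return n;
}
```

The last `if` is the counterpart of the `if not part_offs: part_offs = [0]` line in the patch: a boot sector that carries the signature but no entries is probed at offset 0.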
[Xen-devel] [PATCH v4 3/9] libxc: rename domain builder count_pgtables to alloc_pgtables
Rename the count_pgtables hook of the domain builder to alloc_pgtables and do the allocation of the guest memory for page tables inside this hook. This will remove the need for accessing the x86 specific pgtables member of struct xc_dom_image in the generic domain builder code. Signed-off-by: Juergen Gross Acked-by: Wei Liu --- tools/libxc/include/xc_dom.h | 2 +- tools/libxc/xc_dom_arm.c | 6 +++--- tools/libxc/xc_dom_core.c| 11 ++- tools/libxc/xc_dom_x86.c | 26 +- 4 files changed, 23 insertions(+), 22 deletions(-) diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h index 68d6848..19d45f4 100644 --- a/tools/libxc/include/xc_dom.h +++ b/tools/libxc/include/xc_dom.h @@ -220,7 +220,7 @@ void xc_dom_register_loader(struct xc_dom_loader *loader); struct xc_dom_arch { /* pagetable setup */ int (*alloc_magic_pages) (struct xc_dom_image * dom); -int (*count_pgtables) (struct xc_dom_image * dom); +int (*alloc_pgtables) (struct xc_dom_image * dom); int (*setup_pgtables) (struct xc_dom_image * dom); /* arch-specific data structs setup */ diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c index 397eef0..d9a6371 100644 --- a/tools/libxc/xc_dom_arm.c +++ b/tools/libxc/xc_dom_arm.c @@ -49,7 +49,7 @@ const char *xc_domain_get_native_protocol(xc_interface *xch, * arm guests are hybrid and start off with paging disabled, therefore no * pagetables and nothing to do here. 
*/ -static int count_pgtables_arm(struct xc_dom_image *dom) +static int alloc_pgtables_arm(struct xc_dom_image *dom) { DOMPRINTF_CALLED(dom->xch); return 0; @@ -534,7 +534,7 @@ static struct xc_dom_arch xc_dom_32 = { .page_shift = PAGE_SHIFT_ARM, .sizeof_pfn = 8, .alloc_magic_pages = alloc_magic_pages, -.count_pgtables = count_pgtables_arm, +.alloc_pgtables = alloc_pgtables_arm, .setup_pgtables = setup_pgtables_arm, .start_info = start_info_arm, .shared_info = shared_info_arm, @@ -550,7 +550,7 @@ static struct xc_dom_arch xc_dom_64 = { .page_shift = PAGE_SHIFT_ARM, .sizeof_pfn = 8, .alloc_magic_pages = alloc_magic_pages, -.count_pgtables = count_pgtables_arm, +.alloc_pgtables = alloc_pgtables_arm, .setup_pgtables = setup_pgtables_arm, .start_info = start_info_arm, .shared_info = shared_info_arm, diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c index a14d477..74de3c3 100644 --- a/tools/libxc/xc_dom_core.c +++ b/tools/libxc/xc_dom_core.c @@ -1082,15 +1082,8 @@ int xc_dom_build_image(struct xc_dom_image *dom) /* allocate other pages */ if ( dom->arch_hooks->alloc_magic_pages(dom) != 0 ) goto err; -if ( dom->arch_hooks->count_pgtables ) -{ -if ( dom->arch_hooks->count_pgtables(dom) != 0 ) -goto err; -if ( (dom->pgtables > 0) && - (xc_dom_alloc_segment(dom, &dom->pgtables_seg, "page tables", 0, - dom->pgtables * page_size) != 0) ) -goto err; -} +if ( dom->arch_hooks->alloc_pgtables(dom) != 0 ) +goto err; if ( dom->alloc_bootstack ) dom->bootstack_pfn = xc_dom_alloc_page(dom, "boot stack"); DOMPRINTF("%-20s: virt_alloc_end : 0x%" PRIx64 "", diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index ed43c28..ea32b00 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -126,7 +126,7 @@ nr_page_tables(struct xc_dom_image *dom, return tables; } -static int count_pgtables(struct xc_dom_image *dom, int pae, +static int alloc_pgtables(struct xc_dom_image *dom, int pae, int l4_bits, int l3_bits, int l2_bits, int l1_bits) { int 
pages, extra_pages; @@ -172,7 +172,9 @@ static int count_pgtables(struct xc_dom_image *dom, int pae, break; } dom->virt_pgtab_end = try_virt_end + 1; -return 0; + +return xc_dom_alloc_segment(dom, &dom->pgtables_seg, "page tables", 0, +dom->pgtables * PAGE_SIZE_X86); } /* */ @@ -182,9 +184,9 @@ static int count_pgtables(struct xc_dom_image *dom, int pae, #define L2_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY|_PAGE_USER) #define L3_PROT (_PAGE_PRESENT) -static int count_pgtables_x86_32_pae(struct xc_dom_image *dom) +static int alloc_pgtables_x86_32_pae(struct xc_dom_image *dom) { -return count_pgtables(dom, 1, 0, 32, +return alloc_pgtables(dom, 1, 0, 32, L3_PAGETABLE_SHIFT_PAE, L2_PAGETABLE_SHIFT_PAE); } @@ -355,9 +357,9 @@ pfn_error: /* */ /* x86_64 pagetables*/ -static int count_pgtables_x86_64(struct xc_dom_image *dom) +static int alloc_pgtables_x86_64(struct xc_dom_image *dom)
Re: [Xen-devel] Hackathon 2016 Location Preferences
On Thu, Nov 05, 2015 at 03:21:18PM +, Lars Kurth wrote:
> Hi all,
>
> I wanted to do quick straw-poll regarding Hackathon Locations for next
> year. Before I do this though, I wanted to let you know that the 2016
> Developer Summit will most likely be in Berlin in October (I am in the
> process of finalising space, budget and contract details which will
> need to be approved by the Advisory Board).
>
> We do have two options for a Hackathon: China (either Shanghai,
> Hangzhou or Beijing - details TBC) and Cambridge, UK. We are still in
> the early planning phase and the budget for the Hackathon has not yet
> been approved.
>

I lived in Hangzhou for a while -- it is a nice city in my humble
opinion. :-)

Wei.

> Do let me know of your preference, and I will see whether I can work
> with the vendor(s) who are willing to host the 2016 Hackathon and
> choose a location, which suits a majority of developers.
>
> Best Regards
> Lars
Re: [Xen-devel] ovmf fail to compile
On Thu, Nov 05, 2015 at 02:07:26PM +, Hao, Xudong wrote:
> > -Original Message-
> > From: Wei Liu [mailto:wei.l...@citrix.com]
> > Sent: Wednesday, November 4, 2015 6:19 PM
> > To: Hao, Xudong
> > Cc: Wei Liu ; xen-devel@lists.xen.org
> > Subject: Re: [Xen-devel] ovmf fail to compile
> >
> > On Wed, Nov 04, 2015 at 08:27:56AM +, Hao, Xudong wrote:
> > > "git clean -fdx" doesn't change the error result with gcc 4.4.7. Gcc
> > > version is "gcc-4.4.?" in Debian Jessie of yours?
> > >
> >
> > Debian Jessie's gcc-4.4 has the same version 4.4.7.
> >
> > As the other sub-thread suggests, can you try passing more f's to git?
> >
> > A somewhat related question, are you only interested in xen-unstable branch?
> > Have you tried latest OVMF from upstream? If that builds for you I can easily
> > send another patch to update Config.mk again.
> >
>
> I'm busy on other urgent today, will try the two above tomorrow and share the
> result later.
>

No worries.

Wei.

> -Xudong
Re: [Xen-devel] [Xen-API] Hackathon 2016 Location Preferences
On 5 Nov 2015, at 15:21, Lars Kurth wrote:
>
> Hi all,
>
> I wanted to do quick straw-poll regarding Hackathon Locations for next year.
> Before I do this though, I wanted to let you know that the 2016 Developer
> Summit will most likely be in Berlin in October (I am in the process of
> finalising space, budget and contract details which will need to be approved
> by the Advisory Board).
>
> We do have two options for a Hackathon: China (either Shanghai, Hangzhou or
> Beijing - details TBC) and Cambridge, UK. We are still in the early planning
> phase and the budget for the Hackathon has not yet been approved.

A lot of unikernel hackers could show up if it's in Cambridge, but
unfortunately not if it's in China (despite being a much more exciting
location!).

best,
Anil
Re: [Xen-devel] [win-pv-devel] Hackathon 2016 Location Preferences
> -Original Message-
> From: win-pv-devel-boun...@lists.xenproject.org [mailto:win-pv-devel-
> boun...@lists.xenproject.org] On Behalf Of Lars Kurth
> Sent: 05 November 2015 15:21
> To: Xen-devel; mirageos-devel; xen-...@lists.xenproject.org; Win-pv-
> de...@lists.xenproject.org
> Subject: [win-pv-devel] Hackathon 2016 Location Preferences
>
> Hi all,
>
> I wanted to do quick straw-poll regarding Hackathon Locations for next year.
> Before I do this though, I wanted to let you know that the 2016 Developer
> Summit will most likely be in Berlin in October (I am in the process of
> finalising
> space, budget and contract details which will need to be approved by the
> Advisory Board).
>
> We do have two options for a Hackathon: China (either Shanghai, Hangzhou
> or Beijing - details TBC) and Cambridge, UK. We are still in the early
> planning
> phase and the budget for the Hackathon has not yet been approved.
>
> Do let me know of your preference, and I will see whether I can work with
> the vendor(s) who are willing to host the 2016 Hackathon and choose a
> location, which suits a majority of developers.
>

Since this year's was in Shanghai, my vote would be for Cambridge.

Paul

> Best Regards
> Lars
>
>
> ___
> win-pv-devel mailing list
> win-pv-de...@lists.xenproject.org
> http://lists.xenproject.org/cgi-bin/mailman/listinfo/win-pv-devel
Re: [Xen-devel] [PATCH v11 2/5] missing include asm/paravirt.h in cputime.c
How can this be missing? Things compile fine now, right? So please explain better why we need this change.
[Xen-devel] [PATCH v4 1/9] libxc: reorganize domain builder guest memory allocator
Guest memory allocation in the domain builder of libxc is done via virtual addresses only. In order to be able to support preallocated areas not virtually mapped reorganize the memory allocator to keep track of allocated pages globally and in allocated segments. This requires an interface change of the allocate callback of the domain builder which currently is using the last mapped virtual address as a parameter. This is no problem as the only user of this callback is stubdom/grub/kexec.c using this virtual address to calculate the last used pfn. Signed-off-by: Juergen Gross --- stubdom/grub/kexec.c | 6 +-- tools/libxc/include/xc_dom.h | 13 +++--- tools/libxc/xc_dom_core.c| 107 --- 3 files changed, 79 insertions(+), 47 deletions(-) diff --git a/stubdom/grub/kexec.c b/stubdom/grub/kexec.c index 0b2f4f3..2300318 100644 --- a/stubdom/grub/kexec.c +++ b/stubdom/grub/kexec.c @@ -100,9 +100,9 @@ static void do_exchange(struct xc_dom_image *dom, xen_pfn_t target_pfn, xen_pfn_ dom->p2m_host[target_pfn] = source_mfn; } -int kexec_allocate(struct xc_dom_image *dom, xen_vaddr_t up_to) +int kexec_allocate(struct xc_dom_image *dom) { -unsigned long new_allocated = (up_to - dom->parms.virt_base) / PAGE_SIZE; +unsigned long new_allocated = dom->pfn_alloc_end - dom->rambase_pfn; unsigned long i; pages = realloc(pages, new_allocated * sizeof(*pages)); @@ -319,8 +319,6 @@ void kexec(void *kernel, long kernel_size, void *module, long module_size, char /* Make sure the bootstrap page table does not RW-map any of our current * page table frames */ -kexec_allocate(dom, dom->virt_pgtab_end); - if ( (rc = xc_dom_update_guest_p2m(dom))) { grub_printf("xc_dom_update_guest_p2m returned %d\n", rc); errnum = ERR_BOOT_FAILURE; diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h index ccc5926..68d6848 100644 --- a/tools/libxc/include/xc_dom.h +++ b/tools/libxc/include/xc_dom.h @@ -29,6 +29,7 @@ struct xc_dom_seg { xen_vaddr_t vstart; xen_vaddr_t vend; xen_pfn_t pfn; 
+xen_pfn_t pages; }; struct xc_dom_mem { @@ -90,6 +91,7 @@ struct xc_dom_image { xen_pfn_t xenstore_pfn; xen_pfn_t shared_info_pfn; xen_pfn_t bootstack_pfn; +xen_pfn_t pfn_alloc_end; xen_vaddr_t virt_alloc_end; xen_vaddr_t bsd_symtab_start; @@ -175,8 +177,8 @@ struct xc_dom_image { /* kernel loader */ struct xc_dom_arch *arch_hooks; -/* allocate up to virt_alloc_end */ -int (*allocate) (struct xc_dom_image * dom, xen_vaddr_t up_to); +/* allocate up to pfn_alloc_end */ +int (*allocate) (struct xc_dom_image * dom); /* Container type (HVM or PV). */ enum { @@ -360,14 +362,11 @@ static inline void *xc_dom_seg_to_ptr_pages(struct xc_dom_image *dom, struct xc_dom_seg *seg, xen_pfn_t *pages_out) { -xen_vaddr_t segsize = seg->vend - seg->vstart; -unsigned int page_size = XC_DOM_PAGE_SIZE(dom); -xen_pfn_t pages = (segsize + page_size - 1) / page_size; void *retval; -retval = xc_dom_pfn_to_ptr(dom, seg->pfn, pages); +retval = xc_dom_pfn_to_ptr(dom, seg->pfn, seg->pages); -*pages_out = retval ? pages : 0; +*pages_out = retval ? 
seg->pages : 0; return retval; } diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c index fbe4464..a14d477 100644 --- a/tools/libxc/xc_dom_core.c +++ b/tools/libxc/xc_dom_core.c @@ -535,56 +535,75 @@ void *xc_dom_pfn_to_ptr_retcount(struct xc_dom_image *dom, xen_pfn_t pfn, return phys->ptr; } -int xc_dom_alloc_segment(struct xc_dom_image *dom, - struct xc_dom_seg *seg, char *name, - xen_vaddr_t start, xen_vaddr_t size) +static int xc_dom_chk_alloc_pages(struct xc_dom_image *dom, char *name, + xen_pfn_t pages) { unsigned int page_size = XC_DOM_PAGE_SIZE(dom); -xen_pfn_t pages = (size + page_size - 1) / page_size; -xen_pfn_t pfn; -void *ptr; -if ( start == 0 ) -start = dom->virt_alloc_end; +if ( pages > dom->total_pages || /* multiple test avoids overflow probs */ + dom->pfn_alloc_end - dom->rambase_pfn > dom->total_pages || + pages > dom->total_pages - dom->pfn_alloc_end + dom->rambase_pfn ) +{ +xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY, + "%s: segment %s too large (0x%"PRIpfn" > " + "0x%"PRIpfn" - 0x%"PRIpfn" pages)", __FUNCTION__, name, + pages, dom->total_pages, + dom->pfn_alloc_end - dom->rambase_pfn); +return -1; +} + +dom->pfn_alloc_end += pages; +dom->virt_alloc_end += pages * page_size; + +return 0; +} -if ( start & (page_size - 1) ) +static int xc_dom_alloc_pad(struct xc_dom_image *dom, xen_vaddr_t boundary) +{ +
Re: [Xen-devel] [PATCH] x86/PoD: tighten conditions for checking super page
>>> On 02.11.15 at 17:29, wrote:
> * steal_for_cache may now be wrong. I realize that since now ram == 0
> that all the subsequent "steal_for_cache" expressions will end up as
> "false" anyway, but leaving invariants in an invalid state is sort of
> asking for trouble.
>
> I'd prefer you just update steal_for_cache; but if not, at least leave a
> comment there saying that it may be wrong and why it doesn't matter.

I've just done the other things, but I don't think steal_for_cache
can have changed at this point: p2m_pod_cache_add() increments
p2m->pod.count by the same value by which
p2m_pod_zero_check_superpage() bumps p2m->pod.entry_count right
after having called p2m_pod_cache_add(). I could leave a comment
or ASSERT() to that effect, unless I'm overlooking something.

Jan
Re: [Xen-devel] [PATCH v11 1/5] xen: move xen_setup_runstate_info and get_runstate_snapshot to drivers/xen/time.c
Hi,

> +static u64 get64(const u64 *p)
> +{
> +	u64 ret;
> +
> +	if (BITS_PER_LONG < 64) {
> +		u32 *p32 = (u32 *)p;
> +		u32 h, l;
> +
> +		/*
> +		 * Read high then low, and then make sure high is
> +		 * still the same; this will only loop if low wraps
> +		 * and carries into high.
> +		 * XXX some clean way to make this endian-proof?
> +		 */
> +		do {
> +			h = p32[1];
> +			barrier();
> +			l = p32[0];
> +			barrier();
> +		} while (p32[1] != h);

I realise this is simply a move of existing code, but it may be better
to instead have:

	do {
		h = READ_ONCE(p32[1]);
		l = READ_ONCE(p32[0]);
	} while (READ_ONCE(p32[1]) != h);

Which ensures that each load is a single access (though it almost
certainly would be anyway), and prevents the compiler from having to
reload any other memory locations (which the current barrier() usage
forces).

> +
> +		ret = (((u64)h) << 32) | l;
> +	} else
> +		ret = *p;

Likewise, this would be better as READ_ONCE(*p), to force a single
access.

> +
> +	return ret;
> +}

> +	do {
> +		state_time = get64(&state->state_entry_time);
> +		barrier();
> +		*res = *state;
> +		barrier();

You can also have:

	*res = READ_ONCE(*state);

That will handle the barriers implicitly.

Thanks,
Mark.

> +	} while (get64(&state->state_entry_time) != state_time);
> +}
[Xen-devel] [PATCH 0/2] wallclock time on arm
Hi all,
this small series enables the wallclock time on arm. It consists
mostly of code movement from x86 to common.

Stefano Stabellini (2):
      xen: move wallclock functions from x86 to common
      arm: export platform_op XENPF_settime

 xen/arch/arm/Makefile             |  1 +
 xen/arch/arm/domain.c             |  3 ++
 xen/arch/arm/platform_hypercall.c | 62
 xen/arch/arm/time.c               |  5 --
 xen/arch/arm/traps.c              |  1 +
 xen/arch/x86/time.c               | 92 +---
 xen/common/time.c                 | 94 +
 xen/include/xsm/dummy.h           | 12 ++---
 xen/include/xsm/xsm.h             | 13 ++---
 9 files changed, 175 insertions(+), 108 deletions(-)
 create mode 100644 xen/arch/arm/platform_hypercall.c

Cheers,
Stefano
[Xen-devel] [PATCH 2/2] arm: export platform_op XENPF_settime
Call update_domain_wallclock_time at domain initialization. Signed-off-by: Stefano Stabellini Signed-off-by: Ian Campbell --- xen/arch/arm/Makefile |1 + xen/arch/arm/domain.c |3 ++ xen/arch/arm/platform_hypercall.c | 62 + xen/arch/arm/traps.c |1 + xen/include/xsm/dummy.h | 12 +++ xen/include/xsm/xsm.h | 13 6 files changed, 80 insertions(+), 12 deletions(-) create mode 100644 xen/arch/arm/platform_hypercall.c diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile index 1ef39f7..240aa29 100644 --- a/xen/arch/arm/Makefile +++ b/xen/arch/arm/Makefile @@ -23,6 +23,7 @@ obj-y += percpu.o obj-y += guestcopy.o obj-y += physdev.o obj-y += platform.o +obj-y += platform_hypercall.o obj-y += setup.o obj-y += bootfdt.o obj-y += time.o diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index b2bfc7d..ac9b1b3 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -742,6 +742,9 @@ int arch_set_info_guest( v->arch.ttbr1 = ctxt->ttbr1; v->arch.ttbcr = ctxt->ttbcr; +if ( v->vcpu_id == 0 ) +update_domain_wallclock_time(v->domain); + v->is_initialised = 1; if ( ctxt->flags & VGCF_online ) diff --git a/xen/arch/arm/platform_hypercall.c b/xen/arch/arm/platform_hypercall.c new file mode 100644 index 000..f60d7b3 --- /dev/null +++ b/xen/arch/arm/platform_hypercall.c @@ -0,0 +1,62 @@ +/** + * platform_hypercall.c + * + * Hardware platform operations. Intended for use by domain-0 kernel. 
+ * + * Copyright (c) 2015, Citrix + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +DEFINE_SPINLOCK(xenpf_lock); + +long do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op) +{ +long ret; +struct xen_platform_op curop, *op = &curop; + +if ( copy_from_guest(op, u_xenpf_op, 1) ) +return -EFAULT; + +if ( op->interface_version != XENPF_INTERFACE_VERSION ) +return -EACCES; + +ret = xsm_platform_op(XSM_PRIV, op->cmd); +if ( ret ) +return ret; + +spin_lock(&xenpf_lock); + +switch ( op->cmd ) +{ +case XENPF_settime32: +do_settime(op->u.settime32.secs, + op->u.settime32.nsecs, + op->u.settime32.system_time); +break; + +case XENPF_settime64: +if ( likely(!op->u.settime64.mbz) ) +do_settime(op->u.settime64.secs, + op->u.settime64.nsecs, + op->u.settime64.system_time); +else +ret = -EINVAL; +break; + +default: +ret = -ENOSYS; +break; +} + +spin_unlock(&xenpf_lock); +return ret; +} diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 9d2bd6a..c49bd3f 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1233,6 +1233,7 @@ static arm_hypercall_t arm_hypercall_table[] = { HYPERCALL(hvm_op, 2), HYPERCALL(grant_table_op, 3), HYPERCALL(multicall, 2), +HYPERCALL(platform_op, 1), HYPERCALL_ARM(vcpu_op, 3), }; diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h index 9fe372c..aec5a9b 100644 --- a/xen/include/xsm/dummy.h +++ b/xen/include/xsm/dummy.h @@ -583,6 +583,12 @@ static XSM_INLINE int xsm_mem_sharing(XSM_DEFAULT_ARG struct domain *d) return xsm_default_action(action, current->domain, d); } #endif + +static XSM_INLINE int xsm_platform_op(XSM_DEFAULT_ARG uint32_t op) +{ +XSM_ASSERT_ACTION(XSM_PRIV); +return xsm_default_action(action, current->domain, NULL); +} #ifdef CONFIG_X86 static XSM_INLINE int xsm_do_mca(XSM_DEFAULT_VOID) @@ -639,12 +645,6 @@ static XSM_INLINE int xsm_apic(XSM_DEFAULT_ARG struct domain *d, int cmd) return xsm_default_action(action, d, NULL); } -static XSM_INLINE 
int xsm_platform_op(XSM_DEFAULT_ARG uint32_t op) -{ -XSM_ASSERT_ACTION(XSM_PRIV); -return xsm_default_action(action, current->domain, NULL); -} - static XSM_INLINE int xsm_machine_memory_map(XSM_DEFAULT_VOID) { XSM_ASSERT_ACTION(XSM_PRIV); diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h index ba3caed..f48cf60 100644 --- a/xen/include/xsm/xsm.h +++ b/xen/include/xsm/xsm.h @@ -164,6 +164,8 @@ struct xsm_operations { int (*mem_sharing) (struct domain *d); #endif +int (*platform_op) (uint32_t cmd); + #ifdef CONFIG_X86 int (*do_mca) (void); int (*shadow_control) (struct domain *d, uint32_t op); @@ -175,7 +177,6 @@ struct xsm_operations { int (*mem_sharing_op) (struct domain *d, struct domain *cd, int op); int (*apic) (struct domain *d, int cmd); int (*memtype) (uint32_t access); -int (*platform_op) (uint32_t cmd); int (*machine_memory_map) (void); int (*domain_memory_map) (struct domain *d); #define XSM_MMU_UPDATE_READ 1 @@ -6
[Xen-devel] [PATCH 1/2] xen: move wallclock functions from x86 to common
Remove dummy arm implementation of wallclock_time. Use shared_info() in common code rather than x86-ism to access it. Signed-off-by: Stefano Stabellini Signed-off-by: Ian Campbell --- xen/arch/arm/time.c |5 --- xen/arch/x86/time.c | 92 + xen/common/time.c | 94 +++ 3 files changed, 95 insertions(+), 96 deletions(-) diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c index 5ded30c..6207615 100644 --- a/xen/arch/arm/time.c +++ b/xen/arch/arm/time.c @@ -280,11 +280,6 @@ void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds) /* XXX update guest visible wallclock time */ } -struct tm wallclock_time(uint64_t *ns) -{ -return (struct tm) { 0 }; -} - /* * Local variables: * mode: C diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c index bbb7e6c..764d7dc 100644 --- a/xen/arch/x86/time.c +++ b/xen/arch/x86/time.c @@ -47,9 +47,6 @@ string_param("clocksource", opt_clocksource); unsigned long __read_mostly cpu_khz; /* CPU clock frequency in kHz. */ DEFINE_SPINLOCK(rtc_lock); unsigned long pit0_ticks; -static unsigned long wc_sec; /* UTC time at last 'time update'. 
*/ -static unsigned int wc_nsec; -static DEFINE_SPINLOCK(wc_lock); struct cpu_time { u64 local_tsc_stamp; @@ -900,37 +897,6 @@ void force_update_vcpu_system_time(struct vcpu *v) __update_vcpu_system_time(v, 1); } -void update_domain_wallclock_time(struct domain *d) -{ -uint32_t *wc_version; -unsigned long sec; - -spin_lock(&wc_lock); - -wc_version = &shared_info(d, wc_version); -*wc_version = version_update_begin(*wc_version); -wmb(); - -sec = wc_sec + d->time_offset_seconds; -if ( likely(!has_32bit_shinfo(d)) ) -{ -d->shared_info->native.wc_sec= sec; -d->shared_info->native.wc_nsec = wc_nsec; -d->shared_info->native.wc_sec_hi = sec >> 32; -} -else -{ -d->shared_info->compat.wc_sec = sec; -d->shared_info->compat.wc_nsec= wc_nsec; -d->shared_info->compat.arch.wc_sec_hi = sec >> 32; -} - -wmb(); -*wc_version = version_update_end(*wc_version); - -spin_unlock(&wc_lock); -} - static void update_domain_rtc(void) { struct domain *d; @@ -988,27 +954,6 @@ int cpu_frequency_change(u64 freq) return 0; } -/* Set clock to after 00:00:00 UTC, 1 January, 1970. */ -void do_settime(unsigned long secs, unsigned int nsecs, u64 system_time_base) -{ -u64 x; -u32 y; -struct domain *d; - -x = SECONDS(secs) + nsecs - system_time_base; -y = do_div(x, 10); - -spin_lock(&wc_lock); -wc_sec = x; -wc_nsec = y; -spin_unlock(&wc_lock); - -rcu_read_lock(&domlist_read_lock); -for_each_domain ( d ) -update_domain_wallclock_time(d); -rcu_read_unlock(&domlist_read_lock); -} - /* Per-CPU communication between rendezvous IRQ and softirq handler. */ struct cpu_calibration { u64 local_tsc_stamp; @@ -1608,25 +1553,6 @@ void send_timer_event(struct vcpu *v) send_guest_vcpu_virq(v, VIRQ_TIMER); } -/* Return secs after 00:00:00 localtime, 1 January, 1970. */ -unsigned long get_localtime(struct domain *d) -{ -return wc_sec + (wc_nsec + NOW()) / 10ULL -+ d->time_offset_seconds; -} - -/* Return microsecs after 00:00:00 localtime, 1 January, 1970. 
*/ -uint64_t get_localtime_us(struct domain *d) -{ -return (SECONDS(wc_sec + d->time_offset_seconds) + wc_nsec + NOW()) - / 1000UL; -} - -unsigned long get_sec(void) -{ -return wc_sec + (wc_nsec + NOW()) / 10ULL; -} - /* "cmos_utc_offset" is the difference between UTC time and CMOS time. */ static long cmos_utc_offset; /* in seconds */ @@ -1635,7 +1561,7 @@ int time_suspend(void) if ( smp_processor_id() == 0 ) { cmos_utc_offset = -get_cmos_time(); -cmos_utc_offset += (wc_sec + (wc_nsec + NOW()) / 10ULL); +cmos_utc_offset += get_sec(); kill_timer(&calibration_timer); /* Sync platform timer stamps. */ @@ -1715,22 +1641,6 @@ int hwdom_pit_access(struct ioreq *ioreq) return 0; } -struct tm wallclock_time(uint64_t *ns) -{ -uint64_t seconds, nsec; - -if ( !wc_sec ) -return (struct tm) { 0 }; - -seconds = NOW() + SECONDS(wc_sec) + wc_nsec; -nsec = do_div(seconds, 10); - -if ( ns ) -*ns = nsec; - -return gmtime(seconds); -} - /* * PV SoftTSC Emulation. */ diff --git a/xen/common/time.c b/xen/common/time.c index 29fdf52..306c5dc 100644 --- a/xen/common/time.c +++ b/xen/common/time.c @@ -16,7 +16,13 @@ */ #include +#include +#include +#include #include +#include +#include + /* Nonzero if YEAR is a leap year (every 4 years, except every 100th isn't, and every 400th is). */ @@ -34,6 +40,10 @@ const unsigned short int __mon_lengths[2][12] = { #define SECS_PER_HOUR (60 * 60) #define
Re: [Xen-devel] [PATCH v11 5/5] xen/arm: account for stolen ticks
> static void xen_percpu_init(void)
> {
> struct vcpu_register_vcpu_info info;
> @@ -104,6 +120,8 @@ static void xen_percpu_init(void)
> BUG_ON(err);
> per_cpu(xen_vcpu, cpu) = vcpup;
>
> + xen_setup_runstate_info(cpu);

Does the runstate memory area get unregistered when a kernel tears
things down, or is kexec somehow inhibited for Xen guests?

I couldn't spot either happening, but I may have missed it.

Mark.

> +
> after_register_vcpu_info:
> enable_percpu_irq(xen_events_irq, 0);
> put_cpu();
> @@ -271,6 +289,9 @@ static int __init xen_guest_init(void)
>
> register_cpu_notifier(&xen_cpu_notifier);
>
> + pv_time_ops.steal_clock = xen_stolen_accounting;
> + static_key_slow_inc(&paravirt_steal_enabled);
> +
> return 0;
> }
> early_initcall(xen_guest_init);
> --
> 1.7.10.4
>
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 6
This document explains the design details of booting Xen with ACPI on ARM.
Any comments are welcome.

Changes v5->v6:
* add a new node "uefi" under /hypervisor to pass UEFI information to Dom0
  instead of the nodes under /chosen.
* change creation of the MADT table: get the information from the
  domain->arch.vgic struct
* reuse the grant table region, which will not be used by Dom0 when booting
  through ACPI, to store the newly created ACPI tables.

Changes v4->v5:
* change the description of section 4 to make it more generic
* place EFI and ACPI tables in non-RAM space of Dom0

Changes v3->v4:
* add explanation for minimal DT and the properties
* drop "linux," prefix of the properties
* add explanation for the event channel flag
* create RSDP table since the "xsdt_physical_address" is changed
* since it uses hypervisor_id, introduced by ACPI 6.0, to tell Dom0 the
  hypervisor ID, the minimum supported ACPI version for Xen on ARM has to
  be limited to 6.0.

Changes v2->v3:
* remove the two HVM_PARAMs for grant table and let the Linux kernel use
  xlated_setup_gnttab_pages() to set up the grant table.
* don't modify GTDT table
* add definition of event-channel interrupt flag
* state that all interrupts unused by Xen are routed to Dom0
* state that the existing PCI bus_notifier is reused for PCI devices MMIO
  mapping

Booting Xen itself with ACPI is similar to booting the Linux kernel, except
that Xen doesn't parse the DSDT table. So I'll skip this part and focus on
how Xen prepares ACPI tables for Dom0 and how Xen passes them to Dom0.

1. Create minimal DT to pass required information to Dom0
----------------------------------------------------------
When booting in UEFI mode on ARM64, Xen needs to pass some UEFI information
to Dom0. The necessary information is the address of the EFI System table
and of the EFI Memory Descriptor table, the size of the EFI Memory
Descriptor table, the size of an EFI Memory Descriptor and the version of
the EFI Memory Descriptor. This information is passed through the "uefi"
node under the hypervisor node of this minimal DT.
Dom0 should parse this DT to get the Xen UEFI information, in the same way
the Linux kernel gets normal UEFI information. Also, it should check whether
the DT contains only the /hypervisor and /chosen nodes to know whether it
boots with DT or ACPI. In addition, Dom0 should parse the DT to know whether
it runs on the Xen hypervisor; if so, it should execute a Xen UEFI specific
routine to initialize UEFI.

An example of the minimal DT:

/ {
    #address-cells = <2>;
    #size-cells = <2>;
    hypervisor {
        compatible = "xen,xen-4.3", "xen,xen";
        reg = <0 0xb000 0 0x2>; /* Only needed when booting without ACPI */
        interrupts = <1 15 0xf08>; /* Only needed when booting without ACPI */
        uefi {
            xen,uefi-system-table = <0x>;
            xen,uefi-mmap-start = <0x>;
            xen,uefi-mmap-size = <0x>;
            xen,uefi-mmap-desc-size = <0x>;
            xen,uefi-mmap-desc-ver = <0x>;
        };
    };

    chosen {
        bootargs = "kernel=Image console=hvc0 earlycon=pl011,0x1c09 root=/dev/vda2 rw rootfstype=ext4 init=/bin/sh acpi=force";
        linux,initrd-start = <0x>;
        linux,initrd-end = <0x>;
    };
};

For details, look at (this will be updated by a Linux kernel patch):
https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/arm/xen.txt

2. Copy and change some EFI and ACPI tables
-------------------------------------------
a) Create EFI_SYSTEM_TABLE table
Create a new EFI System table. Copy the table header from the host's
original EFI System table. Change the values of the HeaderSize, CRC32 and
Revision fields in this EFI System table header. Assign new values to the
FirmwareVendor and FirmwareRevision fields of the EFI System table. Create
one ConfigurationTable and set its VendorGuid field to ACPI_20_TABLE_GUID
and its VendorTable field to the address of the ACPI RSDP table.

This EFI System Table will be passed to Dom0 through the property
"uefi-system-table" in the above minimal DT, so Dom0 can get the ACPI root
table address through the ConfigurationTable.

b) Create EFI_MEMORY_DESCRIPTOR table
Xen needs to tell Dom0 where the RAM regions are. Add the memory start and
size information of Dom0 to this table.
It's passed to Dom0 through the properties "uefi-mmap-start",
"uefi-mmap-size", "uefi-mmap-desc-size" and "uefi-mmap-desc-ver" of the
minimal DT. Dom0 will then get the memory information through this EFI
table.

c) Create FADT table
First copy the contents of the host FADT table to the newly created FADT
table. Then change the value of arm_boot_flags to enable PSCI and HVC.

d) Create MADT table
The MADT table needs to be changed to restrict the number of vCPUs. First
copy the contents of the host MADT table, except the interrupt controller
structures, to the newly created MADT table. For GICv2, it needs to add
dom0_max_vcpus number of GICC entries and one GICD entry. For GICv3, it needs to add one GICD
Re: [Xen-devel] [PATCH 2/2] arm: export platform_op XENPF_settime
On 05/11/15 16:57, Stefano Stabellini wrote:
> +case XENPF_settime32:
> +do_settime(op->u.settime32.secs,
> + op->u.settime32.nsecs,
> + op->u.settime32.system_time);
> +break;

I don't think you want to provide this hypercall -- only provide the
XENPF_settime64 one.

David
Re: [Xen-devel] [PATCH RFC] vmalloc/vzalloc: Add memflags parameter.
>>> On 02.11.15 at 18:12, wrote:
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1223,7 +1223,7 @@ long do_vcpu_op(int cmd, unsigned int vcpuid,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> if ( v->vcpu_info == &dummy_vcpu_info )
> return -EINVAL;
>
> -if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
> +if ( (ctxt = alloc_vcpu_guest_context(MEMF_node(domain_to_node(d)))) == NULL )

This one's a temporary allocation that gets freed a few lines down.
Hence best performance would be achieved by using the current CPU's
node, which iiuc will result if you pass just zero here.

> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -492,7 +492,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t)
> u_domctl)
> < sizeof(struct compat_vcpu_guest_context));
> #endif
> ret = -ENOMEM;
> -if ( (c.nat = alloc_vcpu_guest_context()) == NULL )
> +if ( (c.nat =
> alloc_vcpu_guest_context(MEMF_node(domain_to_node(d)))) == NULL )

Same here.

> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -577,9 +577,9 @@ void domain_cpuid(struct domain *d,
>
> #define domain_max_vcpus(d) (is_hvm_domain(d) ? HVM_MAX_VCPUS :
> MAX_VIRT_CPUS)
>
> -static inline struct vcpu_guest_context *alloc_vcpu_guest_context(void)
> +static inline struct vcpu_guest_context *alloc_vcpu_guest_context(unsigned
> int memflags)
> {
> -return vmalloc(sizeof(struct vcpu_guest_context));
> +return vmalloc(sizeof(struct vcpu_guest_context), memflags);

With the above you won't need to add a parameter to the function
anymore, but if for some reason you did you'd need to mirror this
to ARM code.

Jan
[Xen-devel] [PATCH 0/3] Xen wallclock on arm and arm64
Hi all,

this series introduces PV wallclock time support on arm and arm64.

Stefano Stabellini (3):
  xen/arm: introduce xen_read_wallclock
  xen/arm: introduce HYPERVISOR_dom0_op on arm and arm64
  xen/arm: set the system time in Xen via the XENPF_settime hypercall

 arch/arm/Kconfig                     |  1 +
 arch/arm/include/asm/xen/hypercall.h |  2 +
 arch/arm/xen/enlighten.c             | 82 ++
 arch/arm/xen/hypercall.S             |  1 +
 arch/arm64/xen/hypercall.S           |  1 +
 5 files changed, 87 insertions(+)

Cheers,

Stefano
[Xen-devel] [PATCH 3/3] xen/arm: set the system time in Xen via the XENPF_settime hypercall
If Linux is running as dom0, call XENPF_settime to update the system time in Xen on pvclock_gtod notifications. Signed-off-by: Stefano Stabellini Signed-off-by: Ian Campbell --- arch/arm/xen/enlighten.c | 52 +- 1 file changed, 51 insertions(+), 1 deletion(-) diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c index b6aea9c..0176db0 100644 --- a/arch/arm/xen/enlighten.c +++ b/arch/arm/xen/enlighten.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include @@ -123,6 +124,50 @@ static void xen_read_wallclock(struct timespec *ts) set_normalized_timespec(ts, now.tv_sec, now.tv_nsec); } +static int xen_pvclock_gtod_notify(struct notifier_block *nb, + unsigned long was_set, void *priv) +{ + /* Protected by the calling core code serialization */ + static struct timespec next_sync; + + struct xen_platform_op op; + struct timespec now; + + now = __current_kernel_time(); + + /* +* We only take the expensive HV call when the clock was set +* or when the 11 minutes RTC synchronization time elapsed. +*/ + if (!was_set && timespec_compare(&now, &next_sync) < 0) + return NOTIFY_OK; + + op.interface_version = XENPF_INTERFACE_VERSION; + op.cmd = XENPF_settime; + op.u.settime.secs = now.tv_sec; + op.u.settime.nsecs = now.tv_nsec; + op.u.settime.system_time = arch_timer_read_counter(); + printk("GTOD: Setting to %ld.%ld at %lld\n", + (long)op.u.settime.secs, + (long)op.u.settime.nsecs, + (long long)op.u.settime.system_time); + (void)HYPERVISOR_dom0_op(&op); + + /* +* Move the next drift compensation time 11 minutes +* ahead. That's emulating the sync_cmos_clock() update for +* the hardware RTC. 
+*/ + next_sync = now; + next_sync.tv_sec += 11 * 60; + + return NOTIFY_OK; +} + +static struct notifier_block xen_pvclock_gtod_notifier = { + .notifier_call = xen_pvclock_gtod_notify, +}; + static void xen_percpu_init(void) { struct vcpu_register_vcpu_info info; @@ -321,7 +366,12 @@ static int __init xen_guest_init(void) pv_time_ops.steal_clock = xen_stolen_accounting; static_key_slow_inc(&paravirt_steal_enabled); xen_read_wallclock(&ts); - do_settimeofday(&ts); + if (xen_initial_domain()) + pvclock_gtod_register_notifier(&xen_pvclock_gtod_notifier); + else { + xen_read_wallclock(&ts); + do_settimeofday(&ts); + } return 0; } -- 1.7.10.4
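The throttling in xen_pvclock_gtod_notify() can be modelled outside the kernel. The sketch below is a stand-alone approximation, not the patch itself: struct sync_state, ts_compare() and gtod_notify() are hypothetical names, the hypercall is replaced by a counter, and next_sync lives in a struct rather than in a function-local static. It only shows the decision "sync when the clock was set, or when 11 minutes have passed since the last sync".

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define SYNC_INTERVAL_SECS (11 * 60)

/* Hypothetical model state; the patch keeps next_sync as a static local. */
struct sync_state {
    struct timespec next_sync;
    unsigned int hypercalls;    /* counts simulated XENPF_settime calls */
};

/* Returns <0, 0 or >0, in the style of the kernel's timespec_compare(). */
static int ts_compare(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec ? -1 : 1;
    if (a->tv_nsec != b->tv_nsec)
        return a->tv_nsec < b->tv_nsec ? -1 : 1;
    return 0;
}

/* Returns true when the (simulated) hypercall was issued. */
static bool gtod_notify(struct sync_state *st, struct timespec now, bool was_set)
{
    assert(now.tv_nsec >= 0 && now.tv_nsec < 1000000000L);

    /* Skip the expensive call unless the clock was set or the interval elapsed. */
    if (!was_set && ts_compare(&now, &st->next_sync) < 0)
        return false;

    st->hypercalls++;           /* stands in for the hypercall */

    /* Move the next sync point 11 minutes ahead of the current time. */
    st->next_sync = now;
    st->next_sync.tv_sec += SYNC_INTERVAL_SECS;
    return true;
}
```

Because next_sync starts at zero, the first notification always goes through, which matches the behaviour of a zero-initialised static in the patch.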
[Xen-devel] [PATCH 2/3] xen/arm: introduce HYPERVISOR_dom0_op on arm and arm64
Signed-off-by: Stefano Stabellini --- arch/arm/include/asm/xen/hypercall.h |2 ++ arch/arm/xen/enlighten.c |1 + arch/arm/xen/hypercall.S |1 + arch/arm64/xen/hypercall.S |1 + 4 files changed, 5 insertions(+) diff --git a/arch/arm/include/asm/xen/hypercall.h b/arch/arm/include/asm/xen/hypercall.h index 712b50e..7a8ee15 100644 --- a/arch/arm/include/asm/xen/hypercall.h +++ b/arch/arm/include/asm/xen/hypercall.h @@ -35,6 +35,7 @@ #include #include +#include long privcmd_call(unsigned call, unsigned long a1, unsigned long a2, unsigned long a3, @@ -49,6 +50,7 @@ int HYPERVISOR_memory_op(unsigned int cmd, void *arg); int HYPERVISOR_physdev_op(int cmd, void *arg); int HYPERVISOR_vcpu_op(int cmd, int vcpuid, void *extra_args); int HYPERVISOR_tmem_op(void *arg); +int HYPERVISOR_dom0_op(void *arg); int HYPERVISOR_multicall(struct multicall_entry *calls, uint32_t nr); static inline int diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c index f07383d..b6aea9c 100644 --- a/arch/arm/xen/enlighten.c +++ b/arch/arm/xen/enlighten.c @@ -359,5 +359,6 @@ EXPORT_SYMBOL_GPL(HYPERVISOR_memory_op); EXPORT_SYMBOL_GPL(HYPERVISOR_physdev_op); EXPORT_SYMBOL_GPL(HYPERVISOR_vcpu_op); EXPORT_SYMBOL_GPL(HYPERVISOR_tmem_op); +EXPORT_SYMBOL_GPL(HYPERVISOR_dom0_op); EXPORT_SYMBOL_GPL(HYPERVISOR_multicall); EXPORT_SYMBOL_GPL(privcmd_call); diff --git a/arch/arm/xen/hypercall.S b/arch/arm/xen/hypercall.S index 10fd99c..89db58f 100644 --- a/arch/arm/xen/hypercall.S +++ b/arch/arm/xen/hypercall.S @@ -89,6 +89,7 @@ HYPERCALL2(memory_op); HYPERCALL2(physdev_op); HYPERCALL3(vcpu_op); HYPERCALL1(tmem_op); +HYPERCALL1(dom0_op); HYPERCALL2(multicall); ENTRY(privcmd_call) diff --git a/arch/arm64/xen/hypercall.S b/arch/arm64/xen/hypercall.S index 8bbe940..3840b1a 100644 --- a/arch/arm64/xen/hypercall.S +++ b/arch/arm64/xen/hypercall.S @@ -80,6 +80,7 @@ HYPERCALL2(memory_op); HYPERCALL2(physdev_op); HYPERCALL3(vcpu_op); HYPERCALL1(tmem_op); +HYPERCALL1(dom0_op); HYPERCALL2(multicall); 
ENTRY(privcmd_call) -- 1.7.10.4
[Xen-devel] [PATCH 1/3] xen/arm: introduce xen_read_wallclock
Read the wallclock from the shared info page at boot time. Signed-off-by: Stefano Stabellini --- arch/arm/Kconfig |1 + arch/arm/xen/enlighten.c | 31 +++ 2 files changed, 32 insertions(+) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 60be104..a9de420 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1852,6 +1852,7 @@ config XEN depends on CPU_V7 && !CPU_V6 depends on !GENERIC_ATOMIC64 depends on MMU + depends on HAVE_ARM_ARCH_TIMER select ARCH_DMA_ADDR_T_64BIT select ARM_PSCI select SWIOTLB_XEN diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c index 15621b1..f07383d 100644 --- a/arch/arm/xen/enlighten.c +++ b/arch/arm/xen/enlighten.c @@ -28,6 +28,8 @@ #include #include #include +#include +#include #include @@ -95,6 +97,32 @@ static unsigned long long xen_stolen_accounting(int cpu) return state.time[RUNSTATE_runnable] + state.time[RUNSTATE_offline]; } +static void xen_read_wallclock(struct timespec *ts) +{ + u32 version; + u64 delta; + struct timespec now; + struct shared_info *s = HYPERVISOR_shared_info; + struct pvclock_wall_clock *wall_clock = &(s->wc); + + /* get wallclock at system boot */ + do { + version = wall_clock->version; + rmb(); /* fetch version before time */ + now.tv_sec = wall_clock->sec; + now.tv_nsec = wall_clock->nsec; + rmb(); /* fetch time before checking version */ + } while ((wall_clock->version & 1) || (version != wall_clock->version)); + + delta = arch_timer_read_counter(); /* time since system boot */ + delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec; + + now.tv_nsec = do_div(delta, NSEC_PER_SEC); + now.tv_sec = delta; + + set_normalized_timespec(ts, now.tv_sec, now.tv_nsec); +} + static void xen_percpu_init(void) { struct vcpu_register_vcpu_info info; @@ -218,6 +246,7 @@ static int __init xen_guest_init(void) struct shared_info *shared_info_page = NULL; struct resource res; phys_addr_t grant_frames; + struct timespec ts; if (!xen_domain()) return 0; @@ -291,6 +320,8 @@ static int __init 
xen_guest_init(void) pv_time_ops.steal_clock = xen_stolen_accounting; static_key_slow_inc(&paravirt_steal_enabled); + xen_read_wallclock(&ts); + do_settimeofday(&ts); return 0; } -- 1.7.10.4
Re: [Xen-devel] [PATCH RFC] domain: Compile with lock_profile=y enabled.
>>> On 02.11.15 at 18:12, wrote:
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -237,6 +237,7 @@ struct domain *alloc_domain_struct(void)
> #ifdef CONFIG_BIGMEM
> const unsigned int bits = 0;
> #else
> +int order = get_order_from_bytes(sizeof(*d));

unsigned int

> @@ -247,10 +248,12 @@ struct domain *alloc_domain_struct(void)
> bits = _domain_struct_bits();
> #endif
>
> -BUILD_BUG_ON(sizeof(*d) > PAGE_SIZE);

Not unconditionally (i.e. at least non-debug builds should continue
to have this).

> -d = alloc_xenheap_pages(0, MEMF_bits(bits));
> +d = alloc_xenheap_pages(order, MEMF_bits(bits));
> if ( d != NULL )
> -clear_page(d);
> +{
> +for ( ; order >= 0; order-- )
> +clear_page((void *)d + PAGE_SIZE*order);

This loop works for orders 0 and 1, but not anything else (not
clearing all of the pages).

Jan
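The remark about the clearing loop can be checked with a small stand-alone model. An order-N allocation spans 1 << N pages, whereas the quoted loop visits only N + 1 of them. The sketch below assumes a 4 KiB page size and uses memset() on ordinary memory in place of clear_page() on xenheap pages; all helper names are hypothetical and this is not proposed as the fix for the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096u   /* assumed page size for this model */

/* Stand-in for clear_page(): zero exactly one page. */
static void model_clear_page(void *p)
{
    memset(p, 0, PAGE_SIZE);
}

/* The loop as quoted: touches page indices order, order-1, ..., 0. */
static void clear_as_posted(unsigned char *d, int order)
{
    for ( ; order >= 0; order-- )
        model_clear_page(d + (size_t)PAGE_SIZE * order);
}

/* Clears every page of the block: indices 0 .. (1 << order) - 1. */
static void clear_all_pages(unsigned char *d, unsigned int order)
{
    unsigned int i;

    assert(order < 16);   /* keep the shift and the model's memory use small */

    for ( i = 0; i < (1u << order); i++ )
        model_clear_page(d + (size_t)PAGE_SIZE * i);
}

/* Count the pages of the block that still contain a non-zero byte. */
static unsigned int dirty_pages(const unsigned char *d, unsigned int order)
{
    unsigned int i, n = 0;
    size_t j;

    for ( i = 0; i < (1u << order); i++ )
        for ( j = 0; j < PAGE_SIZE; j++ )
            if ( d[(size_t)PAGE_SIZE * i + j] )
            {
                n++;
                break;   /* one hit is enough for this page */
            }

    return n;
}
```

For order 2 the block has four pages; the quoted loop clears indices 2, 1 and 0 and leaves index 3 untouched, while for orders 0 and 1 the two loops happen to cover the same pages.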
Re: [Xen-devel] [MirageOS-devel] Hackathon 2016 Location Preferences
On 5 November 2015 at 16:24, Wei Liu wrote:
>> We do have two options for a Hackathon: China (either Shanghai,
>> Hangzhou or Beijing - details TBC) and Cambridge, UK. We are still in
>> the early planning phase and the budget for the Hackathon has not yet
>> been approved.
>
> I lived in Hangzhou for a while -- it is a nice city in my humble
> opinion. :-)

I have visited Hangzhou and it is certainly a nice city! But Cambridge
would get my vote for convenience I'm afraid :)

--
Richard Mortier
m...@cantab.net
Re: [Xen-devel] [PATCH 1/2] xen: move wallclock functions from x86 to common
>>> On 05.11.15 at 17:57, wrote:
> --- a/xen/common/time.c
> +++ b/xen/common/time.c
> @@ -16,7 +16,13 @@
> */
>
> #include
> +#include
> +#include
> +#include
> #include
> +#include
> +#include
> +
>
> /* Nonzero if YEAR is a leap year (every 4 years,

Stray blank line being added. Also please take the opportunity to
remove xen/config.h here.

> @@ -85,3 +95,87 @@ struct tm gmtime(unsigned long t)
>
> return tbuf;
> }
> +
> +/* Explicitly OR with 1 just in case version number gets out of sync. */
> +#define version_update_begin(v) (((v)+1)|1)
> +#define version_update_end(v) ((v)+1)

This should be moved to a header instead of getting defined a
second time here. Also please add spaces to match our coding
style.

> +struct tm wallclock_time(uint64_t *ns)
> +{
> +uint64_t seconds, nsec;
> +
> +if ( !wc_sec )
> +return (struct tm) { 0 };
> +
> +seconds = NOW() + SECONDS(wc_sec) + wc_nsec;
> +nsec = do_div(seconds, 10);
> +
> +if ( ns )
> +*ns = nsec;
> +
> +return gmtime(seconds);
> +}
> +
> +

Stray blank lines again.

Jan
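For readers unfamiliar with the version_update_begin()/version_update_end() pair, the following single-threaded model shows how the two macros bracket an update and how a reader in the style of xen_read_wallclock() detects an update in progress. The macro bodies are the ones quoted above, with spaces added; struct wc_model, wc_write() and wc_try_read() are hypothetical names, and the memory barriers of the real code appear only as comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Explicitly OR with 1 just in case version number gets out of sync. */
#define version_update_begin(v) (((v) + 1) | 1)
#define version_update_end(v)   ((v) + 1)

/* Hypothetical stand-in for the wallclock fields of the shared info page. */
struct wc_model {
    uint32_t version;
    uint32_t sec;
    uint32_t nsec;
};

/* Writer: the version is odd for as long as the fields are being changed. */
static void wc_write(struct wc_model *wc, uint32_t sec, uint32_t nsec)
{
    wc->version = version_update_begin(wc->version);
    assert(wc->version & 1);
    /* real code: wmb() so the odd version is visible before the data */
    wc->sec = sec;
    wc->nsec = nsec;
    /* real code: wmb() so the data is visible before the even version */
    wc->version = version_update_end(wc->version);
}

/* Reader: one attempt; returns false when the caller has to retry. */
static bool wc_try_read(const struct wc_model *wc, uint32_t *sec, uint32_t *nsec)
{
    uint32_t version = wc->version;

    /* real code: rmb(), fetch version before time */
    *sec = wc->sec;
    *nsec = wc->nsec;
    /* real code: rmb(), fetch time before re-checking version */

    /* Odd means an update is in flight; a change means one completed. */
    return !(version & 1) && version == wc->version;
}
```

The "| 1" only matters when the version is already odd on entry: from 3 the begin step yields 5 rather than 4, so the counter stays odd for the duration of the update.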
Re: [Xen-devel] [PATCH 1/2] xen: move wallclock functions from x86 to common
Hi, You forgot to CC the x86 maintainers. Regards, On 05/11/15 16:57, Stefano Stabellini wrote: > Remove dummy arm implementation of wallclock_time. > Use shared_info() in common code rather than x86-ism to access it. > > Signed-off-by: Stefano Stabellini > Signed-off-by: Ian Campbell > --- > xen/arch/arm/time.c |5 --- > xen/arch/x86/time.c | 92 + > xen/common/time.c | 94 > +++ > 3 files changed, 95 insertions(+), 96 deletions(-) > > diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c > index 5ded30c..6207615 100644 > --- a/xen/arch/arm/time.c > +++ b/xen/arch/arm/time.c > @@ -280,11 +280,6 @@ void domain_set_time_offset(struct domain *d, int64_t > time_offset_seconds) > /* XXX update guest visible wallclock time */ > } > > -struct tm wallclock_time(uint64_t *ns) > -{ > -return (struct tm) { 0 }; > -} > - > /* > * Local variables: > * mode: C > diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c > index bbb7e6c..764d7dc 100644 > --- a/xen/arch/x86/time.c > +++ b/xen/arch/x86/time.c > @@ -47,9 +47,6 @@ string_param("clocksource", opt_clocksource); > unsigned long __read_mostly cpu_khz; /* CPU clock frequency in kHz. */ > DEFINE_SPINLOCK(rtc_lock); > unsigned long pit0_ticks; > -static unsigned long wc_sec; /* UTC time at last 'time update'. 
*/ > -static unsigned int wc_nsec; > -static DEFINE_SPINLOCK(wc_lock); > > struct cpu_time { > u64 local_tsc_stamp; > @@ -900,37 +897,6 @@ void force_update_vcpu_system_time(struct vcpu *v) > __update_vcpu_system_time(v, 1); > } > > -void update_domain_wallclock_time(struct domain *d) > -{ > -uint32_t *wc_version; > -unsigned long sec; > - > -spin_lock(&wc_lock); > - > -wc_version = &shared_info(d, wc_version); > -*wc_version = version_update_begin(*wc_version); > -wmb(); > - > -sec = wc_sec + d->time_offset_seconds; > -if ( likely(!has_32bit_shinfo(d)) ) > -{ > -d->shared_info->native.wc_sec= sec; > -d->shared_info->native.wc_nsec = wc_nsec; > -d->shared_info->native.wc_sec_hi = sec >> 32; > -} > -else > -{ > -d->shared_info->compat.wc_sec = sec; > -d->shared_info->compat.wc_nsec= wc_nsec; > -d->shared_info->compat.arch.wc_sec_hi = sec >> 32; > -} > - > -wmb(); > -*wc_version = version_update_end(*wc_version); > - > -spin_unlock(&wc_lock); > -} > - > static void update_domain_rtc(void) > { > struct domain *d; > @@ -988,27 +954,6 @@ int cpu_frequency_change(u64 freq) > return 0; > } > > -/* Set clock to after 00:00:00 UTC, 1 January, 1970. */ > -void do_settime(unsigned long secs, unsigned int nsecs, u64 system_time_base) > -{ > -u64 x; > -u32 y; > -struct domain *d; > - > -x = SECONDS(secs) + nsecs - system_time_base; > -y = do_div(x, 10); > - > -spin_lock(&wc_lock); > -wc_sec = x; > -wc_nsec = y; > -spin_unlock(&wc_lock); > - > -rcu_read_lock(&domlist_read_lock); > -for_each_domain ( d ) > -update_domain_wallclock_time(d); > -rcu_read_unlock(&domlist_read_lock); > -} > - > /* Per-CPU communication between rendezvous IRQ and softirq handler. */ > struct cpu_calibration { > u64 local_tsc_stamp; > @@ -1608,25 +1553,6 @@ void send_timer_event(struct vcpu *v) > send_guest_vcpu_virq(v, VIRQ_TIMER); > } > > -/* Return secs after 00:00:00 localtime, 1 January, 1970. 
*/ > -unsigned long get_localtime(struct domain *d) > -{ > -return wc_sec + (wc_nsec + NOW()) / 10ULL > -+ d->time_offset_seconds; > -} > - > -/* Return microsecs after 00:00:00 localtime, 1 January, 1970. */ > -uint64_t get_localtime_us(struct domain *d) > -{ > -return (SECONDS(wc_sec + d->time_offset_seconds) + wc_nsec + NOW()) > - / 1000UL; > -} > - > -unsigned long get_sec(void) > -{ > -return wc_sec + (wc_nsec + NOW()) / 10ULL; > -} > - > /* "cmos_utc_offset" is the difference between UTC time and CMOS time. */ > static long cmos_utc_offset; /* in seconds */ > > @@ -1635,7 +1561,7 @@ int time_suspend(void) > if ( smp_processor_id() == 0 ) > { > cmos_utc_offset = -get_cmos_time(); > -cmos_utc_offset += (wc_sec + (wc_nsec + NOW()) / 10ULL); > +cmos_utc_offset += get_sec(); > kill_timer(&calibration_timer); > > /* Sync platform timer stamps. */ > @@ -1715,22 +1641,6 @@ int hwdom_pit_access(struct ioreq *ioreq) > return 0; > } > > -struct tm wallclock_time(uint64_t *ns) > -{ > -uint64_t seconds, nsec; > - > -if ( !wc_sec ) > -return (struct tm) { 0 }; > - > -seconds = NOW() + SECONDS(wc_sec) + wc_nsec; > -nsec = do_div(seconds, 10); > - > -if ( ns ) > -*ns = nsec; > - > -return gmtime(seconds); > -} > - > /* > * PV SoftTSC Emulation. > */ > diff --git a/xen/
Re: [Xen-devel] [PATCH 1/2] rwlock: add per-cpu reader-writer locks
Hi Malcolm, I tried your patches against staging yesterday and as soon as I started a guest, it panic. I have lock_profile enabled and applied your patches against: 6f04de658574833688c3f9eab310e7834d56a9c0 x86: cleanup of early cpuid handling (XEN) HVM1 save: CPU (XEN) HVM1 save: PIC (XEN) HVM1 save: IOAPIC (XEN) HVM1 save: LAPIC (XEN) HVM1 save: LAPIC_REGS (XEN) HVM1 save: PCI_IRQ (XEN) HVM1 save: ISA_IRQ (XEN) HVM1 save: PCI_LINK (XEN) HVM1 save: PIT (XEN) HVM1 save: RTC (XEN) HVM1 save: HPET (XEN) HVM1 save: PMTIMER (XEN) HVM1 save: MTRR (XEN) HVM1 save: VIRIDIAN_DOMAIN (XEN) HVM1 save: CPU_XSAVE (XEN) HVM1 save: VIRIDIAN_VCPU (XEN) HVM1 save: VMCE_VCPU (XEN) HVM1 save: TSC_ADJUST (XEN) HVM1 restore: CPU 0 [ 394.163143] loop: module loaded (XEN) Assertion 'rw_is_locked(&t->lock)' failed at grant_table.c:215 (XEN) [ Xen-4.7-unstable x86_64 debug=y Tainted:C ] (XEN) CPU:0 (XEN) RIP:e008:[] do_grant_table_op+0x63f/0x2e04 (XEN) RFLAGS: 00010246 CONTEXT: hypervisor (d0v0) (XEN) rax: rbx: 83400f9dc9e0 rcx: (XEN) rdx: 0001 rsi: 82d080342b10 rdi: 83400819b784 (XEN) rbp: 8300774ffef8 rsp: 8300774ffdf8 r8: 0002 (XEN) r9: 0002 r10: 0002 r11: (XEN) r12: r13: r14: 83400819b780 (XEN) r15: 83400f9d cr0: 80050033 cr4: 001526e0 (XEN) cr3: 01007f613000 cr2: 8800746182b8 (XEN) ds: es: fs: gs: ss: e010 cs: e008 (XEN) Xen stack trace from rsp=8300774ffdf8: (XEN)8300774ffe08 82d0 8300774ffef8 82d08017fc9b (XEN)82d080342b28 83400f9d8600 82d080342b10 (XEN)83400f9dca20 8321 834008188000 0001 (XEN)0001772ee000 8801e98d03e0 8300774ffe88 (XEN) 8300774fff18 0021d0269c10 0001001a (XEN)0001 0246 7ff7de45a407 (XEN)0100 7ff7de45a407 0033 8300772ee000 (XEN)8801eb0e3c00 880004bf57e8 8801e98d03e0 8801eb0a5938 (XEN)7cff88b000c7 82d08023d952 8100128a 0014 (XEN) 0001 8801f6e18388 81d3d740 (XEN)8801efb7bd40 88000542e780 0282 (XEN)8801e98d03a0 8801efe07000 0014 8100128a (XEN)0001 8801e98d03e0 00010100 (XEN)8100128a e033 0282 8801efb7bce0 (XEN)e02b (XEN) 8300772ee000 (XEN) (XEN) Xen call trace: (XEN)[] 
do_grant_table_op+0x63f/0x2e04
(XEN)[] lstar_enter+0xe2/0x13c
(XEN)
(XEN)
(XEN)
(XEN) Panic on CPU 0:
(XEN) Assertion 'rw_is_locked(&t->lock)' failed at grant_table.c:215
(XEN)
(XEN)
(XEN) Manual reset required ('noreboot' specified)

Thanks for your help.

On 11/03/2015 12:58 PM, Malcolm Crossley wrote:

Per-cpu read-write locks allow for the fast path read case to have low
overhead by only setting/clearing a per-cpu variable for using the read
lock. The per-cpu read fast path also avoids locked compare swap operations
which can be particularly slow on coherent multi-socket systems,
particularly if there is heavy usage of the read lock itself.

The per-cpu reader-writer lock uses a global variable to control the read
lock fast path. This allows a writer to disable the fast path and ensure
the readers use the underlying read-write lock implementation.

Once the writer has taken the write lock and disabled the fast path, it
must poll the per-cpu variable for all CPUs which have entered the critical
section for the specific read-write lock the writer is attempting to take.
This design allows for a single per-cpu variable to be used for read/write
locks belonging to separate data structures as long as multiple per-cpu
read locks are not simultaneously held by one particular cpu. This also
means per-cpu reader-writer locks are not recursion safe.

Slow path readers which are unblocked set the per-cpu variable and drop the
read lock. This simplifies the implementation and allows for fairness in
the underlying read-write lock to be taken advantage of.

There may be slightly more overhead on the per-cpu write lock path due to
checking each CPU's fast path read variable but this overhead is likely to
be hidden by the required delay of waiting for readers to exit the critical
section. The loop is optimised to only iterate over the per-cpu data of
active readers of the rwlock.

Signed-off-by: Malcolm Crossley
---
xen/common/spinlock.c
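The scheme described in the commit message can be sketched in portable C11. This is a simplified model written from that description, not the code from the patch: every name below is hypothetical, the CPU number is passed explicitly instead of using per-cpu data, the underlying read-write lock is a minimal spinning one with no fairness, sequentially consistent atomics stand in for the barriers, and the writer polls every CPU slot rather than only those of active readers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4   /* assumed CPU count for the model */

/* Minimal spinning rwlock: -1 = write-locked, n >= 0 = n readers inside. */
typedef struct { atomic_int state; } model_rwlock_t;

static void model_read_lock(model_rwlock_t *l)
{
    for ( ; ; )
    {
        int s = atomic_load(&l->state);

        if ( s >= 0 && atomic_compare_exchange_weak(&l->state, &s, s + 1) )
            return;
    }
}

static void model_read_unlock(model_rwlock_t *l)
{
    atomic_fetch_sub(&l->state, 1);
}

static void model_write_lock(model_rwlock_t *l)
{
    int expected;

    do {
        expected = 0;
    } while ( !atomic_compare_exchange_weak(&l->state, &expected, -1) );
}

static void model_write_unlock(model_rwlock_t *l)
{
    atomic_store(&l->state, 0);
}

typedef struct percpu_rwlock_model {
    model_rwlock_t rwlock;           /* underlying read-write lock */
    atomic_bool writer_activating;   /* global flag: disables the fast path */
} percpu_rwlock_model_t;

/* One slot per CPU: the per-cpu rwlock that CPU currently holds for reading. */
static _Atomic(percpu_rwlock_model_t *) model_percpu_slot[MODEL_NR_CPUS];

static void percpu_read_lock(unsigned int cpu, percpu_rwlock_model_t *l)
{
    assert(cpu < MODEL_NR_CPUS);
    /* Not recursion safe: a CPU may hold only one such read lock at a time. */
    assert(atomic_load(&model_percpu_slot[cpu]) == NULL);

    atomic_store(&model_percpu_slot[cpu], l);          /* fast path */
    if ( atomic_load(&l->writer_activating) )
    {
        /* Slow path: back off, wait on the underlying lock, then publish. */
        atomic_store(&model_percpu_slot[cpu], NULL);
        model_read_lock(&l->rwlock);
        atomic_store(&model_percpu_slot[cpu], l);
        model_read_unlock(&l->rwlock);
    }
}

static void percpu_read_unlock(unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    atomic_store(&model_percpu_slot[cpu], NULL);
}

static void percpu_write_lock(percpu_rwlock_model_t *l)
{
    unsigned int cpu;

    model_write_lock(&l->rwlock);
    atomic_store(&l->writer_activating, true);

    /* Wait for every fast-path reader of this lock to leave. */
    for ( cpu = 0; cpu < MODEL_NR_CPUS; cpu++ )
        while ( atomic_load(&model_percpu_slot[cpu]) == l )
            ;
}

static void percpu_write_unlock(percpu_rwlock_model_t *l)
{
    atomic_store(&l->writer_activating, false);
    model_write_unlock(&l->rwlock);
}
```

The ordering that matters is on the read fast path: the reader publishes its slot before checking writer_activating, and the writer sets writer_activating before polling the slots, so at least one side always observes the other.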
Re: [Xen-devel] [PATCH 2/3] xen/arm: introduce HYPERVISOR_dom0_op on arm and arm64
>>> On 05.11.15 at 18:09, wrote:
> --- a/arch/arm/xen/hypercall.S
> +++ b/arch/arm/xen/hypercall.S
> @@ -89,6 +89,7 @@ HYPERCALL2(memory_op);
> HYPERCALL2(physdev_op);
> HYPERCALL3(vcpu_op);
> HYPERCALL1(tmem_op);
> +HYPERCALL1(dom0_op);

Assuming this somehow tries to mirror x86 naming - time to rename it there? I don't see why you'd want to introduce a dom0_op when it was renamed to platform_op many years ago - see public/dom0_ops.h.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core
On 11/05/2015 04:13 AM, Sander Eikelenboom wrote:
> It makes "cat /sys/kernel/debug/kernel_page_tables" work and prevents a kernel with CONFIG_DEBUG_WX=y from crashing at boot.

Great. Our nightly runs also failed spectacularly due to this bug.

> It now does give a warning about an insecure W+X mapping, so CONFIG_DEBUG_WX=y seems to be working. No idea how to interpret it though (and if it's a legit warning).
>
> --
> Sander
>
> [ 19.034706] Freeing unused kernel memory: 1104K (822fc000 - 8241)
> [ 19.041339] Write protecting the kernel read-only data: 18432k
> [ 19.052596] Freeing unused kernel memory: 1144K (880001ae2000 - 880001c0)
> [ 19.060285] Freeing unused kernel memory: 1560K (88000207a000 - 88000220)
> [ 19.067079] [ cut here ]
> [ 19.073931] WARNING: CPU: 5 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x619/0x7e0()

Yes, this apparently is a known issue: https://lkml.org/lkml/2015/11/4/476

-boris
Re: [Xen-devel] [VOTE] Release cycle scheme
On Mon, 2 Nov 2015, Wei Liu wrote:
> Hi committers,

I am not a xen.git committer, but I am the qemu-xen.git committer, and since I maintain all the qemu-xen stable trees and releases, I think that my vote should count, at least for this proposal.

> There doesn't seem to be consensus on how release cycle should be
> managed. In the survey [0] about release cycle there were following
> proposed schemes:
>
> #1. 6 months release cycle + current stable release scheme
> #2. 6 months release cycle + LTS scheme
> #3. 6 months release cycle + extended security support
> #4. 9 months release cycle + current stable release scheme (no change
> at all)
>
> And the tally:
>
> #1 #2 #3 #4
> George +1 +2 -2
> Dario +1 +2 -2
> Stefano +1 +2 -2
> Ian C +1 +1 +1 -1
> Olaf +1 0 +1 0
> Juergen 0 -1 +1
> Ian J +2 +1 +1 -2
> Andrew +1 +1 -1
> Jan -1 -1 0 +1
>
>
> There are comments made by individuals that couldn't be clearly
> represented in the tally. The most acceptable option to stable tree
> maintainers is #1.
>
> So I propose we use the following scheme:
>
> - 6 months release cycle from unstable branch.
> - 4 months development.
> - 2 months freeze.
> - Eat into next cycle if doesn't release on time.

+2

> - Fixed cut-off date: the Fridays of the week in which the last day of
> March and September falls.

+1

> - No more freeze exception, but heads-up mails about freeze will be
> sent a few weeks before hand.

+1

> - Stable branch maintained for 18 months full support plus 18 months
> security support. No mixed maintainership for stable trees.

-1

If I need to give an overall vote, I'll give +1.

> Please vote to ack or nack this proposal.
>
>
> Thanks
> Wei.
>
> [0]: <20151012173222.ge2...@zion.uk.xensource.com>
Re: [Xen-devel] Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core
Thursday, November 5, 2015, 2:53:40 PM, you wrote:

> On 11/05/2015 04:13 AM, Sander Eikelenboom wrote:
>>
>> It makes "cat /sys/kernel/debug/kernel_page_tables" work and
>> prevents a kernel with CONFIG_DEBUG_WX=y from crashing at boot.

> Great. Our nightly runs also failed spectacularly due to this bug.

>>
>> It now does give a warning about an insecure W+X mapping, so
>> CONFIG_DEBUG_WX=y seems to be working. No idea how to interpret it
>> though (and if it's a legit warning).
>>
>> --
>> Sander
>>
>> [ 19.034706] Freeing unused kernel memory: 1104K (822fc000 - 8241)
>> [ 19.041339] Write protecting the kernel read-only data: 18432k
>> [ 19.052596] Freeing unused kernel memory: 1144K (880001ae2000 - 880001c0)
>> [ 19.060285] Freeing unused kernel memory: 1560K (88000207a000 - 88000220)
>> [ 19.067079] [ cut here ]
>> [ 19.073931] WARNING: CPU: 5 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x619/0x7e0()

> Yes, this apparently is a known issue: https://lkml.org/lkml/2015/11/4/476

> -boris

Ah thx for the pointer :)

--
Sander
Re: [Xen-devel] [PATCH v11 2/5] missing include asm/paravirt.h in cputime.c
On Thu, 5 Nov 2015, Peter Zijlstra wrote:
> How can this be missing? Things compile fine now, right?

Fair enough.

> So please better explain why we do this change.

asm/paravirt.h is included by one of the other headers included in kernel/sched/cputime.c on x86, but not on other architectures. On arm and arm64, where I am about to introduce asm/paravirt.h and stolen time support, without #include <asm/paravirt.h> in cputime.c I would get:

kernel/sched/cputime.c: In function ‘steal_account_process_tick’:
kernel/sched/cputime.c:260:24: error: ‘paravirt_steal_enabled’ undeclared (first use in this function)
if (static_key_false(&paravirt_steal_enabled)) {

A bit of digging on x86 (using gcc -E on cputime.c) tells me that asm/paravirt.h is coming from the following include chain:

#include #include #include #include #include #include #include #include #include
Re: [Xen-devel] [PATCH 2/2] arm: export platform_op XENPF_settime
Hi Stefano,

You forgot to CC Daniel for the XSM part. Please use scripts/get_maintainer.pl to get the relevant maintainers.

On 05/11/15 16:57, Stefano Stabellini wrote:
> Call update_domain_wallclock_time at domain initialization.

It's not really what you are doing in the code. You are calling update_domain_wallclock_time when the first vCPU is initialized.

Also some rationale to explain why this call should be done here would be good.

Finally, I'm a bit surprised that you only need to call update_domain_wallclock_time when the domain is created. x86 needs to call it in various places.

For instance we may want to call update_domain_wallclock_time in construct_dom0 before clearing the pause flags. This is because the wallclock may be out of sync as constructing DOM0 takes some time.

> Signed-off-by: Stefano Stabellini
> Signed-off-by: Ian Campbell
> ---
> xen/arch/arm/Makefile | 1 +
> xen/arch/arm/domain.c | 3 ++
> xen/arch/arm/platform_hypercall.c | 62 +
> xen/arch/arm/traps.c | 1 +
> xen/include/xsm/dummy.h | 12 +++
> xen/include/xsm/xsm.h | 13

You also have to fix xsm/flask/hooks.c.

> 6 files changed, 80 insertions(+), 12 deletions(-)
> create mode 100644 xen/arch/arm/platform_hypercall.c

[..]

> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index b2bfc7d..ac9b1b3 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -742,6 +742,9 @@ int arch_set_info_guest(
>      v->arch.ttbr1 = ctxt->ttbr1;
>      v->arch.ttbcr = ctxt->ttbcr;
>
> +    if ( v->vcpu_id == 0 )
> +        update_domain_wallclock_time(v->domain);
> +
>      v->is_initialised = 1;
>
>      if ( ctxt->flags & VGCF_online )
> diff --git a/xen/arch/arm/platform_hypercall.c b/xen/arch/arm/platform_hypercall.c
> new file mode 100644
> index 000..f60d7b3
> --- /dev/null
> +++ b/xen/arch/arm/platform_hypercall.c
> @@ -0,0 +1,62 @@
> +/**
> + * platform_hypercall.c
> + *
> + * Hardware platform operations. Intended for use by domain-0 kernel.
> + *
> + * Copyright (c) 2015, Citrix
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +DEFINE_SPINLOCK(xenpf_lock);
> +
> +long do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
> +{

Would it make sense to introduce a common platform code which takes care of the common hypercalls? See for instance do_domctl and arch_do_domctl.

> +    long ret;
> +    struct xen_platform_op curop, *op = &curop;
> +
> +    if ( copy_from_guest(op, u_xenpf_op, 1) )
> +        return -EFAULT;
> +
> +    if ( op->interface_version != XENPF_INTERFACE_VERSION )
> +        return -EACCES;
> +
> +    ret = xsm_platform_op(XSM_PRIV, op->cmd);
> +    if ( ret )
> +        return ret;
> +
> +    spin_lock(&xenpf_lock);
> +
> +    switch ( op->cmd )
> +    {
> +    case XENPF_settime32:
> +        do_settime(op->u.settime32.secs,
> +                   op->u.settime32.nsecs,
> +                   op->u.settime32.system_time);
> +        break;

Do we really want to support settime32 on ARM?

> +
> +    case XENPF_settime64:
> +        if ( likely(!op->u.settime64.mbz) )
> +            do_settime(op->u.settime64.secs,
> +                       op->u.settime64.nsecs,
> +                       op->u.settime64.system_time);
> +        else
> +            ret = -EINVAL;
> +        break;
> +
> +    default:
> +        ret = -ENOSYS;
> +        break;
> +    }
> +
> +    spin_unlock(&xenpf_lock);
> +    return ret;
> +}

Regards,

--
Julien Grall
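The common/arch split Julien asks about, in the style of do_domctl and arch_do_domctl, could be pictured roughly as follows. This is a self-contained mock and not Xen code: struct platform_op, the PF_* command numbers, wallclock_secs and both function names are hypothetical stand-ins, and guest copying, XSM checks and locking are left out. It only shows the dispatch shape, where the common handler deals with shared commands and hands everything else to an architecture hook.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical command numbers standing in for the XENPF_* values. */
enum { PF_SETTIME64 = 1, PF_ARCH_ONLY = 2 };

/* Hypothetical, simplified stand-in for struct xen_platform_op. */
struct platform_op {
    uint32_t cmd;
    uint64_t secs;
    uint32_t nsecs;
    uint32_t mbz; /* must be zero, as in the quoted settime64 case */
};

/* Stand-in for the effect of do_settime(): remembers the last value set. */
static uint64_t wallclock_secs;

/* Architecture hook: handles only what is specific to this architecture. */
static long arch_do_platform_op(const struct platform_op *op)
{
    switch (op->cmd) {
    case PF_ARCH_ONLY:
        return 0;
    default:
        return -ENOSYS;
    }
}

/* Common entry point: handles shared commands, defers the rest. */
static long do_platform_op_common(const struct platform_op *op)
{
    switch (op->cmd) {
    case PF_SETTIME64:
        if (op->mbz)
            return -EINVAL;
        wallclock_secs = op->secs;
        return 0;
    default:
        return arch_do_platform_op(op);
    }
}
```

With this shape an architecture only has to supply the hook, and a command such as settime64 is implemented once; whether that fits the real platform_op interface is a question for the thread.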
Re: [Xen-devel] [PATCH v4 2/9] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd
On 05/11/15 14:36, Juergen Gross wrote:
> Support of an unmapped initrd is indicated by the kernel of the domain
> via elf notes. In order not to have to use raw elf data in the tools
> for support of an unmapped initrd add a flag to the parsed data area
> to indicate the kernel supporting this feature.
>
> Switch using this flag in the hypervisor domain builder.
>
> Cc: andrew.coop...@citrix.com
> Cc: jbeul...@suse.com
> Cc: k...@xen.org
> Suggested-by: Ian Campbell
> Signed-off-by: Juergen Gross
> Acked-by: Jan Beulich

Reviewed-by: Andrew Cooper