[PATCH] xen: don't reschedule in preemption off sections

2020-08-20 Thread Juergen Gross
To support long-running hypercalls, xen_maybe_preempt_hcall() calls
cond_resched() in case a hypercall marked as preemptible has been
interrupted.

Normally this is no problem, as only hypercalls done via some ioctl()s
are marked to be preemptible. In rare cases when during such a
preemptible hypercall an interrupt occurs and any softirq action is
started from irq_exit(), a further hypercall issued by the softirq
handler will be regarded as preemptible, too. This might lead to
rescheduling in spite of the softirq handler potentially having set
preempt_disable(), leading to splats like:

BUG: sleeping function called from invalid context at drivers/xen/preempt.c:37
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 20775, name: xl
INFO: lockdep is turned off.
CPU: 1 PID: 20775 Comm: xl Tainted: G D W 5.4.46-1_prgmr_debug.el7.x86_64 #1
Call Trace:

dump_stack+0x8f/0xd0
___might_sleep.cold.76+0xb2/0x103
xen_maybe_preempt_hcall+0x48/0x70
xen_do_hypervisor_callback+0x37/0x40
RIP: e030:xen_hypercall_xen_version+0xa/0x20
Code: ...
RSP: e02b:c900400dcc30 EFLAGS: 0246
RAX: 0004000d RBX: 0200 RCX: 8100122a
RDX: 88812e788000 RSI:  RDI: 
RBP: 83ee3ad0 R08: 0001 R09: 0001
R10:  R11: 0246 R12: 8881824aa0b0
R13: 000865496000 R14: 000865496000 R15: 88815d04
? xen_hypercall_xen_version+0xa/0x20
? xen_force_evtchn_callback+0x9/0x10
? check_events+0x12/0x20
? xen_restore_fl_direct+0x1f/0x20
? _raw_spin_unlock_irqrestore+0x53/0x60
? debug_dma_sync_single_for_cpu+0x91/0xc0
? _raw_spin_unlock_irqrestore+0x53/0x60
? xen_swiotlb_sync_single_for_cpu+0x3d/0x140
? mlx4_en_process_rx_cq+0x6b6/0x1110 [mlx4_en]
? mlx4_en_poll_rx_cq+0x64/0x100 [mlx4_en]
? net_rx_action+0x151/0x4a0
? __do_softirq+0xed/0x55b
? irq_exit+0xea/0x100
? xen_evtchn_do_upcall+0x2c/0x40
? xen_do_hypervisor_callback+0x29/0x40

? xen_hypercall_domctl+0xa/0x20
? xen_hypercall_domctl+0x8/0x20
? privcmd_ioctl+0x221/0x990 [xen_privcmd]
? do_vfs_ioctl+0xa5/0x6f0
? ksys_ioctl+0x60/0x90
? trace_hardirqs_off_thunk+0x1a/0x20
? __x64_sys_ioctl+0x16/0x20
? do_syscall_64+0x62/0x250
? entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix that by testing preempt_count() before calling cond_resched().

In kernel 5.8 this can't happen any more due to the entry code rework
(more than 100 patches, so not a candidate for backporting).

The issue was introduced in kernel 4.3, so this patch should go into
all stable kernels in [4.3 ... 5.7].

Reported-by: Sarah Newman 
Fixes: 0fa2f5cb2b0ecd8 ("sched/preempt, xen: Use need_resched() instead of should_resched()")
Cc: Sarah Newman 
Cc: sta...@vger.kernel.org
Signed-off-by: Juergen Gross 
Tested-by: Chris Brannon 
---
 drivers/xen/preempt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/preempt.c b/drivers/xen/preempt.c
index 17240c5325a3..6ad87b5c95ed 100644
--- a/drivers/xen/preempt.c
+++ b/drivers/xen/preempt.c
@@ -27,7 +27,7 @@ EXPORT_SYMBOL_GPL(xen_in_preemptible_hcall);
 asmlinkage __visible void xen_maybe_preempt_hcall(void)
 {
 	if (unlikely(__this_cpu_read(xen_in_preemptible_hcall)
-		     && need_resched())) {
+		     && need_resched() && !preempt_count())) {
 		/*
 		 * Clear flag as we may be rescheduled on a different
 		 * cpu.
-- 
2.26.2
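The fixed condition can be modeled in plain C. This is an illustrative sketch, not kernel code: the three parameters stand in for __this_cpu_read(xen_in_preemptible_hcall), need_resched() and preempt_count().

```c
#include <stdbool.h>

/* Illustrative model of the fixed check in xen_maybe_preempt_hcall():
 * rescheduling is allowed only when a preemptible hypercall was
 * interrupted, a reschedule is pending, AND preemption is not
 * disabled (preempt_count == 0), e.g. by a softirq handler. */
static bool may_cond_resched(bool in_preemptible_hcall,
                             bool need_resched,
                             int preempt_count)
{
    return in_preemptible_hcall && need_resched && preempt_count == 0;
}
```

The added `preempt_count == 0` term is exactly what keeps a softirq-issued hypercall, which runs with preemption disabled, from sleeping.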




Re: [PATCH v2 2/5] drm/xen-front: Fix misused IS_ERR_OR_NULL checks

2020-08-20 Thread Oleksandr Andrushchenko
Hi,

On 8/20/20 2:56 AM, Sasha Levin wrote:
> Hi
>
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag
> fixing commit: c575b7eeb89f ("drm/xen-front: Add support for Xen PV display 
> frontend").
>
> The bot has tested the following trees: v5.8.1, v5.7.15, v5.4.58, v4.19.139.
>
> v5.8.1: Build OK!
> v5.7.15: Build OK!
> v5.4.58: Failed to apply! Possible dependencies:
>  4c1cb04e0e7a ("drm/xen: fix passing zero to 'PTR_ERR' warning")
>  93adc0c2cb72 ("drm/xen: Simplify fb_create")
>
> v4.19.139: Failed to apply! Possible dependencies:
>  4c1cb04e0e7a ("drm/xen: fix passing zero to 'PTR_ERR' warning")
>  93adc0c2cb72 ("drm/xen: Simplify fb_create")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
>
This is because commit 4c1cb04e0e7ac4ba1ef5457929ef9b5671d9eed3 was not CCed
to stable. So, if we want this patch applied to older stable kernels, we need
that one as well.

Thank you,

Oleksandr


Re: Xen 4.14.0 fails on Dell IoT Gateway without efi=no-rs

2020-08-20 Thread Jan Beulich
On 20.08.2020 00:50, Roman Shaposhnik wrote:
> below you can see a trace of Xen 4.14.0 failing on Dell IoT Gateway 3001
> without efi=no-rs. Please let me know if I can provide any additional
> information.

One of the usual firmware issues:

> Xen 4.14.0
> (XEN) Xen version 4.14.0 (@) (gcc (Alpine 6.4.0) 6.4.0) debug=n  Sat Jul 25
> 23:45:43 UTC 2020
> (XEN) Latest ChangeSet:
> (XEN) Bootloader: GRUB 2.03
> (XEN) Command line: com1=115200,8n1 console=com1 dom0_mem=1024M,max:1024M
> dom0_max_vcpus=1 dom0_vcpus_pin
> (XEN) Xen image load base address: 0x7100
> (XEN) Video information:
> (XEN)  VGA is text mode 80x25, font 8x16
> (XEN) Disc information:
> (XEN)  Found 0 MBR signatures
> (XEN)  Found 1 EDD information structures
> (XEN) EFI RAM map:
> (XEN)  [, 0003efff] (usable)
> (XEN)  [0003f000, 0003] (ACPI NVS)
> (XEN)  [0004, 0009] (usable)
> (XEN)  [0010, 1fff] (usable)
> (XEN)  [2000, 200f] (reserved)
> (XEN)  [2010, 76ccafff] (usable)
> (XEN)  [76ccb000, 76d42fff] (reserved)
> (XEN)  [76d43000, 76d53fff] (ACPI data)
> (XEN)  [76d54000, 772ddfff] (ACPI NVS)
> (XEN)  [772de000, 775f4fff] (reserved)
> (XEN)  [775f5000, 775f5fff] (usable)
> (XEN)  [775f6000, 77637fff] (reserved)
> (XEN)  [77638000, 789e4fff] (usable)
> (XEN)  [789e5000, 78ff9fff] (reserved)
> (XEN)  [78ffa000, 78ff] (usable)
> (XEN)  [e000, efff] (reserved)
> (XEN)  [fec0, fec00fff] (reserved)
> (XEN)  [fed01000, fed01fff] (reserved)
> (XEN)  [fed03000, fed03fff] (reserved)
> (XEN)  [fed08000, fed08fff] (reserved)
> (XEN)  [fed0c000, fed0] (reserved)
> (XEN)  [fed1c000, fed1cfff] (reserved)
> (XEN)  [fee0, fee00fff] (reserved)
> (XEN)  [fef0, feff] (reserved)
> (XEN)  [ff90, ] (reserved)
> (XEN) System RAM: 1919MB (1965176kB)
> (XEN) ACPI: RSDP 76D46000, 0024 (r2   DELL)
> (XEN) ACPI: XSDT 76D46088, 0094 (r1   DELL AS09  1072009 AMI 10013)
> (XEN) ACPI: FACP 76D52560, 010C (r5   DELL AS09  1072009 AMI 10013)
> (XEN) ACPI: DSDT 76D461B0, C3AF (r2   DELL AS09  1072009 INTL 20120913)
> (XEN) ACPI: FACS 772DDE80, 0040
> (XEN) ACPI: APIC 76D52670, 0068 (r3   DELL AS09  1072009 AMI 10013)
> (XEN) ACPI: FPDT 76D526D8, 0044 (r1   DELL AS09  1072009 AMI 10013)
> (XEN) ACPI: FIDT 76D52720, 009C (r1   DELL AS09  1072009 AMI 10013)
> (XEN) ACPI: MCFG 76D527C0, 003C (r1   DELL AS09  1072009 MSFT   97)
> (XEN) ACPI: LPIT 76D52800, 0104 (r1   DELL AS093 VLV2  10D)
> (XEN) ACPI: HPET 76D52908, 0038 (r1   DELL AS09  1072009 AMI.5)
> (XEN) ACPI: SSDT 76D52940, 0763 (r1   DELL AS09 3000 INTL 20061109)
> (XEN) ACPI: SSDT 76D530A8, 0290 (r1   DELL AS09 3000 INTL 20061109)
> (XEN) ACPI: SSDT 76D53338, 017A (r1   DELL AS09 3000 INTL 20061109)
> (XEN) ACPI: UEFI 76D534B8, 0042 (r1   DELL AS090 0)
> (XEN) ACPI: CSRT 76D53500, 014C (r0   DELL AS095 INTL 20120624)
> (XEN) ACPI: TPM2 76D53650, 0034 (r3Tpm2Tabl1 AMI 0)
> (XEN) ACPI: SSDT 76D53688, 00C9 (r1   MSFT  RHPROXY1 INTL 20120913)
> (XEN) Domain heap initialised
> (XEN) ACPI: 32/64X FACS address mismatch in FADT -
> 772dde80/, using 32
> (XEN) IOAPIC[0]: apic_id 1, version 32, address 0xfec0, GSI 0-86
> (XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
> (XEN) CPU0: 400..1000 MHz
> (XEN) Speculative mitigation facilities:
> (XEN)   Hardware features:
> (XEN)   Compiled-in support: SHADOW_PAGING
> (XEN)   Xen settings: BTI-Thunk N/A, SPEC_CTRL: No, Other: BRANCH_HARDEN
> (XEN)   Support for HVM VMs: RSB
> (XEN)   Support for PV VMs: RSB
> (XEN)   XPTI (64-bit PV only): Dom0 enabled, DomU enabled (without PCID)
> (XEN)   PV L1TF shadowing: Dom0 disabled, DomU disabled
> (XEN) Using scheduler: SMP Credit Scheduler rev2 (credit2)
> (XEN) Initializing Credit2 scheduler
> (XEN) Disabling HPET for being unreliable
> (XEN) Platform timer is 3.580MHz ACPI PM Timer
> (XEN) Detected 1333.397 MHz processor.
> (XEN) Unknown cachability for MFNs 0xff900-0xf

The fault address falling in this range suggests you can use a less
heavy workaround: "efi=attr=uc". (Quite possibly "efi=no-rs" or yet
some other workaround may still be needed for your subsequent reboot
hang.)

> (XEN) I/O virtualisation disabled
> (XEN) ENABLING IO-APIC IRQs
> (XEN)  -> Using new ACK method
> (XEN) [ Xen-4.14.0  x86_64  debug=n   Not tainted ]

In general please try to reproduce issues with a "debug=y" build, such
that ...

> (XEN) CPU:0
> (XEN) RIP:e008:[<7

Re: [PATCH] xen: Introduce cmpxchg64() and guest_cmpxchg64()

2020-08-20 Thread Julien Grall




On 19/08/2020 10:22, Jan Beulich wrote:

On 17.08.2020 15:03, Julien Grall wrote:

On 17/08/2020 12:50, Roger Pau Monné wrote:

On Mon, Aug 17, 2020 at 12:05:54PM +0100, Julien Grall wrote:

The only way I could see to make it work would be to use the same trick as
we do for {read, write}_atomic() (see asm-arm/atomic.h). We are using union
and void pointer to prevent explicit cast.


I'm mostly worried about common code having assumed that cmpxchg
does also handle 64bit sized parameters, and thus failing to use
cmpxchg64 when required. I assume this is not much of a deal as then
the Arm 32 build would fail, so it should be fairly easy to catch
those.

FWIW, this is not very different to the existing approach. If one would
use cmpxchg() with 64-bit, then it would fail to compile.


A somewhat related question then: Do you really need both the
guest_* and the non-guest variants? Limiting things to plain
cmpxchg() would further reduce the risk of someone picking the
wrong one without right away noticing the build issue on Arm32.
For guest_cmpxchg{,64}() I think there's less of a risk.


For the IOREQ code, we will need the guest_* version that is built on 
top of the non-guest variant.


I would like at least consistency between the two variants. IOW, if we 
decide to use the name guest_cmpxchg64(), then I would like to use 
cmpxchg64().


I still need to explore the code generated by cmpxchg() if I include 
support for 64-bit.


Cheers,

--
Julien Grall
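The union-and-void-pointer trick Julien refers to can be sketched in plain C. This is an illustrative model relying on GCC/Clang extensions (__typeof__ and statement expressions), not the actual asm-arm/atomic.h code; it copies through memcpy purely to show the type plumbing, where the real helpers emit size-specific atomic load instructions.

```c
#include <stdint.h>
#include <string.h>

/* One macro accepts any pointer type and dispatches on sizeof(*p)
 * without requiring an explicit cast at the call site: the union
 * provides correctly-typed storage, the void pointer erases the
 * pointee type for the raw copy. */
#define read_val(p) ({                                              \
    union { __typeof__(*(p)) val; char raw[sizeof(*(p))]; } u_;     \
    const void *src_ = (p);                                         \
    memcpy(u_.raw, src_, sizeof(*(p)));                             \
    u_.val;                                                         \
})
```

Because the macro works for any size, a 64-bit use on Arm32 compiles; the real cmpxchg() conversely makes such a use fail to build, which is the safety net discussed above.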



Re: [PATCH] xen: Introduce cmpxchg64() and guest_cmpxchg64()

2020-08-20 Thread Jan Beulich
On 20.08.2020 11:14, Julien Grall wrote:
> 
> 
> On 19/08/2020 10:22, Jan Beulich wrote:
>> On 17.08.2020 15:03, Julien Grall wrote:
>>> On 17/08/2020 12:50, Roger Pau Monné wrote:
 On Mon, Aug 17, 2020 at 12:05:54PM +0100, Julien Grall wrote:
> The only way I could see to make it work would be to use the same trick as
> we do for {read, write}_atomic() (see asm-arm/atomic.h). We are using 
> union
> and void pointer to prevent explicit cast.

 I'm mostly worried about common code having assumed that cmpxchg
 does also handle 64bit sized parameters, and thus failing to use
 cmpxchg64 when required. I assume this is not much of a deal as then
 the Arm 32 build would fail, so it should be fairly easy to catch
 those.
>>> FWIW, this is not very different to the existing approach. If one would
>>> use cmpxchg() with 64-bit, then it would fail to compile.
>>
>> A somewhat related question then: Do you really need both the
>> guest_* and the non-guest variants? Limiting things to plain
>> cmpxchg() would further reduce the risk of someone picking the
>> wrong one without right away noticing the build issue on Arm32.
>> For guest_cmpxchg{,64}() I think there's less of a risk.
> 
> For the IOREQ code, we will need the guest_* version that is built on 
> top of the non-guest variant.
> 
> I would like at least consistency between the two variants. IOW, if we 
> decide to use the name guest_cmpxchg64(), then I would like to use 
> cmpxchg64().

On Arm, that is. There wouldn't be any need to expose cmpxchg64()
for use in common code, and hence not at all on x86, I guess?

Jan



Re: [PATCH 0/2] Enable 1165522 Errata for Neovers

2020-08-20 Thread Julien Grall

Hi,

On 18/08/2020 14:47, Bertrand Marquis wrote:

This patch series adds Neoverse N1 processor identification and
enables the processor errata 1165522 for Neoverse N1 processors.

Bertrand Marquis (2):
   arm: Add Neoverse N1 processor identification
   xen/arm: Enable CPU Errata 1165522 for Neoverse


Committed, thank you!

Cheers,

--
Julien Grall



Re: [PATCH] efi: discover ESRT table on Xen PV too

2020-08-20 Thread Marek Marczykowski-Górecki
On Thu, Aug 20, 2020 at 11:30:25AM +0200, Roger Pau Monné wrote:
> Right, so you only need access to the ESRT table, that's all. Then I
> think we need to make sure Xen doesn't use this memory for anything
> else, which will require some changes in Xen (or at least some
> checks?).
> 
> We also need to decide what to do if the table turns out to be placed
> in a wrong region. How are we going to prevent dom0 from using it
> then? My preference would be to completely hide it from dom0 in that
> case, such that it believes there's no ESRT at all if possible.

Yes, that makes sense. As discussed earlier, that probably means
re-constructing SystemTable before giving it to dom0. We'd need to do
that in the PVH case anyway, to adjust addresses, right? Is there something
like this in the Xen codebase already, or it needs to be written from
scratch?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?




Re: [PATCH] xen: Introduce cmpxchg64() and guest_cmpxchg64()

2020-08-20 Thread Julien Grall




On 20/08/2020 10:25, Jan Beulich wrote:

On 20.08.2020 11:14, Julien Grall wrote:



On 19/08/2020 10:22, Jan Beulich wrote:

On 17.08.2020 15:03, Julien Grall wrote:

On 17/08/2020 12:50, Roger Pau Monné wrote:

On Mon, Aug 17, 2020 at 12:05:54PM +0100, Julien Grall wrote:

The only way I could see to make it work would be to use the same trick as
we do for {read, write}_atomic() (see asm-arm/atomic.h). We are using union
and void pointer to prevent explicit cast.


I'm mostly worried about common code having assumed that cmpxchg
does also handle 64bit sized parameters, and thus failing to use
cmpxchg64 when required. I assume this is not much of a deal as then
the Arm 32 build would fail, so it should be fairly easy to catch
those.

FWIW, this is not very different to the existing approach. If one would
use cmpxchg() with 64-bit, then it would fail to compile.


A somewhat related question then: Do you really need both the
guest_* and the non-guest variants? Limiting things to plain
cmpxchg() would further reduce the risk of someone picking the
wrong one without right away noticing the build issue on Arm32.
For guest_cmpxchg{,64}() I think there's less of a risk.


For the IOREQ code, we will need the guest_* version that is built on
top of the non-guest variant.

I would like at least consistency between the two variants. IOW, if we
decide to use the name guest_cmpxchg64(), then I would like to use
cmpxchg64().


On Arm, that is. There wouldn't be any need to expose cmpxchg64()
for use in common code, and hence not at all on x86, I guess?


Right, we would only need to introduce guest_cmpxchg64() for common code.

Cheers,

--
Julien Grall



Re: [PATCH] efi: discover ESRT table on Xen PV too

2020-08-20 Thread Ard Biesheuvel
On Thu, 20 Aug 2020 at 11:30, Roger Pau Monné  wrote:
>
> On Wed, Aug 19, 2020 at 01:33:39PM +0200, Norbert Kaminski wrote:
> >
> > On 19.08.2020 10:19, Roger Pau Monné wrote:
> > > On Tue, Aug 18, 2020 at 08:40:18PM +0200, Marek Marczykowski-Górecki 
> > > wrote:
> > > > On Tue, Aug 18, 2020 at 07:21:14PM +0200, Roger Pau Monné wrote:
> > > > > > Let me draw the picture from the beginning.
> > > > > Thanks, greatly appreciated.
> > > > >
> > > > > > EFI memory map contains various memory regions. Some of them are 
> > > > > > marked
> > > > > > as not needed after ExitBootServices() call (done in Xen before
> > > > > > launching dom0). This includes EFI_BOOT_SERVICES_DATA and
> > > > > > EFI_BOOT_SERVICES_CODE.
> > > > > >
> > > > > > EFI SystemTable contains pointers to various ConfigurationTables -
> > > > > > physical addresses (at least in this case). Xen does interpret some 
> > > > > > of
> > > > > > them, but not ESRT. Xen pass the whole (address of) SystemTable to 
> > > > > > Linux
> > > > > > dom0 (at least in PV case). Xen doesn't do anything about tables it
> > > > > > doesn't understand.
> > > > > >
> > > > > > Now, the code in Linux takes the (ESRT) table address early and 
> > > > > > checks
> > > > > > the memory map for it. We have 3 cases:
> > > > > >   - it points at area marked as neither EFI_*_SERVICES_DATA, nor 
> > > > > > with
> > > > > > EFI_MEMORY_RUNTIME attribute -> Linux refuse to use it
> > > > > >   - it points to EFI_RUNTIME_SERVICES_DATA or with 
> > > > > > EFI_MEMORY_RUNTIME
> > > > > > attribute - Linux uses the table; memory map already says the 
> > > > > > area
> > > > > > belongs to EFI and the OS should not use it for something else
> > > > > >   - it points to EFI_BOOT_SERVICES_DATA - Linux mark the area as 
> > > > > > reserved
> > > > > > to not release it after calling ExitBootServices()
> > > > > >
> > > > > > The problematic is the third case - at the time when Linux dom0 is 
> > > > > > run,
> > > > > > ExitBootServices() was already called and EFI_BOOT_SERVICES_* 
> > > > > > memory was
> > > > > > already released. It could be already used for something else (for
> > > > > > example Xen could overwrite it while loading dom0).
> > > > > >
> > > > > > Note the problematic case should be the most common - UEFI 
> > > > > > specification
> > > > > > says "The ESRT shall be stored in memory of type 
> > > > > > EfiBootServicesData"
> > > > > > (chapter 22.3 of UEFI Spec v2.6).
> > > > > >
> > > > > > For this reason, to use ESRT in dom0, Xen should do something about 
> > > > > > it
> > > > > > before ExitBootServices() call. While analyzing all the EFI tables 
> > > > > > is
> > > > > > probably not a viable option, it can do some simple action:
> > > > > >   - retains all the EFI_BOOT_SERVICES_* areas - there is already 
> > > > > > code
> > > > > > for that, controlled with /mapbs boot switch (to xen.efi, would 
> > > > > > need
> > > > > > another option for multiboot2+efi)
> > > > > >   - have a list of tables to retain - since Xen already do analyze 
> > > > > > some
> > > > > > of the ConfigurationTables, it can also have a list of those to
> > > > > > preserve even if they live in EFI_BOOT_SERVICES_DATA. In this 
> > > > > > case,
> > > > > > while Xen doesn't need to parse the whole table, it need to 
> > > > > > parse it's
> > > > > > header to get the table size - to reserve that memory and not 
> > > > > > reuse
> > > > > > it after ExitBootServices().
> > > > > Xen seems to already contain skeleton
> > > > > XEN_EFI_query_capsule_capabilities and XEN_EFI_update_capsule
> > > > > hypercalls which is what should be used in order to perform the
> > > > > updates?
> > > > I think those covers only runtime service calls similarly named. But you
> > > > need also ESRT table to collect info about devices that you can even
> > > > attempt to update.
> > > Right, the ESRT must be available so that dom0 can discover the
> > > resources.
> > >
> > > > TBH, I'm not sure if those runtime services are really needed. I think
> > > > Norbert succeeded UEFI update from within Xen PV dom0 with just access
> > > > to the ESRT table, but without those services.
> > > >
> > Marek is right here. I was able to successfully update and downgrade
> > UEFI when the ESRT table was provided to the Xen PV dom0. I didn't
> > need any extra services to make the UEFI capsule update work.
>
> OK, I think that's using the method described in 8.5.5 of delivery of
> Capsules via file on Mass Storage, which doesn't use the
> UpdateCapsule() runtime API?
>

No, it doesn't even do that. It uses its own .efi binary to invoke
UpdateCapsule() after a reboot, by setting up the BootNext variable to
override the boot target for the next boot only.

> Using such method doesn't require QueryCapsuleCapabilities(), as
> that's used to know whether a certain capsule can be updated via
> UpdateCapsule().
>

That is a bit of a downside here. But the reason 
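The three-way classification of the ESRT's memory region described earlier in this thread can be modeled as follows. The enum and function names are illustrative, not the kernel's (the real logic lives in drivers/firmware/efi/esrt.c).

```c
/* Hypothetical model of Linux's handling of the region that holds
 * the ESRT, per the three cases Marek lists above. */
enum region_kind { RUNTIME_SERVICES_DATA, BOOT_SERVICES_DATA, OTHER_REGION };
enum esrt_action { ESRT_USE, ESRT_RESERVE, ESRT_REFUSE };

static enum esrt_action classify_esrt_region(enum region_kind kind,
                                             int has_runtime_attr)
{
    if (kind == RUNTIME_SERVICES_DATA || has_runtime_attr)
        return ESRT_USE;     /* EFI-owned at runtime; safe to use */
    if (kind == BOOT_SERVICES_DATA)
        return ESRT_RESERVE; /* must survive ExitBootServices() */
    return ESRT_REFUSE;      /* neither: refuse the table */
}
```

The ESRT_RESERVE case is the problematic one under Xen: by the time dom0 runs, ExitBootServices() has already been called and boot-services memory may have been reused.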

Re: [PATCH] xen/x86: irq: Avoid a TOCTOU race in pirq_spin_lock_irq_desc()

2020-08-20 Thread Julien Grall

Hi Roger,

On 18/08/2020 09:35, Roger Pau Monné wrote:

On Mon, Aug 17, 2020 at 06:56:24PM +0100, Julien Grall wrote:



On 17/08/2020 18:33, Roger Pau Monné wrote:

On Mon, Aug 17, 2020 at 04:53:51PM +0100, Julien Grall wrote:



On 17/08/2020 16:03, Roger Pau Monné wrote:

On Mon, Aug 17, 2020 at 03:39:52PM +0100, Julien Grall wrote:



On 17/08/2020 15:01, Roger Pau Monné wrote:

On Mon, Aug 17, 2020 at 02:14:01PM +0100, Julien Grall wrote:

Hi,

On 17/08/2020 13:46, Roger Pau Monné wrote:

On Fri, Aug 14, 2020 at 08:25:28PM +0100, Julien Grall wrote:

Hi Andrew,

Sorry for the late answer.

On 23/07/2020 14:59, Andrew Cooper wrote:

On 23/07/2020 14:22, Julien Grall wrote:

Hi Jan,

On 23/07/2020 12:23, Jan Beulich wrote:

On 22.07.2020 18:53, Julien Grall wrote:

--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1187,7 +1187,7 @@ struct irq_desc *pirq_spin_lock_irq_desc(
      for ( ; ; )
    {
-    int irq = pirq->arch.irq;
+    int irq = read_atomic(&pirq->arch.irq);


There we go - I'd be fine this way, but I'm pretty sure Andrew
would want this to be ACCESS_ONCE(). So I guess now is the time
to settle which one to prefer in new code (or which criteria
there are to prefer one over the other).


I would prefer if we have a single way to force the compiler to do a
single access (read/write).


Unlikely to happen, I'd expect.

But I would really like to get rid of (or at least rename)
read_atomic()/write_atomic() specifically because they've got nothing to
do with atomic_t's and the set of functionality who's namespace they share.


Would you be happy if I rename both to READ_ONCE() and WRITE_ONCE()? I would
also suggest to move them implementation in a new header asm/lib.h.


Maybe {READ/WRITE}_SINGLE (to note those should be implemented using a
single instruction)?


The asm volatile statement contains only one instruction, but this doesn't
mean the helper will generate a single instruction.


Well, the access should be done using a single instruction, which is
what we care about when using these helpers.


You may have other instructions to get the registers ready for the access.



ACCESS_ONCE (which also has the _ONCE suffix) IIRC could be
implemented using several instructions, and hence doesn't seem right
that they all have the _ONCE suffix.


The goal here is the same, we want to access the variable *only* once.


Right, but this is not guaranteed by the current implementation of
ACCESS_ONCE AFAICT, as the compiler *might* split the access into two
(or more) instructions, and hence won't be an atomic access anymore?

From my understanding, at least on GCC/Clang, ACCESS_ONCE() should be atomic
if you are using an aligned address and a size no larger than a register.


Yes, any sane compiler shouldn't split such access, but this is not
guaranteed by the current code in ACCESS_ONCE.

To be sure, your concern here is not about GCC/Clang but other compilers. Am
I correct?


Or about the existing ones switching behavior, which is again quite
unlikely I would like to assume.


The main goal of the macro is to mark place which require the variable to be
accessed once. So, in the unlikely event this may happen, it would be easy
to modify the implementation.




We already have a collection of compiler specific macros in compiler.h. So
how about we classify this macro as a compiler specific one? (See more
below).






May I ask why we would want to expose the difference to the user?


I'm not saying we should, but naming them using the _ONCE suffix seems
misleading IMO, as they have different guarantees than what
ACCESS_ONCE currently provides.


Lets leave aside how ACCESS_ONCE() is implemented for a moment.

If ACCESS_ONCE() doesn't guarantee atomicity, then it means you may read a mix
of the old and new value. This would most likely break quite a few of the
users because the result wouldn't be coherent.

Do you have place in mind where the non-atomicity would be useful?


Not that I'm aware, I think they could all be safely switched to use
the atomic variants

There is concern that read_atomic()/write_atomic() prevent the compiler from
doing certain optimizations. Andrew gave the example of:

ACCESS_ONCE(...) |= ...


I'm not sure how will that behave when used with a compile known
value that's smaller than the size of the destination. Could the
compiler optimize this as a partial read/write if only the lower byte
is modified for example?


Here what Andrew wrote in a previous answer:

"Which a sufficiently clever compiler could convert to a single `or $val,
ptr` instruction on x86, while read_atomic()/write_atomic() would force it
to be `mov ptr, %reg; or $val, %reg; mov %reg, ptr`."

On Arm, a RMW (read-modify-write) operation will still not be atomic as it would require 3
instructions.


I don't think we should rely on this behavior of ACCESS_ONCE (OR being
translated into a single instruction), as it seems to even be more
fragile than relying on ACCESS_ONCE performing reads and writes
acce

Re: [PATCH v2] xen/arm: Convert runstate address during hypcall

2020-08-20 Thread Julien Grall

Hi,

Sorry for the late answer.

On 14/08/2020 10:25, Bertrand Marquis wrote:




On 1 Aug 2020, at 00:03, Stefano Stabellini  wrote:

On Fri, 31 Jul 2020, Bertrand Marquis wrote:

On 31 Jul 2020, at 12:18, Jan Beulich  wrote:

On 31.07.2020 12:12, Julien Grall wrote:

On 31/07/2020 07:39, Jan Beulich wrote:

We're fixing other issues without breaking the ABI. Where's the
problem of backporting the kernel side change (which I anticipate
to not be overly involved)?

This means you can't take advantage of the runstate on existing Linux
without modification.


If the plan remains to be to make an ABI breaking change,


From a theoretical PoV, this is an ABI breakage. However, I fail to see
how the added restrictions would affect OSes, at least on Arm.


"OSes" covering what? Just Linux?


In particular, you can't change the VA -> PA on Arm without going
through an invalid mapping. So I wouldn't expect this to happen for the
runstate.

The only part that *may* be an issue is if the guest is registering the
runstate with an initially invalid VA. Although, I have yet to see that
in practice. Maybe you know?


I'm unaware of any such use, but this means close to nothing.


then I
think this will need an explicit vote.


I was under the impression that the two Arm maintainers (Stefano and I)
already agreed with the approach here. Therefore, given the ABI breakage
is only affecting Arm, why would we need a vote?


The problem here is of conceptual nature: You're planning to
make the behavior of a common hypercall diverge between
architectures, and in a retroactive fashion. Imo that's nothing
we should do even for new hypercalls, if _at all_ avoidable. If
we allow this here, we'll have a precedent that people later
may (and based on my experience will, sooner or later) reference,
to get their own change justified.


Please let's avoid "slippery slope" arguments
(https://en.wikipedia.org/wiki/Slippery_slope)

We shouldn't consider this instance as the first in a long series of bad
decisions on hypercall compatibility. Each new case, if there will be
any, will have to be considered based on its own merits. Also, let's
keep in mind that there have been no other cases in the last 8 years. (I
would like to repeat my support for hypercall ABI compatibility.)


I would also kindly ask not to put the discussion on a "conceptual"
level: there is no way to fix all guests and also keep compatibility.
 From a conceptual point of view, it is already game over :-)



After a discussion with Jan, he is proposing to have a guest config setting to
turn the translation of the address during the hypercall on or off, and to add
a global Xen command line parameter to set the global default behaviour.
With this done on Arm, the same could be done on x86; the current behaviour
would be kept by default but could be modified by configuration.

@Jan: please correct me if I said something wrong
@others: what is your view on this solution?


Having options to turn on or off the new behavior could be good-to-have
if we find a guest that actually requires the old behavior. Today we
don't know of any such cases. We have strong reasons to believe that
there aren't any on ARM (see Julien's explanation in regards to the
temporary invalid mappings.) In fact, it is one of the factors that led
us to think this patch is the right approach.

That said, I am also OK with adding such a parameter now, but we need to
choose the default value carefully.


I agree with that :).



This would also mean keeping support in the code for old and new behaviour
which might make the code bigger and more complex.


I am concerned with that as well. However, this concern is also going to 
be true if we introduce an hypercall using a physical address as 
parameter. Indeed, the old hypercall will not go away.


If we introduce a second hypercall, you will also have to think about 
the interactions between the two. For instance:
- The firmware may register the runstate using the old hypercall, 
while the OS may register using the new hypercall.

- Can an OS use a mix of the two hypercalls?

For more details, you can have a look at the original attempt for a new 
hypercall (see [1]).


The approach you discussed with Jan has the advantage to not require any 
change in the guest software stack. So this would be my preference over 
a new hypercall.





We need the new behavior as default on ARM because we need the fix to
work for all guests. I don't think we want to explain how you always
need to set config_foobar otherwise things don't work. It has to work
out of the box.

It would be nice if we had the same default on x86 too, although I
understand if Jan and Andrew don't want to make the same change on x86,
at least initially.


So you mean here adding a parameter, but only on Arm?
Should it be a command line parameter? A configuration parameter? Both?

It seems that with this patch I touched some kind of sensitive area.
Should I just abandon it and see later to wo
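The semantic difference between the two designs discussed in this thread, translating the runstate virtual address once at registration time versus keeping the VA and translating on every update, can be shown with a toy model. Everything here is illustrative: guest memory is an array, the guest page table is a small lookup table, and none of the names are Xen's.

```c
#include <stdint.h>

enum { NPAGES = 4 };
static uint32_t page[NPAGES];      /* simulated guest memory pages  */
static int va_to_pa[2] = { 0, 1 }; /* simulated guest VA->PA mapping */

static int registered_pa = -1;

/* The patch's approach: translate the guest VA once, at hypercall
 * (registration) time, and latch the resulting physical address. */
static void register_runstate(int va)
{
    registered_pa = va_to_pa[va];
}

/* Later runstate updates write through the latched PA, so a guest
 * changing its VA->PA mapping afterwards does not move the target. */
static void update_runstate(uint32_t time)
{
    page[registered_pa] = time;
}
```

This is also why the temporarily-invalid-mapping problem on Arm disappears: once the PA is latched, Xen never walks the guest page tables again on the update path.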

[PATCH 0/3] tools/hotplug: Fixes to vif-nat

2020-08-20 Thread Diego Sueiro
This patch series fixes issues around the vif-nat script when
setting up the vif interface and dhcp server in dom0.

It has been validated and used in Yocto and meta-arm-autonomy

Diego Sueiro (3):
  tools/hotplug: Fix hostname setting in vif-nat
  tools/hotplug: Fix dhcpd symlink removal in vif-nat
  tools/hotplug: Extend dhcpd conf, init and arg files search

 tools/hotplug/Linux/vif-nat   | 14 --
 tools/hotplug/Linux/xen-network-common.sh |  6 +++---
 2 files changed, 11 insertions(+), 9 deletions(-)

-- 
2.7.4




Re: u-boot vs. uefi as boot loaders on ARM

2020-08-20 Thread Julien Grall

Hi Roman,

On 16/08/2020 21:45, Roman Shaposhnik wrote:

On Sun, Aug 16, 2020 at 7:54 AM Julien Grall  wrote:

On 15/08/2020 21:43, Roman Shaposhnik wrote:

Hi!


Hi,


with the recent excellent work by Anastasiia committed to u-boot's
main line, we now have two different ways of bringing up ARM DomUs.

Is there any chance someone can educate the general public on pros
and cons of both approaches?

In Project EVE we're still using uefi on ARM (to stay closer to the more
"ARM in the cloud" use case) but perhaps the situation now is more
nuanced?


UEFI is just a standard, so I am guessing you are referring to
Tianocore/EDK2. Am I correct?


Yes, but I was actually referring to both in a way (I should've been
clearer tho).
To be more explicit, my question was around trying to compare a "standardized"
way of booting a generic DomU on ARM (and that standard is UEFI, with one
particular implementation that works out of the box with Xen being TC/EDK2) with
a more ad-hoc u-boot style of booting.


Recent versions of U-boot are also able to partially support UEFI. This
means you could easily use GRUB with U-boot.


Yup -- which complicated things even more. And it is funny you should mention
it, since we actually started with TC/EDK2 for RaspberryPi4 as a board
bootloader,
but quickly switched to u-boot with UEFI shim layer, since it was much smaller,
better supported (still?) and gave us all we needed to boot Xen on RPi4 as a
UEFI payload.


 From my understanding, U-boot is just a bootloader. Therefore it will
not provide runtime services (such as date & time).


It actually does provide some of that (see below)


Cool! Although it looks mostly related to environment variables.




Furthermore, the
interface is less user friendly, you will have to know the memory layout
in order to load binaries.

On the other hand, Tianocore/EDK2 is very similar to what non-embedded
developers may be used to. It will not require you to know your memory layout. But
this comes at the cost of a more complex bootloader to debug.


That's literally the crux of my question -- trying to understand what use cases
either one of them is meant for. Especially given that this shim layer is now
quite capable:
 https://github.com/ARM-software/u-boot/blob/master/doc/README.uefi#L127


While I can see major differences when using either on baremetal (you 
have better control on the Device-Tree with U-boot), it is much less 
clear in a guest. Maybe Anastasiia can explain why they decided to add 
support in U-boot? :).


Cheers,

--
Julien Grall



[PATCH 1/3] tools/hotplug: Fix hostname setting in vif-nat

2020-08-20 Thread Diego Sueiro
Setting the hostname is failing because the "$XENBUS_PATH/domain"
node doesn't exist anymore. To fix this, we set it to dom$domid.
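The resulting naming scheme can be sketched as follows (the domid and
vifid values are assumed purely for illustration):

```shell
# Hypothetical values for illustration only
domid=7
vifid=2

# Same logic as the script: base name dom$domid, suffixed for
# every vif beyond the first one
hostname=dom$domid
if [ "$vifid" != "1" ]
then
  hostname="$hostname-$vifid"
fi
echo "$hostname"   # prints: dom7-2
```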

Signed-off-by: Diego Sueiro 
---
 tools/hotplug/Linux/vif-nat | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/hotplug/Linux/vif-nat b/tools/hotplug/Linux/vif-nat
index a76d9c7..2614435 100644
--- a/tools/hotplug/Linux/vif-nat
+++ b/tools/hotplug/Linux/vif-nat
@@ -85,7 +85,7 @@ router_ip=$(routing_ip "$ip")
 # Split the given IP/bits pair.
 vif_ip=`echo ${ip} | awk -F/ '{print $1}'`
 
-hostname=$(xenstore_read "$XENBUS_PATH/domain" | tr -- '_.:/+' '-')
+hostname=dom$domid
 if [ "$vifid" != "1" ]
 then
   hostname="$hostname-$vifid"
-- 
2.7.4




Re: About VIRTIO support on Xen

2020-08-20 Thread Julien Grall




On 20/08/2020 05:45, Jedi Chen wrote:

Hi xen-devel,


Hi,



I am very interested in VIRTIO on Xen. From one meeting 
report of the AGL Virtualization Expert Group (EG-VIRT),
https://wiki.automotivelinux.org/eg-virt-meetings#pm_cest_meeting4, I 
got the information that Arm and Linaro are
upstreaming Xen work incorporating VirtIO. But I can't find any 
information on the mailing list. Is there an

architecture overview or design doc about it?


There is some discussion on xen-devel [1] to add support for Virtio MMIO 
on Arm. This is still in early development, but you should be able to 
get a PoC setup with the work.


Best regards,

[1] <159647-23030-1-git-send-email-olekst...@gmail.com>



Thanks,







--
Julien Grall



[PATCH 2/3] tools/hotplug: Fix dhcpd symlink removal in vif-nat

2020-08-20 Thread Diego Sueiro
Copy temp files used to add/remove dhcpd configurations to avoid
replacing potential symlinks.
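The difference matters because mv replaces the destination inode, while
cp writes through an existing symlink. A small sketch (temporary files
and names are assumed for illustration):

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "conf v1" > real.conf
ln -s real.conf link.conf          # link.conf -> real.conf

echo "conf v2" > tmpfile
cp tmpfile link.conf               # follows the symlink, updates real.conf
rm tmpfile

test -L link.conf && echo "still a symlink"   # prints: still a symlink
cat real.conf                                 # prints: conf v2
```

Had mv been used instead of cp, link.conf would have become a regular
file and real.conf would have been left stale.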

Signed-off-by: Diego Sueiro 
---
 tools/hotplug/Linux/vif-nat | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/hotplug/Linux/vif-nat b/tools/hotplug/Linux/vif-nat
index 2614435..1ab80ed 100644
--- a/tools/hotplug/Linux/vif-nat
+++ b/tools/hotplug/Linux/vif-nat
@@ -99,7 +99,8 @@ dhcparg_remove_entry()
   then
 rm "$tmpfile"
   else
-mv "$tmpfile" "$dhcpd_arg_file"
+cp "$tmpfile" "$dhcpd_arg_file"
+rm "$tmpfile"
   fi
 }
 
@@ -109,11 +110,11 @@ dhcparg_add_entry()
   local tmpfile=$(mktemp)
   # handle Red Hat, SUSE, and Debian styles, with or without quotes
   sed -e 's/^DHCPDARGS="*\([^"]*\)"*/DHCPDARGS="\1'"${dev} "'"/' \
- "$dhcpd_arg_file" >"$tmpfile" && mv "$tmpfile" "$dhcpd_arg_file"
+ "$dhcpd_arg_file" >"$tmpfile" && cp "$tmpfile" "$dhcpd_arg_file"
   sed -e 's/^DHCPD_INTERFACE="*\([^"]*\)"*/DHCPD_INTERFACE="\1'"${dev} "'"/' \
- "$dhcpd_arg_file" >"$tmpfile" && mv "$tmpfile" "$dhcpd_arg_file"
+ "$dhcpd_arg_file" >"$tmpfile" && cp "$tmpfile" "$dhcpd_arg_file"
   sed -e 's/^INTERFACES="*\([^"]*\)"*/INTERFACES="\1'"${dev} "'"/' \
- "$dhcpd_arg_file" >"$tmpfile" && mv "$tmpfile" "$dhcpd_arg_file"
+ "$dhcpd_arg_file" >"$tmpfile" && cp "$tmpfile" "$dhcpd_arg_file"
   rm -f "$tmpfile"
 }
 
@@ -125,7 +126,8 @@ dhcp_remove_entry()
   then
 rm "$tmpfile"
   else
-mv "$tmpfile" "$dhcpd_conf_file"
+cp "$tmpfile" "$dhcpd_conf_file"
+rm "$tmpfile"
   fi
   dhcparg_remove_entry
 }
-- 
2.7.4
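The quoting handled by the sed expressions in dhcparg_add_entry above
can be exercised standalone; a sketch, with the DHCPDARGS input line
assumed for illustration:

```shell
dev="vif7.0"   # assumed interface name

# Same substitution as dhcparg_add_entry: insert $dev inside the
# existing (possibly quoted) value, normalising the quotes
echo 'DHCPDARGS=""' \
  | sed -e 's/^DHCPDARGS="*\([^"]*\)"*/DHCPDARGS="\1'"${dev} "'"/'
# prints: DHCPDARGS="vif7.0 "
```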




[PATCH 3/3] tools/hotplug: Extend dhcpd conf, init and arg files search

2020-08-20 Thread Diego Sueiro
Newer versions of the ISC dhcp server expect the dhcpd.conf file
to be located in the /etc/dhcp directory.

Also, some distributions and Yocto based ones have these installation
paths by default: /etc/init.d/{isc-dhcp-server,dhcp-server} and
/etc/default/dhcp-server.
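The helper being modified, first_file, is not shown in the diff; its
assumed semantics (print the first argument that passes the given test
flag) can be sketched as follows. This is only a sketch — the real
helper lives in xen-network-common.sh and may differ in detail:

```shell
# Sketch only: $1 is a test(1) flag such as -f or -x, the remaining
# arguments are candidate paths; print the first one that matches.
first_file() {
  t="$1"
  shift
  for file in "$@"; do
    if [ "$t" "$file" ]; then
      echo "$file"
      return 0
    fi
  done
  return 1
}

existing=$(mktemp)
first_file -f /nonexistent/dhcpd.conf "$existing"   # prints the mktemp path
```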

Signed-off-by: Diego Sueiro 
---
 tools/hotplug/Linux/xen-network-common.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/hotplug/Linux/xen-network-common.sh b/tools/hotplug/Linux/xen-network-common.sh
index 8dd3a62..be632ce 100644
--- a/tools/hotplug/Linux/xen-network-common.sh
+++ b/tools/hotplug/Linux/xen-network-common.sh
@@ -64,18 +64,18 @@ first_file()
 
 find_dhcpd_conf_file()
 {
-  first_file -f /etc/dhcp3/dhcpd.conf /etc/dhcpd.conf
+  first_file -f /etc/dhcp/dhcpd.conf /etc/dhcp3/dhcpd.conf /etc/dhcpd.conf
 }
 
 
 find_dhcpd_init_file()
 {
-  first_file -x /etc/init.d/{dhcp3-server,dhcp,dhcpd}
+  first_file -x /etc/init.d/{isc-dhcp-server,dhcp-server,dhcp3-server,dhcp,dhcpd}
 }
 
 find_dhcpd_arg_file()
 {
-  first_file -f /etc/sysconfig/dhcpd /etc/defaults/dhcp /etc/default/dhcp3-server
+  first_file -f /etc/sysconfig/dhcpd /etc/defaults/dhcp /etc/default/dhcp-server /etc/default/dhcp3-server
 }
 
 # configure interfaces which act as pure bridge ports:
-- 
2.7.4




Re: About VIRTIO support on Xen

2020-08-20 Thread Julien Grall
On Thu, 20 Aug 2020 at 11:59, Julien Grall  wrote:
>
>
>
> On 20/08/2020 05:45, Jedi Chen wrote:
> > Hi xen-devel,
>
> Hi,
>
> >
> > I am very interesting about the VIRTIO on Xen. And from one meeting
> > report of AGL Virtualization Expert Group (EG-VIRT)
> > https://wiki.automotivelinux.org/eg-virt-meetings#pm_cest_meeting4, I
> > got the information that ARM and Linaro are
> > upstreaming XEN work incorporating VirtIO. But I can't find any
> > information in the mailing list. Is there any
> > architecture overview or design doc about it?
>
> There is some discussion on xen-devel [1] to add support for Virtio MMIO
> on Arm. This is still in early development, but you should be able to
> get a PoC setup with the work.
>
> Best regards,
>
> [1] <159647-23030-1-git-send-email-olekst...@gmail.com>

Sorry I should have added a direct link:

https://lore.kernel.org/xen-devel/159647-23030-1-git-send-email-olekst...@gmail.com/

>
> >
> > Thanks,
> >
> >
> >
>
>
>
> --
> Julien Grall



Re: [Linux] [ARM] Granting memory obtained from the DMA API

2020-08-20 Thread Julien Grall




On 19/08/2020 12:04, Simon Leiner wrote:

Hi everyone,


Hi Simon,



I'm working on a virtio driver for the Linux kernel that supports the
dynamic connection of devices via the Xenbus (as part of a research
project at the Karlsruhe Institute of Technology).


There is a lot of interest to get Virtio working on Xen at the moment. 
Is this going to be a new transport layer for Virtio?



My question concerns the Xenbus client API in the Linux kernel. As this
is my first time posting on the Xen mailing lists, I'm not entirely
sure if this is the right place for this question. If not, feel free to
point me to the right place :-)


Xen-devel is probably most suitable for this discussion, so I moved the 
discussion there. I have also CCed a couple of Linux maintainers that 
should be able to provide feedback on the approaches.




Part of virtio is having shared memory. So naturally, I'm using Xen's
grant system for that. Part of the Xenbus client API is the function
xenbus_grant_ring which, by its documentation grants access to a block
of memory starting at vaddr to another domain. I tried using this in my
driver which created the grants and returned without any error, but
after mounting the grants on another domain, it turns out that some
other location in memory was actually granted instead of the one behind
the original vaddr.

So I found the problem: The vaddr that I was using xenbus_grant_ring
with was obtained by dma_alloc_coherent (whereas the other split
drivers included in the mainline kernel use Xen IO rings allocated by
the "regular" mechanisms such as __get_free_page, alloc_page etc.).
But xenbus_grant_ring uses virt_to_gfn to get the GFN for the vaddr
which on ARM(64) must not be used for DMA addresses. So I could fix the
problem by providing a modified version of xenbus_grant_ring as part of
my driver which takes a dma_addr_t instead of a void* for the start
address, gets the PFN via dma_to_phys, converts it to a GFN and then
delegates to gnttab_grant_foreign_access, just like xenbus_grant_ring.
I can confirm that this works on Linux 5.4.0.

My question to you is: How can this be fixed "the right way"?
Is there anything that can be done to prevent others from debugging
the same problem (which for me, took some hours...)?

I can see multiple approaches:
1. Have xenbus_grant_ring "just work" even with DMA addresses on ARM
This would certainly be the nicest solution, but I don't see how
it could be implemented. I don't know how to check whether some
address actually is a DMA address and even if there was a way to
know, dma_to_phys still requires a pointer to the device struct
which was used for allocation.
2. Provide another version which takes a dma_addr_t instead of void*
This can be easily done, but things get complicated when the device
for which the DMA memory was allocated is not the xenbus_device
which is passed anyway. So, it would be necessary to include an
additional argument pointing the actual device struct which was used
for allocation.
3. Just use gnttab_grant_foreign_access which works with GFNs anyway
Which is essentially what I'm doing currently, as in my driver I
know from which device the DMA addresses were allocated.
If this is the preferred solution to this problem, I propose adding
a warning to the documentation of xenbus_grant_ring that forbids
using this for vaddrs obtained from the DMA API as it will not work
(at least on ARM).

What do you think?

Greetings from Germany,
Simon


Best regards,

--
Julien Grall



Re: Xen 4.14.0 fails on Dell IoT Gateway without efi=no-rs

2020-08-20 Thread George Dunlap
On Thu, Aug 20, 2020 at 9:35 AM Jan Beulich  wrote:

>
> As far as making cases like this work by default, I'm afraid it'll
> need to be proposed to replace me as the maintainer of EFI code in
> Xen. I will remain on the position that it is not acceptable to
> apply workarounds for firmware issues by default unless they're
> entirely benign to spec-conforming systems. DMI data based enabling
> of workarounds, for example, is acceptable in the common case, as
> long as the matching pattern isn't unreasonably wide.
>

It sort of sounds like it would be useful to have a wider discussion on
this then, to hash out what exactly it is we want to do as a project.

 -George


Re: u-boot vs. uefi as boot loaders on ARM

2020-08-20 Thread Oleksandr Andrushchenko

On 8/20/20 1:50 PM, Julien Grall wrote:
> Hi Roman,
>
> On 16/08/2020 21:45, Roman Shaposhnik wrote:
>> On Sun, Aug 16, 2020 at 7:54 AM Julien Grall  wrote:
>>> On 15/08/2020 21:43, Roman Shaposhnik wrote:
 Hi!
>>>
>>> Hi,
>>>
 with the recent excellent work by Anastasiia committed to the u-boot's
 main line, we now have two different ways of bringing ARM DomUs.

 Is there any chance someone can educate the general public on pros
 and cons of both approaches?

 In Project EVE we're still using uefi on ARM (to stay closer to the more
 "ARM in the cloud" use case) but perhaps the situation now is more
 nuanced?
>>>
>>> UEFI is just standard, so I am guessing you are referring to
>>> Tianocore/EDK2. am I correct?
>>
>> Yes, but I was actually referring to both in a way (I should've been
>> clearer tho).
>> To be more explicit my question was around trying to compare a "standardized"
>> way of botting a generic DomU on ARM (and that standard is UEFI with one
>> particular implementation that works out of the box with Xen being TC/EDK2) 
>> with
>> a more ad-hoc u-boot style of booting.
>>
>>> Recent version of U-boot are also able to partially UEFI. This means you
>>> could easily use GRUB with U-boot.
>>
>> Yup -- which complicated things even more. And it is funny you should mention
>> it, since we actually started with TC/EDK2 for RaspberryPi4 as a board
>> bootloader,
>> but quickly switched to u-boot with UEFI shim layer, since it was much 
>> smaller,
>> better supported (still?) and gave us all we needed to boot Xen on RPi4 as a
>> UEFI payload.
>>
>>>  From my understanding, U-boot is just a bootloader. Therefore it will
>>> not provide runtime services (such as date & time).
>>
>> It actually does provide some of that (see below)
>
> Cool! Although, it looks mostly related to the environment variable though.
>
>>
>>> Furthermore, the
>>> interface is less user friendly, you will have to know the memory layout
>>> in order to load binaries.
>>>
>>> On the other hand, Tianocore/EDK2 is very similar to what non-embedded
>>> may be used to. It will not require you to know your memory layout. But
>>> this comes at the cost of a more complex bootloader to debug.
>>
>> That's literally the crux of my question -- trying to understand what use 
>> cases
>> either one of them is meant for. Especially given that this shim layer is now
>> quite capable:
>> https://github.com/ARM-software/u-boot/blob/master/doc/README.uefi#L127
>
> While I can see major differences when using either on baremetal (you have 
> better control on the Device-Tree with U-boot), it is much less clear in a 
> guest. Maybe Anastasiia can explain why they decided to add support in 
> U-boot? :).

Well, many SoC vendors provide U-boot as their boot loader (Renesas,
Xilinx, i.MX, RPi, you name it), so it was natural for us to add
pvblock support to it.

So this is the only reason, I guess.
> Cheers,
>
Regards,

Oleksandr


Re: [Linux] [ARM] Granting memory obtained from the DMA API

2020-08-20 Thread Simon Leiner
Hi Julien,

On 20.08.20 13:17, Julien Grall wrote:
> There is a lot of interest to get Virtio working on Xen at the moment.
> Is this going to be a new transport layer for Virtio?


It is designed that way, yes. The current implementation (based on
virtio_mmio.c) has a few limitations:
 - Only the host side is implemented for Linux (we are currently only
using bare metal domains for the device side, so the device
implementation is based on OpenAMP[1])

- It lacks some features, e.g. there is currently no device
configuration space

- It is tested only very narrowly (only for my use case which is RPMsg
via the rpmsg_char kernel driver)

As this was really just a byproduct of my main research topic, I'm
currently not in touch with the virtio standards committee. But I'm
happy to contribute my work if there is interest :-)

> Xen-devel is probably most suitable for this discussion, so I moved the
> discussion there. I have also CCed a couple of Linux maintainers that
> should be able to provide feedbacks on the approaches.

Thanks!

Greetings,
Simon


[1]: https://www.openampproject.org



[xen-unstable-smoke test] 152632: tolerable all pass - PUSHED

2020-08-20 Thread osstest service owner
flight 152632 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/152632/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  858c0be8c2fa4125a0fa0acaa03ae730e5c7cb3c
baseline version:
 xen  f9d67340b4aa254f64b40f2031720f61a33c2904

Last test of basis   152622  2020-08-19 12:01:17 Z1 days
Testing same since   152632  2020-08-20 10:01:46 Z0 days1 attempts


People who touched revisions under test:
  Bertrand Marquis 
  Julien Grall 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   f9d67340b4..858c0be8c2  858c0be8c2fa4125a0fa0acaa03ae730e5c7cb3c -> smoke



Re: Xen 4.14.0 fails on Dell IoT Gateway without efi=no-rs

2020-08-20 Thread Rich Persaud
On Aug 20, 2020, at 07:24, George Dunlap  wrote:
> 
>> On Thu, Aug 20, 2020 at 9:35 AM Jan Beulich  wrote:
> 
>> 
>> As far as making cases like this work by default, I'm afraid it'll
>> need to be proposed to replace me as the maintainer of EFI code in
>> Xen. I will remain on the position that it is not acceptable to
>> apply workarounds for firmware issues by default unless they're
>> entirely benign to spec-conforming systems. DMI data based enabling
>> of workarounds, for example, is acceptable in the common case, as
>> long as the matching pattern isn't unreasonably wide.
> 
> 
> It sort of sounds like it would be useful to have a wider discussion on this 
> then, to hash out what exactly it is we want to do as a project.
> 
>  -George

Sometimes a middle ground is possible, e.g. see this Nov 2019 thread about a 
possible Xen Kconfig option for EFI_NONSPEC_COMPATIBILITY, targeting 
Edge/IoT/laptop hardware:

https://lists.archive.carbon60.com/xen/devel/571670#571670

In the years to come, edge devices will only grow in numbers.  Some will be 
supported in production for more than a decade, which will require new 
long-term commercial support mechanisms for device BIOS, rather than firmware 
engineers shifting focus after a device is launched. 

In parallel to (opt-in) Xen workarounds for a constrained and documented set of 
firmware issues, we need more industry efforts to support open firmware, like 
coreboot and OCP Open System Firmware with minimum binary blobs.  At least one 
major x86 OEM is expected to ship open firmware in one of their popular 
devices, which may encourage competing OEM devices to follow.

PC Engines APU2 (dual-core AMD, 4GB RAM, 6W TDP, triple NIC + LTE) is one 
available edge device which supports Xen and has open (coreboot) firmware.  It 
would be nice to include APU2 in LF Edge support, if only to provide 
competition to OEM devices with buggy firmware. Upcoming Intel Tiger Lake 
(Core) and Elkhart Lake (Atom Tremont) are expected to expand edge-relevant 
security features, which would make such devices attractive to Xen deployments. 
 

We also need edge software vendors to encourage device OEMs to enable open 
firmware via coreboot, OCP OSF, Intel MinPlatform and similar programs. See 
https://software.intel.com/content/www/us/en/develop/articles/minimum-platform-architecture-open-source-uefi-firmware-for-intel-based-platforms.html
 and other talks from the open firmware conference, https://osfc.io/archive

Rich



Re: About VIRTIO support on Xen

2020-08-20 Thread Wei Chen
Thank you very much! It’s very valuable information.

Regards,

> 在 2020年8月20日,19:02,Julien Grall  写道:
> 
> On Thu, 20 Aug 2020 at 11:59, Julien Grall  wrote:
>> 
>> 
>> 
>>> On 20/08/2020 05:45, Jedi Chen wrote:
>>> Hi xen-devel,
>> 
>> Hi,
>> 
>>> 
>>> I am very interesting about the VIRTIO on Xen. And from one meeting
>>> report of AGL Virtualization Expert Group (EG-VIRT)
>>> https://wiki.automotivelinux.org/eg-virt-meetings#pm_cest_meeting4, I
>>> got the information that ARM and Linaro are
>>> upstreaming XEN work incorporating VirtIO. But I can't find any
>>> information in the mailing list. Is there any
>>> architecture overview or design doc about it?
>> 
>> There is some discussion on xen-devel [1] to add support for Virtio MMIO
>> on Arm. This is still in early development, but you should be able to
>> get a PoC setup with the work.
>> 
>> Best regards,
>> 
>> [1] <159647-23030-1-git-send-email-olekst...@gmail.com>
> 
> Sorry I should have added a direct link:
> 
> https://lore.kernel.org/xen-devel/159647-23030-1-git-send-email-olekst...@gmail.com/
> 
>> 
>>> 
>>> Thanks,
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> --
>> Julien Grall



Re: [PATCH] efi: discover ESRT table on Xen PV too

2020-08-20 Thread Roger Pau Monné
On Wed, Aug 19, 2020 at 01:33:39PM +0200, Norbert Kaminski wrote:
> 
> On 19.08.2020 10:19, Roger Pau Monné wrote:
> > On Tue, Aug 18, 2020 at 08:40:18PM +0200, Marek Marczykowski-Górecki wrote:
> > > On Tue, Aug 18, 2020 at 07:21:14PM +0200, Roger Pau Monné wrote:
> > > > > Let me draw the picture from the beginning.
> > > > Thanks, greatly appreciated.
> > > > 
> > > > > EFI memory map contains various memory regions. Some of them are 
> > > > > marked
> > > > > as not needed after ExitBootServices() call (done in Xen before
> > > > > launching dom0). This includes EFI_BOOT_SERVICES_DATA and
> > > > > EFI_BOOT_SERVICES_CODE.
> > > > > 
> > > > > EFI SystemTable contains pointers to various ConfigurationTables -
> > > > > physical addresses (at least in this case). Xen does interpret some of
> > > > > them, but not ESRT. Xen pass the whole (address of) SystemTable to 
> > > > > Linux
> > > > > dom0 (at least in PV case). Xen doesn't do anything about tables it
> > > > > doesn't understand.
> > > > > 
> > > > > Now, the code in Linux takes the (ESRT) table address early and checks
> > > > > the memory map for it. We have 3 cases:
> > > > >   - it points at area marked as neither EFI_*_SERVICES_DATA, nor with
> > > > > EFI_MEMORY_RUNTIME attribute -> Linux refuse to use it
> > > > >   - it points to EFI_RUNTIME_SERVICES_DATA or with EFI_MEMORY_RUNTIME
> > > > > attribute - Linux uses the table; memory map already says the area
> > > > > belongs to EFI and the OS should not use it for something else
> > > > >   - it points to EFI_BOOT_SERVICES_DATA - Linux mark the area as 
> > > > > reserved
> > > > > to not release it after calling ExitBootServices()
> > > > > 
> > > > > The problematic is the third case - at the time when Linux dom0 is 
> > > > > run,
> > > > > ExitBootServices() was already called and EFI_BOOT_SERVICES_* memory 
> > > > > was
> > > > > already released. It could be already used for something else (for
> > > > > example Xen could overwrite it while loading dom0).
> > > > > 
> > > > > Note the problematic case should be the most common - UEFI 
> > > > > specification
> > > > > says "The ESRT shall be stored in memory of type EfiBootServicesData"
> > > > > (chapter 22.3 of UEFI Spec v2.6).
> > > > > 
> > > > > For this reason, to use ESRT in dom0, Xen should do something about it
> > > > > before ExitBootServices() call. While analyzing all the EFI tables is
> > > > > probably not a viable option, it can do some simple action:
> > > > >   - retains all the EFI_BOOT_SERVICES_* areas - there is already code
> > > > > for that, controlled with /mapbs boot switch (to xen.efi, would 
> > > > > need
> > > > > another option for multiboot2+efi)
> > > > >   - have a list of tables to retain - since Xen already do analyze 
> > > > > some
> > > > > of the ConfigurationTables, it can also have a list of those to
> > > > > preserve even if they live in EFI_BOOT_SERVICES_DATA. In this 
> > > > > case,
> > > > > while Xen doesn't need to parse the whole table, it need to parse 
> > > > > it's
> > > > > header to get the table size - to reserve that memory and not 
> > > > > reuse
> > > > > it after ExitBootServices().
> > > > Xen seems to already contain skeleton
> > > > XEN_EFI_query_capsule_capabilities and XEN_EFI_update_capsule
> > > > hypercalls which is what should be used in order to perform the
> > > > updates?
> > > I think those covers only runtime service calls similarly named. But you
> > > need also ESRT table to collect info about devices that you can even
> > > attempt to update.
> > Right, the ESRT must be available so that dom0 can discover the
> > resources.
> > 
> > > TBH, I'm not sure if those runtime services are really needed. I think
> > > Norbert succeeded UEFI update from within Xen PV dom0 with just access
> > > to the ESRT table, but without those services.
> > > 
> Marek is right here. I was able to successfully update and downgrade
> UEFI when the ESRT table was provided to the Xen PV dom0. I didn't
> need any extra services to make the UEFI capsule update work.

OK, I think that's using the method described in section 8.5.5,
delivery of Capsules via file on Mass Storage, which doesn't use the
UpdateCapsule() runtime API?

Using such method doesn't require QueryCapsuleCapabilities(), as
that's used to know whether a certain capsule can be updated via
UpdateCapsule().

> > OK, by reading the UEFI spec I assumed that you needed access to
> > QueryCapsuleCapabilities and UpdateCapsule in order to perform the
> > updates, and those should be proxied using hyopercalls. Maybe this is
> > not mandatory and there's a side-band mechanism of doing this?
> > 
> > I think we need more info here.
> > 
> > > > So yes, I agree Xen should make sure the region of the table is not
> > > > freed when exiting boot services, and that dom0 can access it. I
> > > > guess we should move the checks done by Linux to Xen, and then only
> > > > provide the ESRT tabl

Re: [PATCH] efi: discover ESRT table on Xen PV too

2020-08-20 Thread Roger Pau Monné
On Thu, Aug 20, 2020 at 11:34:54AM +0200, Marek Marczykowski-Górecki wrote:
> On Thu, Aug 20, 2020 at 11:30:25AM +0200, Roger Pau Monné wrote:
> > Right, so you only need access to the ESRT table, that's all. Then I
> > think we need to make sure Xen doesn't use this memory for anything
> > else, which will require some changes in Xen (or at least some
> > checks?).
> > 
> > We also need to decide what to do if the table turns out to be placed
> > in a wrong region. How are we going to prevent dom0 from using it
> > then? My preference would be to completely hide it from dom0 in that
> > case, such that it believes there's no ESRT at all if possible.
> 
> Yes, that makes sense. As discussed earlier, that probably means
> re-constructing SystemTable before giving it to dom0. We'd need to do
> that in PVH case anyway, to adjust addresses, right?

Not really, on PVH dom0 we should be able to identity map the required
EFI regions in the dom0 p2m, so the only difference between a classic
PV dom0 is that we need to assure that those regions are correctly
identity mapped in the p2m, but that shouldn't require any change to
the SystemTable unless we need to craft custom tables (see below).

> Is there something
> like this in the Xen codebase already, or it needs to be written from
> scratch?

AFAICT it needs to be written for EFI. For the purposes here I think
you could copy the SystemTable and modify the NumberOfTableEntries and
ConfigurationTable fields in the copy in order to delete the ESRT if
found to be placed in a non suitable region?

At that point we can remove the checks from Linux since Xen will
assert that whatever gets passed to dom0 is in a suitable region. It
would be nice to have a way to signal that the placement of the ESRT
has been checked, but I'm not sure how to do this, do you have any
ideas?

Roger.



Re: [PATCH] x86/pci: fix xen.c build error when CONFIG_ACPI is not set

2020-08-20 Thread Konrad Rzeszutek Wilk
On Wed, Aug 19, 2020 at 08:09:11PM -0700, Randy Dunlap wrote:
> Hi Konrad,

Hey Randy,

I believe Juergen is picking this up.
> 
> ping.
> 
> I am still seeing this build error. It looks like this is
> in your territory to merge...
> 
> 
> On 8/13/20 4:00 PM, Randy Dunlap wrote:
> > From: Randy Dunlap 
> > 
> > Fix build error when CONFIG_ACPI is not set/enabled:
> > 
> > ../arch/x86/pci/xen.c: In function ‘pci_xen_init’:
> > ../arch/x86/pci/xen.c:410:2: error: implicit declaration of function 
> > ‘acpi_noirq_set’; did you mean ‘acpi_irq_get’? 
> > [-Werror=implicit-function-declaration]
> >   acpi_noirq_set();
> > 
> > Fixes: 88e9ca161c13 ("xen/pci: Use acpi_noirq_set() helper to avoid #ifdef")
> > Signed-off-by: Randy Dunlap 
> > Cc: Andy Shevchenko 
> > Cc: Bjorn Helgaas 
> > Cc: Konrad Rzeszutek Wilk 
> > Cc: xen-devel@lists.xenproject.org
> > Cc: linux-...@vger.kernel.org
> > ---
> >  arch/x86/pci/xen.c |1 +
> >  1 file changed, 1 insertion(+)
> > 
> > --- linux-next-20200813.orig/arch/x86/pci/xen.c
> > +++ linux-next-20200813/arch/x86/pci/xen.c
> > @@ -26,6 +26,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  
> >  static int xen_pcifront_enable_irq(struct pci_dev *dev)
> > 
> 
> 
> thanks.
> -- 
> ~Randy
> 



Re: [PATCH] x86/pci: fix xen.c build error when CONFIG_ACPI is not set

2020-08-20 Thread Jürgen Groß

On 20.08.20 16:40, Konrad Rzeszutek Wilk wrote:

On Wed, Aug 19, 2020 at 08:09:11PM -0700, Randy Dunlap wrote:

Hi Konrad,


Hey Randy,

I believe Juergen is picking this up.


Yes, have queued it for rc2.


Juergen



ping.

I am still seeing this build error. It looks like this is
in your territory to merge...


On 8/13/20 4:00 PM, Randy Dunlap wrote:

From: Randy Dunlap 

Fix build error when CONFIG_ACPI is not set/enabled:

../arch/x86/pci/xen.c: In function ‘pci_xen_init’:
../arch/x86/pci/xen.c:410:2: error: implicit declaration of function 
‘acpi_noirq_set’; did you mean ‘acpi_irq_get’? 
[-Werror=implicit-function-declaration]
   acpi_noirq_set();

Fixes: 88e9ca161c13 ("xen/pci: Use acpi_noirq_set() helper to avoid #ifdef")
Signed-off-by: Randy Dunlap 
Cc: Andy Shevchenko 
Cc: Bjorn Helgaas 
Cc: Konrad Rzeszutek Wilk 
Cc: xen-devel@lists.xenproject.org
Cc: linux-...@vger.kernel.org
---
  arch/x86/pci/xen.c |1 +
  1 file changed, 1 insertion(+)

--- linux-next-20200813.orig/arch/x86/pci/xen.c
+++ linux-next-20200813/arch/x86/pci/xen.c
@@ -26,6 +26,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  
  static int xen_pcifront_enable_irq(struct pci_dev *dev)





thanks.
--
~Randy








Re: [PATCH v4 1/2] memremap: rename MEMORY_DEVICE_DEVDAX to MEMORY_DEVICE_GENERIC

2020-08-20 Thread Roger Pau Monné
On Tue, Aug 11, 2020 at 11:07:36PM +0200, David Hildenbrand wrote:
> On 11.08.20 11:44, Roger Pau Monne wrote:
> > This is in preparation for the logic behind MEMORY_DEVICE_DEVDAX also
> > being used by non DAX devices.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Roger Pau Monné 
> > ---
> > Cc: Dan Williams 
> > Cc: Vishal Verma 
> > Cc: Dave Jiang 
> > Cc: Andrew Morton 
> > Cc: Jason Gunthorpe 
> > Cc: Ira Weiny 
> > Cc: "Aneesh Kumar K.V" 
> > Cc: Johannes Thumshirn 
> > Cc: Logan Gunthorpe 
> > Cc: linux-nvd...@lists.01.org
> > Cc: xen-devel@lists.xenproject.org
> > Cc: linux...@kvack.org
> > ---
> >  drivers/dax/device.c | 2 +-
> >  include/linux/memremap.h | 9 -
> >  mm/memremap.c| 2 +-
> >  3 files changed, 6 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> > index 4c0af2eb7e19..1e89513f3c59 100644
> > --- a/drivers/dax/device.c
> > +++ b/drivers/dax/device.c
> > @@ -429,7 +429,7 @@ int dev_dax_probe(struct device *dev)
> > return -EBUSY;
> > }
> >  
> > -   dev_dax->pgmap.type = MEMORY_DEVICE_DEVDAX;
> > +   dev_dax->pgmap.type = MEMORY_DEVICE_GENERIC;
> > addr = devm_memremap_pages(dev, &dev_dax->pgmap);
> > if (IS_ERR(addr))
> > return PTR_ERR(addr);
> > diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> > index 5f5b2df06e61..e5862746751b 100644
> > --- a/include/linux/memremap.h
> > +++ b/include/linux/memremap.h
> > @@ -46,11 +46,10 @@ struct vmem_altmap {
> >   * wakeup is used to coordinate physical address space management (ex:
> >   * fs truncate/hole punch) vs pinned pages (ex: device dma).
> >   *
> > - * MEMORY_DEVICE_DEVDAX:
> > + * MEMORY_DEVICE_GENERIC:
> >   * Host memory that has similar access semantics as System RAM i.e. DMA
> > - * coherent and supports page pinning. In contrast to
> > - * MEMORY_DEVICE_FS_DAX, this memory is access via a device-dax
> > - * character device.
> > + * coherent and supports page pinning. This is for example used by DAX devices
> > + * that expose memory using a character device.
> >   *
> >   * MEMORY_DEVICE_PCI_P2PDMA:
> >   * Device memory residing in a PCI BAR intended for use with Peer-to-Peer
> > @@ -60,7 +59,7 @@ enum memory_type {
> > /* 0 is reserved to catch uninitialized type fields */
> > MEMORY_DEVICE_PRIVATE = 1,
> > MEMORY_DEVICE_FS_DAX,
> > -   MEMORY_DEVICE_DEVDAX,
> > +   MEMORY_DEVICE_GENERIC,
> > MEMORY_DEVICE_PCI_P2PDMA,
> >  };
> >  
> > diff --git a/mm/memremap.c b/mm/memremap.c
> > index 03e38b7a38f1..006dace60b1a 100644
> > --- a/mm/memremap.c
> > +++ b/mm/memremap.c
> > @@ -216,7 +216,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
> > return ERR_PTR(-EINVAL);
> > }
> > break;
> > -   case MEMORY_DEVICE_DEVDAX:
> > +   case MEMORY_DEVICE_GENERIC:
> > need_devmap_managed = false;
> > break;
> > case MEMORY_DEVICE_PCI_P2PDMA:
> > 
> 
> No strong opinion (@Dan?), I do wonder if a separate type would make sense.

Gentle ping.

Thanks, Roger.



Re: Xen 4.14.0 fails on Dell IoT Gateway without efi=no-rs

2020-08-20 Thread Andrew Cooper
On 19/08/2020 23:50, Roman Shaposhnik wrote:
>  Hi!
>
> below you can see a trace of Xen 4.14.0 failing on Dell IoT Gateway 3001
> without efi=no-rs. Please let me know if I can provide any additional
> information.

Just to be able to get all datapoints, could you build Xen with
CONFIG_EFI_SET_VIRTUAL_ADDRESS_MAP and see if the failure mode changes?

Thanks,

~Andrew



[libvirt test] 152628: regressions - FAIL

2020-08-20 Thread osstest service owner
flight 152628 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/152628/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  53d9af1e7924757e3b5f661131dd707d7110d094
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z   41 days
Failing since151818  2020-07-11 04:18:52 Z   40 days   36 attempts
Testing same since   152628  2020-08-20 04:19:39 Z0 days1 attempts


People who touched revisions under test:
  Andrea Bolognani 
  Balázs Meskó 
  Bastien Orivel 
  Bihong Yu 
  Binfeng Wu 
  Boris Fiuczynski 
  Christian Ehrhardt 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Erik Skultety 
  Fedora Weblate Translation 
  Han Han 
  Hao Wang 
  Jamie Strandboge 
  Jamie Strandboge 
  Jean-Baptiste Holcroft 
  Jianan Gao 
  Jin Yan 
  Jiri Denemark 
  Ján Tomko 
  Laine Stump 
  Liao Pingfang 
  Martin Kletzander 
  Michal Privoznik 
  Nikolay Shirokovskiy 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano 
  Piotr Drąg 
  Prathamesh Chavan 
  Roman Bogorodskiy 
  Ryan Schmidt 
  Sam Hartman 
  Stefan Bader 
  Stefan Berger 
  Szymon Scholz 
  Wang Xin 
  Weblate 
  Yang Hang 
  Yi Wang 
  Yuri Chornoivan 
  Zheng Chuan 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  fail
 build-arm64-libvirt  fail
 build-armhf-libvirt  fail
 build-i386-libvirt   fail
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   blocked 
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmblocked 
 test-amd64-amd64-libvirt-xsm blocked 
 test-arm64-arm64-libvirt-xsm blocked 
 test-amd64-i386-libvirt-xsm  blocked 
 test-amd64-amd64-libvirt blocked 
 test-arm64-arm64-libvirt blocked 
 test-armhf-armhf-libvirt blocked 
 test-amd64-i386-libvirt  blocked 
 test-amd64-amd64-libvirt-pairblocked 
 test-amd64-i386-libvirt-pair blocked 
 test-arm64-arm64-libvirt-qcow2   blocked 
 test-armhf-armhf-libvirt-raw blocked 
 test-amd64-amd64-libvirt-vhd blocked 


---

[xen-unstable test] 152623: regressions - FAIL

2020-08-20 Thread osstest service owner
flight 152623 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/152623/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. vs. 152597
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. vs. 152597

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10   fail REGR. vs. 152597
 test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 152597

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 152597
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 152597
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 152597
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 152597
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 152597
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 152597
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 152597
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 152597
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 152597
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass

version targeted for testing:
 xen  a825751f633482c0634ebb7c7b7ba33acadcfe7b
baseline version:
 xen  391a8b6d20b72c4f24f8511f78ef75a6119cbe22

Last test of basis   152597  2020-08-14 04:46:41 Z6 days
T

Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features

2020-08-20 Thread Oleksandr



Hello all.


I would like to clarify some questions based on the comments for the 
patch series. I put them together (please see below).



On 06.08.20 14:29, Jan Beulich wrote:

On 06.08.2020 13:08, Julien Grall wrote:

On 05/08/2020 20:30, Oleksandr wrote:

I was thinking how to split handle_hvm_io_completion()
gracefully but failed to find a good solution for that, so I decided to add
two stubs (msix_write_completion and handle_realmode_completion) on Arm.
I could add a comment describing why they are here if appropriate. But
if you think they shouldn't be called from the common code in any way, I
will try to split it.

I am not entirely sure what msix_write_completion is meant to do on x86.
Is it dealing with virtual MSIx? Maybe Jan, Roger or Paul could help?

Due to the split brain model of handling PCI pass-through (between
Xen and qemu), a guest writing to an MSI-X entry needs this write
handed to qemu, and upon completion of the write there Xen also
needs to take some extra action.



1. Regarding common handle_hvm_io_completion() implementation:

Could msix_write_completion() be called later on so we would be able to 
split handle_hvm_io_completion() gracefully or could we call it from 
handle_mmio()?
The reason I am asking is to avoid calling it from the common code in
order to avoid introducing a stub on Arm which is never going to be
implemented

(if msix_write_completion() is purely x86 material).

For the non-RFC patch series I moved handle_realmode_completion to the 
x86 code and now my local implementation looks like:


bool handle_hvm_io_completion(struct vcpu *v)
{
    struct domain *d = v->domain;
    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
    struct hvm_ioreq_server *s;
    struct hvm_ioreq_vcpu *sv;
    enum hvm_io_completion io_completion;

    if ( has_vpci(d) && vpci_process_pending(v) )
    {
        raise_softirq(SCHEDULE_SOFTIRQ);
        return false;
    }

    sv = get_pending_vcpu(v, &s);
    if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
        return false;

    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
        STATE_IORESP_READY : STATE_IOREQ_NONE;

    msix_write_completion(v);
    vcpu_end_shutdown_deferral(v);

    io_completion = vio->io_completion;
    vio->io_completion = HVMIO_no_completion;

    switch ( io_completion )
    {
    case HVMIO_no_completion:
        break;

    case HVMIO_mmio_completion:
        return handle_mmio();

    case HVMIO_pio_completion:
        return handle_pio(vio->io_req.addr, vio->io_req.size,
                          vio->io_req.dir);

    default:
        return arch_handle_hvm_io_completion(io_completion);
    }

    return true;
}

2. Regarding renaming common handle_mmio() to ioreq_handle_complete_mmio():

There was a request to consider renaming that function which is called 
from the common code in the context of IOREQ series.
The point is that the name of the function is pretty generic and can be 
confusing on Arm (we already have a try_handle_mmio()).
I noticed that, besides the common code, that function is called from a few
places on x86 (I am not even sure whether all of them are IOREQ related).

The question is would x86 folks be happy with such renaming?

Alternatively I could provide the following in 
include/asm-arm/hvm/ioreq.h without renaming it in the common code and
still use a non-confusing variant on Arm (however I am not sure whether 
this is a good idea):


#define handle_mmio ioreq_handle_complete_mmio


3. Regarding common IOREQ/DM stuff location:

Currently it is located at:
common/hvm/...
include/xen/hvm/...

For the non-RFC patch series I am going to avoid using the "hvm" name (which 
is an internal detail of arch-specific code and shouldn't be exposed to the 
common code).
The question is whether I should use another directory name (probably 
ioreq?) or just place them in the common root directory?



Could you please share your opinion?

--
Regards,

Oleksandr Tyshchenko




[ovmf test] 152627: all pass - PUSHED

2020-08-20 Thread osstest service owner
flight 152627 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/152627/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 5a6d764e1d073d28e8f398289ccb5592bf9a72ba
baseline version:
 ovmf a048af3c9073e4b8108e6cf920bbb35574059639

Last test of basis   152617  2020-08-19 09:13:09 Z1 days
Testing same since   152627  2020-08-20 03:50:23 Z0 days1 attempts


People who touched revisions under test:
  Sami Mujawar 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   a048af3c90..5a6d764e1d  5a6d764e1d073d28e8f398289ccb5592bf9a72ba -> 
xen-tested-master



Re: [Linux] [ARM] Granting memory obtained from the DMA API

2020-08-20 Thread Stefano Stabellini
On Thu, 20 Aug 2020, Julien Grall wrote:
> > Part of virtio is having shared memory. So naturally, I'm using Xen's
> > grant system for that. Part of the Xenbus client API is the function
> > xenbus_grant_ring which, by its documentation grants access to a block
> > of memory starting at vaddr to another domain. I tried using this in my
> > driver which created the grants and returned without any error, but
> > after mounting the grants on another domain, it turns out that some
> > other location in memory was actually granted instead of the one behind
> > the original vaddr.
> > 
> > So I found the problem: The vaddr that I was using xenbus_grant_ring
> > with was obtained by dma_alloc_coherent (whereas the other split
> > drivers included in the mainline kernel use Xen IO rings allocated by
> > the "regular" mechanisms such as __get_free_page, alloc_page etc.).
> > But xenbus_grant_ring uses virt_to_gfn to get the GFN for the vaddr
> > which on ARM(64) must not be used for DMA addresses. So I could fix the
> > problem by providing a modified version of xenbus_grant_ring as part of
> > my driver which takes a dma_addr_t instead of a void* for the start
> > address, gets the PFN via dma_to_phys, converts it to a GFN and then
> > delegates to gnttab_grant_foreign_access, just like xenbus_grant_ring.
> > I can confirm that this works on Linux 5.4.0.
>
> > My question to you is: How can this be fixed "the right way"?
> > Is there anything that can be done to prevent others from debugging
> > the same problem (which for me, took some hours...)?
> > 
> > I can see multiple approaches:
> > 1. Have xenbus_grant_ring "just work" even with DMA addresses on ARM
> > This would certainly be the nicest solution, but I don't see how
> > it could be implemented. I don't know how to check whether some
> > address actually is a DMA address and even if there was a way to
> > know, dma_to_phys still requires a pointer to the device struct
> > which was used for allocation.
> > 2. Provide another version which takes a dma_addr_t instead of void*
> > This can be easily done, but things get complicated when the device
> > for which the DMA memory was allocated is not the xenbus_device
> > which is passed anyway. So, it would be necessary to include an
> > additional argument pointing the actual device struct which was used
> > for allocation.
> > 3. Just use gnttab_grant_foreign_access which works with GFNs anyway
> > Which is essentially what I'm doing currently, as in my driver I
> > know from which the device the DMA addresses were allocated.
> > If this is the preferred solution to this problem, I propose adding
> > a warning to the documentation of xenbus_grant_ring that forbids
> > using this for vaddrs obtained from the DMA API as it will not work
> > (at least on ARM).
> > 
> > What do you think?

Thanks for the well-written analysis of the problem. The following should
work to translate the virtual address properly in xenbus_grant_ring:

    if (is_vmalloc_addr(vaddr))
        page = vmalloc_to_page(vaddr);
    else
        page = virt_to_page(vaddr);

Please give it a try and let me know. Otherwise, if it cannot be made to
work, option 3 with a proper warning is also fine.
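
To make option 2 from the list above concrete, here is a rough pseudo-C sketch
of what such a DMA-aware variant could look like. This is an untested
illustration only: the function name xenbus_grant_ring_dma and the extra
device argument are made up here, and unwinding of already-issued grants on
error is left to the caller.

```
/*
 * Hypothetical sketch (not a tested patch): grant nr_pages of a
 * dma_alloc_coherent() buffer, translating the DMA handle via the
 * device it was allocated for instead of using virt_to_gfn().
 */
int xenbus_grant_ring_dma(struct xenbus_device *xdev, struct device *dma_dev,
                          dma_addr_t dma_addr, unsigned int nr_pages,
                          grant_ref_t *grefs)
{
    phys_addr_t phys = dma_to_phys(dma_dev, dma_addr);
    unsigned int i;

    for (i = 0; i < nr_pages; i++) {
        /* On auto-translated Arm guests gfn == pfn, so the frame can be
         * granted directly. */
        int err = gnttab_grant_foreign_access(xdev->otherend_id,
                                              XEN_PFN_DOWN(phys) + i, 0);

        if (err < 0)
            return err;
        grefs[i] = err;
    }

    return 0;
}
```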



Re: u-boot vs. uefi as boot loaders on ARM

2020-08-20 Thread Roman Shaposhnik
On Thu, Aug 20, 2020 at 4:27 AM Oleksandr Andrushchenko
 wrote:
>
>
> On 8/20/20 1:50 PM, Julien Grall wrote:
> > Hi Roman,
> >
> > On 16/08/2020 21:45, Roman Shaposhnik wrote:
> >> On Sun, Aug 16, 2020 at 7:54 AM Julien Grall  wrote:
> >>> On 15/08/2020 21:43, Roman Shaposhnik wrote:
>  Hi!
> >>>
> >>> Hi,
> >>>
>  with the recent excellent work by Anastasiia committed to the u-boot's
>  main line, we now have two different ways of bringing ARM DomUs.
> 
>  Is there any chance someone can educate the general public on pros
>  and cons of both approaches?
> 
>  In Project EVE we're still using uefi on ARM (to stay closer to the more
>  "ARM in the cloud" use case) but perhaps the situation now is more
>  nuanced?
> >>>
> >>> UEFI is just standard, so I am guessing you are referring to
> >>> Tianocore/EDK2. am I correct?
> >>
> >> Yes, but I was actually referring to both in a way (I should've been
> >> clearer tho).
> >> To be more explicit my question was around trying to compare a 
> >> "standardized"
> >> way of botting a generic DomU on ARM (and that standard is UEFI with one
> >> particular implementation that works out of the box with Xen being 
> >> TC/EDK2) with
> >> a more ad-hoc u-boot style of booting.
> >>
> >>> Recent versions of U-boot are also able to partially support UEFI. This means you
> >>> could easily use GRUB with U-boot.
> >>
> >> Yup -- which complicated things even more. And it is funny you should 
> >> mention
> >> it, since we actually started with TC/EDK2 for RaspberryPi4 as a board
> >> bootloader,
> >> but quickly switched to u-boot with UEFI shim layer, since it was much 
> >> smaller,
> >> better supported (still?) and gave us all we needed to boot Xen on RPi4 as 
> >> a
> >> UEFI payload.
> >>
> >>>  From my understanding, U-boot is just a bootloader. Therefore it will
> >>> not provide runtime services (such as date & time).
> >>
> >> It actually does provide some of that (see below)
> >
> > Cool! Although, it looks mostly related to the environment variable though.
> >
> >>
> >>> Furthermore, the
> >>> interface is less user friendly, you will have to know the memory layout
> >>> in order to load binaries.
> >>>
> >>> On the other hand, Tianocore/EDK2 is very similar to what the non-embedded
> >>> world may be used to. It will not require you to know your memory layout. But
> >>> this comes at the cost of a more complex bootloader to debug.
> >>
> >> That's literally the crux of my question -- trying to understand what use 
> >> cases
> >> either one of them is meant for. Especially given that this shim layer is 
> >> now
> >> quite capable:
> >> https://github.com/ARM-software/u-boot/blob/master/doc/README.uefi#L127
> >
> > While I can see major differences when using either on baremetal (you have 
> > better control on the Device-Tree with U-boot), it is much less clear in a 
> > guest. Maybe Anastasiia can explain why they decided to add support in 
> > U-boot? :).
>
> Well, there are many SoC vendors providing u-boot as their boot loader,
>
> so it was natural for us to add pvblock to it (Renesas, Xilinx, iMX, RPi, you 
> name it).
>
> So this is the only reason I guess

What I am wondering about (perhaps selfishly because of Project EVE)
is the availability
of VMs for u-boot.

IOW, with UEFI I can pick up a random "cloud" (or any other one
really) ARM VM image
and boot it as DomU simply because it seems that 99% of existing VMs
are packaged
with an EFI partition set up for a UEFI boot.

Stefano and I actually talked about availability of VMs that are
pre-set with u-boot, but
it seems that the only place where you can find something like that is
Xilinx (for their
Petalinux). Stefano also brought up a point that Yocto would generate
u-boot's boot.scr
scripts -- but I have no experience with that and would appreciate
other commenting.

All of that said, it would be simply awesome if we can have a wiki
page with examples
of where to get (or how to build) DomUs that would be setup for u-boot
sequence on ARM.

Thanks,
Roman.



Re: [PATCH 00/14] kernel-doc: public/arch-arm.h

2020-08-20 Thread Stefano Stabellini
On Tue, 18 Aug 2020, Ian Jackson wrote:
> Stefano Stabellini writes ("Re: [PATCH 00/14] kernel-doc: public/arch-arm.h"):
> > I am replying to this email as I have been told that the original was
> > filtered as spam due to the tarball attachment. The tarball contains
> > some example html output document files from sphinx.
> 
> Thanks.
> 
> Thanks for all your work.  This is definitely going in the right
> direction.  I skim-read all the patches and have nothing further to
> add to what others have said.

Thanks for looking into it!


> How soon can we arrange for this processing to be done automatically
> (on xenbits, I guess) ?  Would you be prepared to set this up if I add
> your ssh key to the "xendocs" account which builds the existing docs ?

Yes, I can do that.

This series was only meant to provide the basic groundwork, I wasn't
thinking of adding the kernel-doc script to xen.git or the automatic
docs build as part of it. However, I do have work in the pipeline to do
that too: right now I am experimenting with some kernel-doc changes to
produce better output docs for Xen. I am planning on sending that out
soon after this series gets in, so maybe in a few weeks or a month.

Since I am here, I'd like to give you a heads up that I'll need your
help reviewing or maybe making some changes to kernel-doc because my
perl is nonexistent so I am probably doing something awful :-)



Re: [PATCH 01/14] kernel-doc: public/arch-arm.h

2020-08-20 Thread Stefano Stabellini
On Tue, 18 Aug 2020, Ian Jackson wrote:
> Stefano Stabellini writes ("[PATCH 01/14] kernel-doc: public/arch-arm.h"):
> > From: Stefano Stabellini 
> > 
> > Convert in-code comments to kernel-doc format wherever possible.
> 
> Thanks.  But, err, I think there is not yet any in-tree machinery for
> actually building and publishing these kernel-doc comments ?

No, there isn't. But you can call kernel-doc on the headers manually and
it will produce fully readable docs in RST format. (Then you can covert
RST docs to HTML with Sphinx.) Like:

  kernel-doc xen/include/public/features.h > readme-features.rst

I also gave a few more details on the plan I had in my other email
reply.


> As I said I think replacing our ad-hoc in-tree system with kernel-doc
> is a good idea, but...
> 
> > -/*
> > - * `incontents 50 arm_abi Hypercall Calling Convention
> > +/**
> > + * DOC: Hypercall Calling Convention
> 
> ... let us not replace the in-tree markup for that system until we
> have its replacement.

Ah! I didn't know what 

  `incontents 50 arm_abi

was for. I assumed it was a relic of another era and removed it.

Is it actually used (and the other markups like that)? Is there
a script somewhere that parses it in xen.git or on xenbits already?

If they are in use, then I can try to retain them for now until we have
the kernel-doc infrastructure on xenbits -- they should be compatible
with the kernel-doc syntax.



[PATCH v2 4/8] x86/svm: drop writes to BU_CFG on revF chips

2020-08-20 Thread Roger Pau Monne
We already have special casing to handle reads of this MSR for revF
chips, so do as the comment in svm_msr_read_intercept says and drop
writes. This is in preparation for changing the default MSR write
behavior, which will instead return #GP on writes that are not
explicitly handled.

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - New in this version.
---
 xen/arch/x86/hvm/svm/svm.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 2d0823e7e1..7586b77268 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2125,6 +2125,12 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 nsvm->ns_msr_hsavepa = msr_content;
 break;
 
+case MSR_F10_BU_CFG:
+/* See comment in svm_msr_read_intercept. */
+if ( boot_cpu_data.x86 != 0xf )
+goto gpf;
+break;
+
 case MSR_AMD64_TSC_RATIO:
 if ( msr_content & TSC_RATIO_RSVD_BITS )
 goto gpf;
-- 
2.28.0




[PATCH v2 0/8] x86: switch default MSR behavior

2020-08-20 Thread Roger Pau Monne
Hello,

The current series attempts to change the current MSR default handling
behavior, which is to silently drop writes to writable MSRs, and allow
reading any MSR not explicitly handled.

After this series access to MSRs not explicitly handled will trigger a
#GP fault. I've tested this series with osstest and it doesn't introduce
any regression, at least on the boxes selected for testing:

http://logs.test-lab.xenproject.org/osstest/logs/152630/

Thanks, Roger.

Andrew Cooper (2):
  x86/hvm: Disallow access to unknown MSRs
  x86/msr: Drop compatibility #GP handling in guest_{rd,wr}msr()

Roger Pau Monne (6):
  x86/vmx: handle writes to MISC_ENABLE MSR
  x86/svm: silently drop writes to SYSCFG and related MSRs
  x86/msr: explicitly handle AMD DE_CFG
  x86/svm: drop writes to BU_CFG on revF chips
  x86/pv: allow reading FEATURE_CONTROL MSR
  x86/pv: disallow access to unknown MSRs

 xen/arch/x86/hvm/svm/svm.c | 38 ++
 xen/arch/x86/hvm/vmx/vmx.c | 31 ++-
 xen/arch/x86/msr.c | 71 +-
 xen/arch/x86/pv/emul-priv-op.c | 18 +
 4 files changed, 79 insertions(+), 79 deletions(-)

-- 
2.28.0




[PATCH v2 2/8] x86/svm: silently drop writes to SYSCFG and related MSRs

2020-08-20 Thread Roger Pau Monne
The SYSCFG, TOP_MEM1 and TOP_MEM2 MSRs are currently exposed to guests
and writes are silently discarded. Make this explicit in the SVM code
now, and just return default constant values when attempting to read
any of the MSRs, while continuing to silently drop writes.

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Return MtrrFixDramEn in MSR_K8_SYSCFG.
---
 xen/arch/x86/hvm/svm/svm.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index ca3bbfcbb3..2d0823e7e1 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1917,6 +1917,21 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
 goto gpf;
 break;
 
+case MSR_K8_TOP_MEM1:
+case MSR_K8_TOP_MEM2:
+*msr_content = 0;
+break;
+
+case MSR_K8_SYSCFG:
+/*
+ * Return MtrrFixDramEn: albeit the current emulated MTRR
+ * implementation doesn't support the Extended Type-Field Format, having
+ * such bit set is common on AMD hardware and is harmless as long as
+ * MtrrFixDramModEn isn't set.
+ */
+*msr_content = K8_MTRRFIXRANGE_DRAM_ENABLE;
+break;
+
 case MSR_K8_VM_CR:
 *msr_content = 0;
 break;
@@ -2094,6 +2109,12 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 goto gpf;
 break;
 
+case MSR_K8_TOP_MEM1:
+case MSR_K8_TOP_MEM2:
+case MSR_K8_SYSCFG:
+/* Drop writes. */
+break;
+
 case MSR_K8_VM_CR:
 /* ignore write. handle all bits as read-only. */
 break;
-- 
2.28.0




[PATCH v2 1/8] x86/vmx: handle writes to MISC_ENABLE MSR

2020-08-20 Thread Roger Pau Monne
Such handling consists of checking that no bits have been changed from
the read value; if that's the case, silently drop the write, otherwise
inject a fault.

At least Windows guests will expect to write to the MISC_ENABLE MSR
with the same value that's been read from it.

Signed-off-by: Roger Pau Monné 
Acked-by: Andrew Cooper 
---
 xen/arch/x86/hvm/vmx/vmx.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index a0d58ffbe2..4717e50d4a 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3163,7 +3163,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 
 switch ( msr )
 {
-uint64_t rsvd;
+uint64_t rsvd, tmp;
 
 case MSR_IA32_SYSENTER_CS:
 __vmwrite(GUEST_SYSENTER_CS, msr_content);
@@ -3301,6 +3301,13 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 /* None of these MSRs are writeable. */
 goto gp_fault;
 
+case MSR_IA32_MISC_ENABLE:
+/* Silently drop writes that don't change the reported value. */
+if ( vmx_msr_read_intercept(msr, &tmp) != X86EMUL_OKAY ||
+ tmp != msr_content )
+goto gp_fault;
+break;
+
 case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
 case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(7):
 case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
-- 
2.28.0




[PATCH v2 6/8] x86/pv: disallow access to unknown MSRs

2020-08-20 Thread Roger Pau Monne
Change the catch-all behavior for MSRs not explicitly handled: instead
of allowing full read access to the MSR space and silently dropping
writes, return an exception when the MSR is not explicitly handled.

Signed-off-by: Roger Pau Monné 
Acked-by: Andrew Cooper 
---
 xen/arch/x86/pv/emul-priv-op.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index bcc1188f6a..d4735b4f06 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -972,9 +972,10 @@ static int read_msr(unsigned int reg, uint64_t *val,
 }
 /* fall through */
 default:
+gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
+break;
+
 normal:
-/* Everyone can read the MSR space. */
-/* gdprintk(XENLOG_WARNING, "Domain attempted RDMSR %08x\n", reg); */
 if ( rdmsr_safe(reg, *val) )
 break;
 return X86EMUL_OKAY;
@@ -1141,14 +1142,15 @@ static int write_msr(unsigned int reg, uint64_t val,
 }
 /* fall through */
 default:
-if ( rdmsr_safe(reg, temp) )
-break;
+gdprintk(XENLOG_WARNING,
+ "WRMSR 0x%08x val 0x%016"PRIx64" unimplemented\n",
+ reg, val);
+break;
 
-if ( val != temp )
 invalid:
-gdprintk(XENLOG_WARNING,
- "Domain attempted WRMSR %08x from 0x%016"PRIx64" to 0x%016"PRIx64"\n",
- reg, temp, val);
+gdprintk(XENLOG_WARNING,
+ "Domain attempted WRMSR %08x from 0x%016"PRIx64" to 0x%016"PRIx64"\n",
+ reg, temp, val);
 return X86EMUL_OKAY;
 }
 
-- 
2.28.0




[PATCH v2 5/8] x86/pv: allow reading FEATURE_CONTROL MSR

2020-08-20 Thread Roger Pau Monne
Linux PV guests will attempt to read the FEATURE_CONTROL MSR, so move
the handling done in VMX code into guest_rdmsr as it can be shared
between PV and HVM guests that way.

Signed-off-by: Roger Pau Monné 
---
Changes from v1:
 - Move the VMX implementation into guest_rdmsr.
---
 xen/arch/x86/hvm/vmx/vmx.c |  8 +---
 xen/arch/x86/msr.c | 13 +
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 4717e50d4a..f6657af923 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2980,13 +2980,7 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
 case MSR_IA32_DEBUGCTLMSR:
 __vmread(GUEST_IA32_DEBUGCTL, msr_content);
 break;
-case MSR_IA32_FEATURE_CONTROL:
-*msr_content = IA32_FEATURE_CONTROL_LOCK;
-if ( vmce_has_lmce(curr) )
-*msr_content |= IA32_FEATURE_CONTROL_LMCE_ON;
-if ( nestedhvm_enabled(curr->domain) )
-*msr_content |= IA32_FEATURE_CONTROL_ENABLE_VMXON_OUTSIDE_SMX;
-break;
+
 case MSR_IA32_VMX_BASIC...MSR_IA32_VMX_VMFUNC:
 if ( !nvmx_msr_read_intercept(msr, msr_content) )
 goto gp_fault;
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index a890cb9976..bb0dd5ff0a 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -25,6 +25,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -181,6 +182,18 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
 /* Not offered to guests. */
 goto gp_fault;
 
+case MSR_IA32_FEATURE_CONTROL:
+if ( !(cp->x86_vendor & X86_VENDOR_INTEL) )
+goto gp_fault;
+
+*val = IA32_FEATURE_CONTROL_LOCK;
+if ( vmce_has_lmce(v) )
+*val |= IA32_FEATURE_CONTROL_LMCE_ON;
+if ( nestedhvm_enabled(d) )
+*val |= IA32_FEATURE_CONTROL_ENABLE_VMXON_OUTSIDE_SMX;
+break;
+
+
 case MSR_IA32_PLATFORM_ID:
 if ( !(cp->x86_vendor & X86_VENDOR_INTEL) ||
  !(boot_cpu_data.x86_vendor & X86_VENDOR_INTEL) )
-- 
2.28.0




[PATCH v2 8/8] x86/msr: Drop compatibility #GP handling in guest_{rd, wr}msr()

2020-08-20 Thread Roger Pau Monne
From: Andrew Cooper 

Now that the main PV/HVM MSR handlers raise #GP for all unknown MSRs, there is
no need to special case these MSRs any more.

Signed-off-by: Andrew Cooper 
Reviewed-by: Roger Pau Monné 
---
Changes since v1:
 - New in this version.
---
 xen/arch/x86/msr.c | 46 --
 1 file changed, 46 deletions(-)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index bb0dd5ff0a..560719c2aa 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -159,29 +159,6 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
 
 switch ( msr )
 {
-case MSR_AMD_PATCHLOADER:
-case MSR_IA32_UCODE_WRITE:
-case MSR_PRED_CMD:
-case MSR_FLUSH_CMD:
-/* Write-only */
-case MSR_TEST_CTRL:
-case MSR_CORE_CAPABILITIES:
-case MSR_TSX_FORCE_ABORT:
-case MSR_TSX_CTRL:
-case MSR_MCU_OPT_CTRL:
-case MSR_RTIT_OUTPUT_BASE ... MSR_RTIT_ADDR_B(7):
-case MSR_U_CET:
-case MSR_S_CET:
-case MSR_PL0_SSP ... MSR_INTERRUPT_SSP_TABLE:
-case MSR_AMD64_LWP_CFG:
-case MSR_AMD64_LWP_CBADDR:
-case MSR_PPIN_CTL:
-case MSR_PPIN:
-case MSR_AMD_PPIN_CTL:
-case MSR_AMD_PPIN:
-/* Not offered to guests. */
-goto gp_fault;
-
 case MSR_IA32_FEATURE_CONTROL:
 if ( !(cp->x86_vendor & X86_VENDOR_INTEL) )
 goto gp_fault;
@@ -349,29 +326,6 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
 {
 uint64_t rsvd;
 
-case MSR_IA32_PLATFORM_ID:
-case MSR_CORE_CAPABILITIES:
-case MSR_INTEL_CORE_THREAD_COUNT:
-case MSR_INTEL_PLATFORM_INFO:
-case MSR_ARCH_CAPABILITIES:
-/* Read-only */
-case MSR_TEST_CTRL:
-case MSR_TSX_FORCE_ABORT:
-case MSR_TSX_CTRL:
-case MSR_MCU_OPT_CTRL:
-case MSR_RTIT_OUTPUT_BASE ... MSR_RTIT_ADDR_B(7):
-case MSR_U_CET:
-case MSR_S_CET:
-case MSR_PL0_SSP ... MSR_INTERRUPT_SSP_TABLE:
-case MSR_AMD64_LWP_CFG:
-case MSR_AMD64_LWP_CBADDR:
-case MSR_PPIN_CTL:
-case MSR_PPIN:
-case MSR_AMD_PPIN_CTL:
-case MSR_AMD_PPIN:
-/* Not offered to guests. */
-goto gp_fault;
-
 case MSR_AMD_PATCHLEVEL:
 BUILD_BUG_ON(MSR_IA32_UCODE_REV != MSR_AMD_PATCHLEVEL);
 /*
-- 
2.28.0




[PATCH v2 3/8] x86/msr: explicitly handle AMD DE_CFG

2020-08-20 Thread Roger Pau Monne
Report the hardware value of DE_CFG on AMD hardware and silently drop
writes.

Reported-by: Andrew Cooper 
Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - New in this version.
---
 xen/arch/x86/msr.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index ca4307e19f..a890cb9976 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -274,6 +274,14 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
 *val = msrs->tsc_aux;
 break;
 
+case MSR_AMD64_DE_CFG:
+if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ||
+ !(boot_cpu_data.x86_vendor & (X86_VENDOR_AMD |
+   X86_VENDOR_HYGON)) ||
+ rdmsr_safe(MSR_AMD64_DE_CFG, *val) )
+goto gp_fault;
+break;
+
 case MSR_AMD64_DR0_ADDRESS_MASK:
 case MSR_AMD64_DR1_ADDRESS_MASK ... MSR_AMD64_DR3_ADDRESS_MASK:
 if ( !cp->extd.dbext )
@@ -499,6 +507,12 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
 wrmsr_tsc_aux(val);
 break;
 
+case MSR_AMD64_DE_CFG:
+if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ||
+ !(boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
+goto gp_fault;
+break;
+
 case MSR_AMD64_DR0_ADDRESS_MASK:
 case MSR_AMD64_DR1_ADDRESS_MASK ... MSR_AMD64_DR3_ADDRESS_MASK:
 if ( !cp->extd.dbext || val != (uint32_t)val )
-- 
2.28.0




[PATCH v2 7/8] x86/hvm: Disallow access to unknown MSRs

2020-08-20 Thread Roger Pau Monne
From: Andrew Cooper 

Change the catch-all behavior for MSRs not explicitly handled: instead
of allowing full read access to the MSR space and silently dropping
writes, return an exception when the MSR is not explicitly handled.

Signed-off-by: Andrew Cooper 
[remove rdmsr_safe from default case in svm_msr_read_intercept]
Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Fold chunk to remove explicit write handling of VMX MSRs just to
   #GP.
 - Remove catch-all rdmsr_safe in svm_msr_read_intercept.
---
 xen/arch/x86/hvm/svm/svm.c | 11 ---
 xen/arch/x86/hvm/vmx/vmx.c | 16 
 2 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 7586b77268..1e4458c184 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1952,9 +1952,6 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
 break;
 
 default:
-if ( rdmsr_safe(msr, *msr_content) == 0 )
-break;
-
 if ( boot_cpu_data.x86 == 0xf && msr == MSR_F10_BU_CFG )
 {
 /* Win2k8 x64 reads this MSR on revF chips, where it
@@ -1967,6 +1964,7 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
 break;
 }
 
+gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", msr);
 goto gpf;
 }
 
@@ -2154,10 +2152,9 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 break;
 
 default:
-/* Match up with the RDMSR side; ultimately this should go away. */
-if ( rdmsr_safe(msr, msr_content) == 0 )
-break;
-
+gdprintk(XENLOG_WARNING,
+ "WRMSR 0x%08x val 0x%016"PRIx64" unimplemented\n",
+ msr, msr_content);
 goto gpf;
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index f6657af923..9cc9d81c41 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3015,9 +3015,7 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
 break;
 }
 
-if ( rdmsr_safe(msr, *msr_content) == 0 )
-break;
-
+gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", msr);
 goto gp_fault;
 }
 
@@ -3290,11 +3288,6 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 __vmwrite(GUEST_IA32_DEBUGCTL, msr_content);
 break;
 
-case MSR_IA32_FEATURE_CONTROL:
-case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
-/* None of these MSRs are writeable. */
-goto gp_fault;
-
 case MSR_IA32_MISC_ENABLE:
 /* Silently drop writes that don't change the reported value. */
 if ( vmx_msr_read_intercept(msr, &tmp) != X86EMUL_OKAY ||
@@ -3320,10 +3313,9 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
  is_last_branch_msr(msr) )
 break;
 
-/* Match up with the RDMSR side; ultimately this should go away. */
-if ( rdmsr_safe(msr, msr_content) == 0 )
-break;
-
+gdprintk(XENLOG_WARNING,
+ "WRMSR 0x%08x val 0x%016"PRIx64" unimplemented\n",
+ msr, msr_content);
 goto gp_fault;
 }
 
-- 
2.28.0




Re: Xen 4.14.0 fails on Dell IoT Gateway without efi=no-rs

2020-08-20 Thread Roman Shaposhnik
On Thu, Aug 20, 2020 at 5:56 AM Andrew Cooper  wrote:
>
> On 19/08/2020 23:50, Roman Shaposhnik wrote:
> >  Hi!
> >
> > below you can see a trace of Xen 4.14.0 failing on Dell IoT Gateway 3001
> > without efi=no-rs. Please let me know if I can provide any additional
> > information.
>
> Just to be able to get all datapoints, could you build Xen with
> CONFIG_EFI_SET_VIRTUAL_ADDRESS_MAP and see if the failure mode changes?

It does. I rebuilt with the above + debug=y and here's what I got:

 Xen 4.14.0
(XEN) Xen version 4.14.0 (@) (gcc (Alpine 6.4.0) 6.4.0) debug=y  Thu
Aug 20 19:02:55 UTC 2020
(XEN) Latest ChangeSet:
(XEN) build-id: 035c23a8644576897a7380a0837505de8460d7e8
(XEN) Bootloader: GRUB 2.03
(XEN) Command line: com1=115200,8n1 console=com1
dom0_mem=1024M,max:1024M dom0_max_vcpus=1 dom0_vcpus_pin
(XEN) Xen image load base address: 0x70c0
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN) Disc information:
(XEN)  Found 0 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) CPU Vendor: Intel, Family 6 (0x6), Model 55 (0x37), Stepping 9
(raw 00030679)
(XEN) No NUMA configuration found
(XEN) Faking a node at -7900
(XEN) Domain heap initialised
(XEN) SMBIOS 3.0 present.
(XEN) DMI 3.0 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x408 (24 bits)
(XEN) ACPI: v5 SLEEP INFO: control[0:0], status[0:0]
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:404,1:0], pm1x_evt[1:400,1:0]
(XEN) ACPI: 32/64X FACS address mismatch in FADT -
772dde80/, using 32
(XEN) ACPI: wakeup_vec[772dde8c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee0
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] enabled)
(XEN) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
(XEN) ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 1, version 32, address 0xfec0, GSI 0-86
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) ACPI: HPET id: 0x8086a201 base: 0xfed0
(XEN) PCI: MCFG configuration 0: base e000 segment  buses 00 - ff
(XEN) PCI: MCFG area at e000 reserved in E820
(XEN) PCI: Using MCFG for segment  bus 00-ff
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) SMP: Allowing 2 CPUs (0 hotplug CPUs)
(XEN) IRQ limits: 87 GSI, 609 MSI/MSI-X
(XEN) CPU0: 400..1000 MHz
(XEN) mce_intel.c:779: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST
(XEN) Thermal monitoring handled by SMI
(XEN) CPU0: Intel machine check reporting enabled
(XEN) Fixup #GP[]: 82d0405c6b5f
[init_speculation_mitigations+0xee/0x1717] -> 82d0404f1b94
(XEN) Speculative mitigation facilities:
(XEN)   Hardware features:
(XEN)   Compiled-in support: SHADOW_PAGING
(XEN)   Xen settings: BTI-Thunk N/A, SPEC_CTRL: No, Other: BRANCH_HARDEN
(XEN)   Support for HVM VMs: RSB
(XEN)   Support for PV VMs: RSB
(XEN)   XPTI (64-bit PV only): Dom0 enabled, DomU enabled (without PCID)
(XEN)   PV L1TF shadowing: Dom0 disabled, DomU disabled
(XEN) Using scheduler: SMP Credit Scheduler rev2 (credit2)
(XEN) Initializing Credit2 scheduler
(XEN)  load_precision_shift: 18
(XEN)  load_window_shift: 30
(XEN)  underload_balance_tolerance: 0
(XEN)  overload_balance_tolerance: -3
(XEN)  runqueues arrangement: socket
(XEN)  cap enforcement granularity: 10ms
(XEN) load tracking window length 1073741824 ns
(XEN) Disabling HPET for being unreliable
(XEN) Platform timer is 3.580MHz ACPI PM Timer
(XEN) Detected 1333.394 MHz processor.
(XEN) EFI memory map:
(XEN)  0-07fff type=3 attr=000f
(XEN)  08000-0bfff type=2 attr=000f
(XEN)  0c000-2efff type=7 attr=000f
(XEN)  2f000-3efff type=2 attr=000f
(XEN)  3f000-3 type=10 attr=000f
(XEN)  4-9 type=3 attr=000f
(XEN)  00010-001c03fff type=2 attr=000f
(XEN)  001c04000-01fff type=7 attr=000f
(XEN)  02000-0200f type=0 attr=000f
(XEN)  02010-03ca89fff type=7 attr=000f
(XEN)  03ca8a000-058ff type=1 attr=000f
(XEN)  05900-05901 type=4 attr=000f
(XEN)  05902-070df type=7 attr=000f
(XEN)  070e0-0715eefff type=2 attr=000f
(XEN)  0715ef000-07167afff type=7 attr=000f
(XEN)  07167b000-07167bfff type=2 attr=000f
(XEN)  07167c000-071681fff type=7 attr=000f
(XEN)  071682000-071776fff type=1 attr=00

Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common

2020-08-20 Thread Oleksandr



On 12.08.20 11:19, Julien Grall wrote:

Hi,


Hi Julien, Stefano




On 11/08/2020 23:48, Stefano Stabellini wrote:
I have the impression that we disagree on what the Device Emulator is
meant to do. IMHO, the goal of the device emulator is to emulate a device
in an arch-agnostic way.


That would be great in theory but I am not sure it is achievable: if we
use an existing emulator like QEMU, even a single device has to fit
into QEMU's view of the world, which makes assumptions about host
bridges and apertures. It is impossible today to build QEMU in an
arch-agnostic way; it has to be tied to an architecture.


AFAICT, the only reason QEMU cannot be built in an arch-agnostic way 
is TCG. If that wasn't built, then you could easily write a 
machine that doesn't depend on the instruction set.


The proof is that, today, we are using QEMU x86 to serve Arm64 guests, 
although this is only for PV drivers.




I realize we are not building this interface for QEMU specifically, but
even if we try to make the interface arch-agnostic, in reality the
emulators won't be arch-agnostic.


This depends on your goal. If your goal is to write a standalone 
emulator for a single device, then it is entirely possible to make it 
arch-agnostic.


Per above, this would even be possible if you were emulating a set of 
devices.


What I want to avoid is requiring all the emulators to contain 
arch-specific code just because it is easier to get QEMU working on 
Xen on Arm.



If we send a port-mapped I/O request
to qemu-system-aarch64, who knows what is going to happen: it is a code
path that is not explicitly tested.


Maybe, maybe not. To me these are mostly software issues that can easily 
be mitigated if we do proper testing...


Could we please find a common ground on whether the PIO handling needs 
to be implemented on Arm or not? At least for the current patch series.



Below are my thoughts:
On one hand, I agree that an emulator shouldn't contain any arch-specific 
code; yes, it is hypervisor specific, but it should be arch-agnostic if 
possible, so the PIO case should be handled.
On the other hand, I tend to think it might be possible to skip PIO 
handling for the current patch series (leave it x86-specific for now, as 
we do with handle_realmode_completion()).
I think nothing will prevent us from adding PIO handling later on if 
there is a real need (use case) for that. Please correct me if I am wrong.


I would be absolutely OK with either option.

What do you think?


--
Regards,

Oleksandr Tyshchenko




Re: Xen 4.14.0 fails on Dell IoT Gateway without efi=no-rs

2020-08-20 Thread Roman Shaposhnik
On Thu, Aug 20, 2020 at 1:34 AM Jan Beulich  wrote:
>
> On 20.08.2020 00:50, Roman Shaposhnik wrote:
> > below you can see a trace of Xen 4.14.0 failing on Dell IoT Gateway 3001
> > without efi=no-rs. Please let me know if I can provide any additional
> > information.
>
> One of the usual firmware issues:
>
> > Xen 4.14.0
> > (XEN) Xen version 4.14.0 (@) (gcc (Alpine 6.4.0) 6.4.0) debug=n  Sat Jul 25
> > 23:45:43 UTC 2020
> > (XEN) Latest ChangeSet:
> > (XEN) Bootloader: GRUB 2.03
> > (XEN) Command line: com1=115200,8n1 console=com1 dom0_mem=1024M,max:1024M
> > dom0_max_vcpus=1 dom0_vcpus_pin
> > (XEN) Xen image load base address: 0x7100
> > (XEN) Video information:
> > (XEN)  VGA is text mode 80x25, font 8x16
> > (XEN) Disc information:
> > (XEN)  Found 0 MBR signatures
> > (XEN)  Found 1 EDD information structures
> > (XEN) EFI RAM map:
> > (XEN)  [, 0003efff] (usable)
> > (XEN)  [0003f000, 0003] (ACPI NVS)
> > (XEN)  [0004, 0009] (usable)
> > (XEN)  [0010, 1fff] (usable)
> > (XEN)  [2000, 200f] (reserved)
> > (XEN)  [2010, 76ccafff] (usable)
> > (XEN)  [76ccb000, 76d42fff] (reserved)
> > (XEN)  [76d43000, 76d53fff] (ACPI data)
> > (XEN)  [76d54000, 772ddfff] (ACPI NVS)
> > (XEN)  [772de000, 775f4fff] (reserved)
> > (XEN)  [775f5000, 775f5fff] (usable)
> > (XEN)  [775f6000, 77637fff] (reserved)
> > (XEN)  [77638000, 789e4fff] (usable)
> > (XEN)  [789e5000, 78ff9fff] (reserved)
> > (XEN)  [78ffa000, 78ff] (usable)
> > (XEN)  [e000, efff] (reserved)
> > (XEN)  [fec0, fec00fff] (reserved)
> > (XEN)  [fed01000, fed01fff] (reserved)
> > (XEN)  [fed03000, fed03fff] (reserved)
> > (XEN)  [fed08000, fed08fff] (reserved)
> > (XEN)  [fed0c000, fed0] (reserved)
> > (XEN)  [fed1c000, fed1cfff] (reserved)
> > (XEN)  [fee0, fee00fff] (reserved)
> > (XEN)  [fef0, feff] (reserved)
> > (XEN)  [ff90, ] (reserved)
> > (XEN) System RAM: 1919MB (1965176kB)
> > (XEN) ACPI: RSDP 76D46000, 0024 (r2   DELL)
> > (XEN) ACPI: XSDT 76D46088, 0094 (r1   DELL AS09  1072009 AMI 10013)
> > (XEN) ACPI: FACP 76D52560, 010C (r5   DELL AS09  1072009 AMI 10013)
> > (XEN) ACPI: DSDT 76D461B0, C3AF (r2   DELL AS09  1072009 INTL 20120913)
> > (XEN) ACPI: FACS 772DDE80, 0040
> > (XEN) ACPI: APIC 76D52670, 0068 (r3   DELL AS09  1072009 AMI 10013)
> > (XEN) ACPI: FPDT 76D526D8, 0044 (r1   DELL AS09  1072009 AMI 10013)
> > (XEN) ACPI: FIDT 76D52720, 009C (r1   DELL AS09  1072009 AMI 10013)
> > (XEN) ACPI: MCFG 76D527C0, 003C (r1   DELL AS09  1072009 MSFT   97)
> > (XEN) ACPI: LPIT 76D52800, 0104 (r1   DELL AS093 VLV2  10D)
> > (XEN) ACPI: HPET 76D52908, 0038 (r1   DELL AS09  1072009 AMI.5)
> > (XEN) ACPI: SSDT 76D52940, 0763 (r1   DELL AS09 3000 INTL 20061109)
> > (XEN) ACPI: SSDT 76D530A8, 0290 (r1   DELL AS09 3000 INTL 20061109)
> > (XEN) ACPI: SSDT 76D53338, 017A (r1   DELL AS09 3000 INTL 20061109)
> > (XEN) ACPI: UEFI 76D534B8, 0042 (r1   DELL AS090 0)
> > (XEN) ACPI: CSRT 76D53500, 014C (r0   DELL AS095 INTL 20120624)
> > (XEN) ACPI: TPM2 76D53650, 0034 (r3Tpm2Tabl1 AMI 0)
> > (XEN) ACPI: SSDT 76D53688, 00C9 (r1   MSFT  RHPROXY1 INTL 20120913)
> > (XEN) Domain heap initialised
> > (XEN) ACPI: 32/64X FACS address mismatch in FADT -
> > 772dde80/, using 32
> > (XEN) IOAPIC[0]: apic_id 1, version 32, address 0xfec0, GSI 0-86
> > (XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
> > (XEN) CPU0: 400..1000 MHz
> > (XEN) Speculative mitigation facilities:
> > (XEN)   Hardware features:
> > (XEN)   Compiled-in support: SHADOW_PAGING
> > (XEN)   Xen settings: BTI-Thunk N/A, SPEC_CTRL: No, Other: BRANCH_HARDEN
> > (XEN)   Support for HVM VMs: RSB
> > (XEN)   Support for PV VMs: RSB
> > (XEN)   XPTI (64-bit PV only): Dom0 enabled, DomU enabled (without PCID)
> > (XEN)   PV L1TF shadowing: Dom0 disabled, DomU disabled
> > (XEN) Using scheduler: SMP Credit Scheduler rev2 (credit2)
> > (XEN) Initializing Credit2 scheduler
> > (XEN) Disabling HPET for being unreliable
> > (XEN) Platform timer is 3.580MHz ACPI PM Timer
> > (XEN) Detected 1333.397 MHz processor.
> > (XEN) Unknown cachability for MFNs 0xff900-0xf
>
> The fault address falling in this range suggests you can use a less
> heavy workaround: "efi=attr=uc". (Quite possibly "efi=no-rs" or yet
> some other workaround may still be needed for your subsequent reboot
> hang.)

I just tried efi=attr=uc and it is, indeed, a workaround

[PATCH 0/2] x86/vpic: minor fixes

2020-08-20 Thread Roger Pau Monne
Hello,

This series contains one non-functional change and one small fix for
pci-passthrough when using the 8259A PIC. I very much doubt anyone has
done pci-passthrough on guests using the legacy PIC, but nonetheless
let's aim for it to be correct.

Thanks, Roger.

Roger Pau Monne (2):
  x86/vpic: rename irq to pin in vpic_ioport_write
  x86/vpic: also execute dpci callback for non-specific EOI

 xen/arch/x86/hvm/vpic.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

-- 
2.28.0




[PATCH 1/2] x86/vpic: rename irq to pin in vpic_ioport_write

2020-08-20 Thread Roger Pau Monne
The irq variable is wrongly named, as it's used to store the pin on
the 8259 chip, not the global irq value. While renaming, reduce
its scope and make it unsigned.

No functional change intended.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/hvm/vpic.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/vpic.c b/xen/arch/x86/hvm/vpic.c
index 936c7b27c6..feb1db2ee3 100644
--- a/xen/arch/x86/hvm/vpic.c
+++ b/xen/arch/x86/hvm/vpic.c
@@ -184,7 +184,7 @@ static int vpic_intack(struct hvm_hw_vpic *vpic)
 static void vpic_ioport_write(
 struct hvm_hw_vpic *vpic, uint32_t addr, uint32_t val)
 {
-int priority, cmd, irq;
+int priority, cmd;
 uint8_t mask, unmasked = 0;
 
 vpic_lock(vpic);
@@ -230,6 +230,8 @@ static void vpic_ioport_write(
 }
 else
 {
+unsigned int pin;
+
 /* OCW2 */
 cmd = val >> 5;
 switch ( cmd )
@@ -246,22 +248,22 @@ static void vpic_ioport_write(
 priority = vpic_get_priority(vpic, mask);
 if ( priority == VPIC_PRIO_NONE )
 break;
-irq = (priority + vpic->priority_add) & 7;
-vpic->isr &= ~(1 << irq);
+pin = (priority + vpic->priority_add) & 7;
+vpic->isr &= ~(1 << pin);
 if ( cmd == 5 )
-vpic->priority_add = (irq + 1) & 7;
+vpic->priority_add = (pin + 1) & 7;
 break;
 case 3: /* Specific EOI*/
 case 7: /* Specific EOI & Rotate   */
-irq = val & 7;
-vpic->isr &= ~(1 << irq);
+pin = val & 7;
+vpic->isr &= ~(1 << pin);
 if ( cmd == 7 )
-vpic->priority_add = (irq + 1) & 7;
+vpic->priority_add = (pin + 1) & 7;
 /* Release lock and EOI the physical interrupt (if any). */
 vpic_update_int_output(vpic);
 vpic_unlock(vpic);
 hvm_dpci_eoi(current->domain,
- hvm_isa_irq_to_gsi((addr >> 7) ? (irq|8) : irq),
+ hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin),
  NULL);
 return; /* bail immediately */
 case 6: /* Set Priority*/
-- 
2.28.0




[PATCH 2/2] x86/vpic: also execute dpci callback for non-specific EOI

2020-08-20 Thread Roger Pau Monne
Currently the dpci EOI callback is only executed for specific EOIs.
This is wrong as non-specific EOIs will also clear the ISR bit and
thus end the interrupt. Re-arrange the code a bit so that the common
EOI handling path can be shared between all EOI modes.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/hvm/vpic.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vpic.c b/xen/arch/x86/hvm/vpic.c
index feb1db2ee3..3cf12581e9 100644
--- a/xen/arch/x86/hvm/vpic.c
+++ b/xen/arch/x86/hvm/vpic.c
@@ -249,15 +249,15 @@ static void vpic_ioport_write(
 if ( priority == VPIC_PRIO_NONE )
 break;
 pin = (priority + vpic->priority_add) & 7;
-vpic->isr &= ~(1 << pin);
-if ( cmd == 5 )
-vpic->priority_add = (pin + 1) & 7;
-break;
+goto common_eoi;
+
 case 3: /* Specific EOI*/
 case 7: /* Specific EOI & Rotate   */
 pin = val & 7;
+
+common_eoi:
 vpic->isr &= ~(1 << pin);
-if ( cmd == 7 )
+if ( cmd == 7 || cmd == 5 )
 vpic->priority_add = (pin + 1) & 7;
 /* Release lock and EOI the physical interrupt (if any). */
 vpic_update_int_output(vpic);
-- 
2.28.0




Re: [PATCH v2 26/58] xen-legacy-backend: Add missing typedef XenLegacyDevice

2020-08-20 Thread Anthony PERARD
On Wed, Aug 19, 2020 at 08:12:04PM -0400, Eduardo Habkost wrote:
> The typedef was used in the XENBACKEND_DEVICE macro, but it was
> never defined.  Define the typedef close to the type checking
> macro.
> 
> Signed-off-by: Eduardo Habkost 

Acked-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



Re: Xen 4.14.0 fails on Dell IoT Gateway without efi=no-rs

2020-08-20 Thread Roman Shaposhnik
On Thu, Aug 20, 2020 at 6:10 AM Rich Persaud  wrote:
>
> On Aug 20, 2020, at 07:24, George Dunlap  wrote:
>
>
> 
> On Thu, Aug 20, 2020 at 9:35 AM Jan Beulich  wrote:
>>
>>
>> As far as making cases like this work by default, I'm afraid it'll
>> need to be proposed to replace me as the maintainer of EFI code in
>> Xen. I will remain on the position that it is not acceptable to
>> apply workarounds for firmware issues by default unless they're
>> entirely benign to spec-conforming systems. DMI data based enabling
>> of workarounds, for example, is acceptable in the common case, as
>> long as the matching pattern isn't unreasonably wide.
>
>
> It sort of sounds like it would be useful to have a wider discussion on this 
> then, to hash out what exactly it is we want to do as a project.
>
>  -George
>
>
> Sometimes a middle ground is possible, e.g. see this Nov 2019 thread about a 
> possible Xen Kconfig option for EFI_NONSPEC_COMPATIBILITY, targeting 
> Edge/IoT/laptop hardware:
>
> https://lists.archive.carbon60.com/xen/devel/571670#571670

Yup. Having that top-level knob is exactly what I had in mind as the first step.
We can debate whether it needs to be on or off by default later, but having
it at all addresses a very burning problem: without it, distros like EVE and
QubesOS, to give you two obvious examples, have a very difficult time sharing
"best practices" for what works on those types of devices.

> In the years to come, edge devices will only grow in numbers.  Some will be 
> supported in production for more than a decade, which will require new 
> long-term commercial support mechanisms for device BIOS, rather than firmware 
> engineers shifting focus after a device is launched.

That's exactly what we're seeing with ZEDEDA customers.

> In parallel to (opt-in) Xen workarounds for a constrained and documented set 
> of firmware issues, we need more industry efforts to support open firmware, 
> like coreboot and OCP Open System Firmware with minimum binary blobs.  At 
> least one major x86 OEM is expected to ship open firmware in one of their 
> popular devices, which may encourage competing OEM devices to follow.
>
> PC Engines APU2 (dual-core AMD, 4GB RAM, 6W TDP, triple NIC + LTE) is one 
> available edge device which supports Xen and has open (coreboot) firmware.  
> It would be nice to include APU2 in LF Edge support, if only to provide 
> competition to OEM devices with buggy firmware. Upcoming Intel Tiger Lake 
> (Core) and Elkhart Lake (Atom Tremont) are expected to expand edge-relevant 
> security features, which would make such devices attractive to Xen 
> deployments.

Funny you should mention it -- APU2 is my weekend project for this
coming weekend to make EVE/Xen run on it out-of-the box. I'll be using
SeaBIOS payload for now, but the ultimate goal is to turn EVE into a
payload itself.

> We also need edge software vendors to encourage device OEMs to enable open 
> firmware via coreboot, OCP OSF, Intel MinPlatform and similar programs. See 
> https://software.intel.com/content/www/us/en/develop/articles/minimum-platform-architecture-open-source-uefi-firmware-for-intel-based-platforms.html
>  and other talks from the open firmware conference, https://osfc.io/archive

Thanks,
Roman.



Re: [PATCH 1/2] x86/vpic: rename irq to pin in vpic_ioport_write

2020-08-20 Thread Andrew Cooper
On 20/08/2020 16:34, Roger Pau Monne wrote:
> The irq variable is wrongly named, as it's used to store the pin on
> the 8259 chip, not the global irq value. While renaming, reduce
> its scope and make it unsigned.
>
> No functional change intended.
>
> Signed-off-by: Roger Pau Monné 

Acked-by: Andrew Cooper 



Re: [PATCH 2/2] x86/vpic: also execute dpci callback for non-specific EOI

2020-08-20 Thread Andrew Cooper
On 20/08/2020 16:34, Roger Pau Monne wrote:
> Currently the dpci EOI callback is only executed for specific EOIs.
> This is wrong as non-specific EOIs will also clear the ISR bit and
> thus end the interrupt. Re-arrange the code a bit so that the common
> EOI handling path can be shared between all EOI modes.
>
> Signed-off-by: Roger Pau Monné 
> ---
>  xen/arch/x86/hvm/vpic.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vpic.c b/xen/arch/x86/hvm/vpic.c
> index feb1db2ee3..3cf12581e9 100644
> --- a/xen/arch/x86/hvm/vpic.c
> +++ b/xen/arch/x86/hvm/vpic.c
> @@ -249,15 +249,15 @@ static void vpic_ioport_write(
>  if ( priority == VPIC_PRIO_NONE )
>  break;
>  pin = (priority + vpic->priority_add) & 7;
> -vpic->isr &= ~(1 << pin);
> -if ( cmd == 5 )
> -vpic->priority_add = (pin + 1) & 7;
> -break;
> +goto common_eoi;
> +
>  case 3: /* Specific EOI*/
>  case 7: /* Specific EOI & Rotate   */
>  pin = val & 7;

You'll need a /* Fallthrough */ here to keep various things happy.

Otherwise, Acked-by: Andrew Cooper 

Can fix on commit if you're happy.

> +
> +common_eoi:
>  vpic->isr &= ~(1 << pin);
> -if ( cmd == 7 )
> +if ( cmd == 7 || cmd == 5 )
>  vpic->priority_add = (pin + 1) & 7;
>  /* Release lock and EOI the physical interrupt (if any). */
>  vpic_update_int_output(vpic);




Re: [PATCH 2/2] x86/vpic: also execute dpci callback for non-specific EOI

2020-08-20 Thread Roger Pau Monné
On Thu, Aug 20, 2020 at 05:28:21PM +0100, Andrew Cooper wrote:
> On 20/08/2020 16:34, Roger Pau Monne wrote:
> > Currently the dpci EOI callback is only executed for specific EOIs.
> > This is wrong as non-specific EOIs will also clear the ISR bit and
> > thus end the interrupt. Re-arrange the code a bit so that the common
> > EOI handling path can be shared between all EOI modes.
> >
> > Signed-off-by: Roger Pau Monné 
> > ---
> >  xen/arch/x86/hvm/vpic.c | 10 +-
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/vpic.c b/xen/arch/x86/hvm/vpic.c
> > index feb1db2ee3..3cf12581e9 100644
> > --- a/xen/arch/x86/hvm/vpic.c
> > +++ b/xen/arch/x86/hvm/vpic.c
> > @@ -249,15 +249,15 @@ static void vpic_ioport_write(
> >  if ( priority == VPIC_PRIO_NONE )
> >  break;
> >  pin = (priority + vpic->priority_add) & 7;
> > -vpic->isr &= ~(1 << pin);
> > -if ( cmd == 5 )
> > -vpic->priority_add = (pin + 1) & 7;
> > -break;
> > +goto common_eoi;
> > +
> >  case 3: /* Specific EOI*/
> >  case 7: /* Specific EOI & Rotate   */
> >  pin = val & 7;
> 
> You'll need a /* Fallthrough */ here to keep various things happy.
> 
> Otherwise, Acked-by: Andrew Cooper 
> 
> Can fix on commit if you're happy.

Sure, I was on the fence about adding it but somehow assumed that
/* Fallthrough */ was required for cases but not labels.

Thanks, Roger.



Re: [PATCH v2 3/8] x86/msr: explicitly handle AMD DE_CFG

2020-08-20 Thread Andrew Cooper
On 20/08/2020 16:08, Roger Pau Monne wrote:

> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
> index ca4307e19f..a890cb9976 100644
> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -274,6 +274,14 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t 
> *val)
>  *val = msrs->tsc_aux;
>  break;
>  
> +case MSR_AMD64_DE_CFG:
> +if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ||
> + !(boot_cpu_data.x86_vendor & (X86_VENDOR_AMD |
> +   X86_VENDOR_HYGON)) ||
> + rdmsr_safe(MSR_AMD64_DE_CFG, *val) )
> +goto gp_fault;
> +break;

Ah.  What I intended was to read just bit 2 and nothing else.

Leaking the full value is non-ideal from a migration point of view, and
in this case, you can avoid querying hardware entirely.

Just return AMD64_DE_CFG_LFENCE_SERIALISE here.  The only case where it
won't be true is when the hypervisor running us (i.e. Xen) failed to set
it up, and the CPU boot path failed to adjust it, at which point the
whole system has much bigger problems.

> +
>  case MSR_AMD64_DR0_ADDRESS_MASK:
>  case MSR_AMD64_DR1_ADDRESS_MASK ... MSR_AMD64_DR3_ADDRESS_MASK:
>  if ( !cp->extd.dbext )
> @@ -499,6 +507,12 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t 
> val)
>  wrmsr_tsc_aux(val);
>  break;
>  
> +case MSR_AMD64_DE_CFG:
> +if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ||
> + !(boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | 
> X86_VENDOR_HYGON)) )
> +goto gp_fault;
> +break;

There should be no problem yielding #GP here (i.e. dropping this hunk).

IIRC, it was the behaviour of certain hypervisors when Spectre hit, so
all guests ought to cope.  (And indeed, not try to redundantly set the
bit to start with).

~Andrew

> +
>  case MSR_AMD64_DR0_ADDRESS_MASK:
>  case MSR_AMD64_DR1_ADDRESS_MASK ... MSR_AMD64_DR3_ADDRESS_MASK:
>  if ( !cp->extd.dbext || val != (uint32_t)val )




Re: [PATCH 08/14] kernel-doc: public/memory.h

2020-08-20 Thread Stefano Stabellini
On Tue, 18 Aug 2020, Jan Beulich wrote:
> On 18.08.2020 00:56, Stefano Stabellini wrote:
> > On Mon, 17 Aug 2020, Jan Beulich wrote:
> >> On 07.08.2020 23:51, Stefano Stabellini wrote:
> >>> On Fri, 7 Aug 2020, Jan Beulich wrote:
>  On 07.08.2020 01:49, Stefano Stabellini wrote:
> > @@ -200,90 +236,115 @@ DEFINE_XEN_GUEST_HANDLE(xen_machphys_mfn_list_t);
> >   */
> >  #define XENMEM_machphys_compat_mfn_list 25
> >  
> > -/*
> > +#define XENMEM_machphys_mapping 12
> > +/**
> > + * struct xen_machphys_mapping - XENMEM_machphys_mapping
> > + *
> >   * Returns the location in virtual address space of the machine_to_phys
> >   * mapping table. Architectures which do not have a m2p table, or 
> > which do not
> >   * map it by default into guest address space, do not implement this 
> > command.
> >   * arg == addr of xen_machphys_mapping_t.
> >   */
> > -#define XENMEM_machphys_mapping 12
> >  struct xen_machphys_mapping {
> > +/** @v_start: Start virtual address */
> >  xen_ulong_t v_start, v_end; /* Start and end virtual addresses.   
> > */
> > -xen_ulong_t max_mfn;/* Maximum MFN that can be looked up. 
> > */
> > +/** @v_end: End virtual addresses */
> > +xen_ulong_t v_end;
> > +/** @max_mfn: Maximum MFN that can be looked up */
> > +xen_ulong_t max_mfn;
> >  };
> >  typedef struct xen_machphys_mapping xen_machphys_mapping_t;
> >  DEFINE_XEN_GUEST_HANDLE(xen_machphys_mapping_t);
> >  
> > -/* Source mapping space. */
> > +/**
> > + * DOC: Source mapping space.
> > + *
> > + * - XENMAPSPACE_shared_info:  shared info page
> > + * - XENMAPSPACE_grant_table:  grant table page
> > + * - XENMAPSPACE_gmfn: GMFN
> > + * - XENMAPSPACE_gmfn_range:   GMFN range, XENMEM_add_to_physmap only.
> > + * - XENMAPSPACE_gmfn_foreign: GMFN from another dom,
> > + * XENMEM_add_to_physmap_batch only.
> > + * - XENMAPSPACE_dev_mmio: device mmio region ARM only; the region 
> > is mapped
> > + * in Stage-2 using the Normal 
> > MemoryInner/Outer
> > + * Write-Back Cacheable memory attribute.
> > + */
> >  /* ` enum phys_map_space { */
> 
>  Isn't this and ...
> 
> > -#define XENMAPSPACE_shared_info  0 /* shared info page */
> > -#define XENMAPSPACE_grant_table  1 /* grant table page */
> > -#define XENMAPSPACE_gmfn 2 /* GMFN */
> > -#define XENMAPSPACE_gmfn_range   3 /* GMFN range, 
> > XENMEM_add_to_physmap only. */
> > -#define XENMAPSPACE_gmfn_foreign 4 /* GMFN from another dom,
> > -* XENMEM_add_to_physmap_batch 
> > only. */
> > -#define XENMAPSPACE_dev_mmio 5 /* device mmio region
> > -  ARM only; the region is mapped in
> > -  Stage-2 using the Normal Memory
> > -  Inner/Outer Write-Back Cacheable
> > -  memory attribute. */
> > +#define XENMAPSPACE_shared_info  0
> > +#define XENMAPSPACE_grant_table  1
> > +#define XENMAPSPACE_gmfn 2
> > +#define XENMAPSPACE_gmfn_range   3
> > +#define XENMAPSPACE_gmfn_foreign 4
> > +#define XENMAPSPACE_dev_mmio 5
> >  /* ` } */
> 
>  ... this also something that wants converting?
> >>>
> >>> For clarity, I take you are talking about these two enum-related
> >>> comments:
> >>>
> >>> /* ` enum phys_map_space { */
> >>> [... various #defines ... ]
> >>> /* ` } */
> >>>
> >>> Is this something we want to convert to kernel-doc? I don't know. I
> >>> couldn't see an obvious value in doing it, in the sense that it doesn't
> >>> necessarily make things clearer.
> >>>
> >>> I took a second look at the header and the following would work:
> >>>
> >>> /**
> >>>  * DOC: Source mapping space.
> >>>  *
> >>>  * enum phys_map_space {
> >>>  *
> >>>  * - XENMAPSPACE_shared_info:  shared info page
> >>>  * - XENMAPSPACE_grant_table:  grant table page
> >>>  * - XENMAPSPACE_gmfn: GMFN
> >>>  * - XENMAPSPACE_gmfn_range:   GMFN range, XENMEM_add_to_physmap only.
> >>>  * - XENMAPSPACE_gmfn_foreign: GMFN from another dom,
> >>>  * XENMEM_add_to_physmap_batch only.
> >>>  * - XENMAPSPACE_dev_mmio: device mmio region ARM only; the region is 
> >>> mapped
> >>>  * in Stage-2 using the Normal 
> >>> MemoryInner/Outer
> >>>  * Write-Back Cacheable memory attribute.
> >>>  * }
> >>>  */
> >>>
> >>> Note the blank line after "enum phys_map_space {" is required.
> >>>
> >>>
> >>> All in all I am in favor of *not* converting the enum comment to
> >>> kernel-doc, but I'd be OK with it anyway.

[linux-linus test] 152629: regressions - FAIL

2020-08-20 Thread osstest service owner
flight 152629 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/152629/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-boot fail REGR. vs. 152332
 test-amd64-i386-xl-xsm7 xen-boot fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-boot fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-boot fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-boot  fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-boot fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-boot fail REGR. vs. 152332
 test-amd64-i386-pair 10 xen-boot/src_hostfail REGR. vs. 152332
 test-amd64-i386-pair 11 xen-boot/dst_hostfail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  7 xen-boot  fail REGR. vs. 152332
 test-amd64-i386-libvirt   7 xen-boot fail REGR. vs. 152332
 test-amd64-i386-examine   8 reboot   fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64  7 xen-boot  fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-boot   fail REGR. vs. 152332
 test-amd64-i386-xl7 xen-boot fail REGR. vs. 152332
 test-amd64-coresched-i386-xl  7 xen-boot fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-boot fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm   7 xen-boot fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-boot   fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386  7 xen-boot   fail REGR. vs. 152332
 test-amd64-i386-xl-raw7 xen-boot fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64  7 xen-boot  fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim 7 xen-boot fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm  7 xen-boot  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64  7 xen-boot  fail REGR. vs. 152332
 test-amd64-i386-xl-shadow 7 xen-boot fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-boot  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64  7 xen-boot  fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 10 xen-boot/src_hostfail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 11 xen-boot/dst_hostfail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 152332
 test-armhf-armhf-xl-credit1   7 xen-boot fail REGR. vs. 152332
 test-armhf-armhf-examine  8 reboot   fail REGR. vs. 152332
 test-armhf-armhf-xl-multivcpu  7 xen-bootfail REGR. vs. 152332
 test-armhf-armhf-xl   7 xen-boot fail REGR. vs. 152332
 test-armhf-armhf-xl-arndale   7 xen-boot fail REGR. vs. 152332
 test-armhf-armhf-libvirt-raw 10 debian-di-installfail REGR. vs. 152332

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 152332
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 152332
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 152332
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeatfail  like 152332
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 152332
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 152332
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 tes

Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common

2020-08-20 Thread Stefano Stabellini
On Thu, 20 Aug 2020, Oleksandr wrote:
> > On 11/08/2020 23:48, Stefano Stabellini wrote:
> > > > I have the impression that we disagree in what the Device Emulator is
> > > > meant to
> > > > do. IMHO, the goal of the device emulator is to emulate a device in an
> > > > arch-agnostic way.
> > > 
> > > That would be great in theory but I am not sure it is achievable: if we
> > > use an existing emulator like QEMU, even a single device has to fit
> > > into QEMU's view of the world, which makes assumptions about host
> > > bridges and apertures. It is impossible today to build QEMU in an
> > > arch-agnostic way, it has to be tied to an architecture.
> > 
> > AFAICT, the only reason QEMU cannot be built in an arch-agnostic way is
> > because of TCG. If this wasn't built then you could easily write a machine
> > that doesn't depend on the instruction set.
> > 
> > The proof is that, today, we are using QEMU x86 to serve Arm64 guests,
> > although this is only for PV drivers.
> > 
> > > 
> > > I realize we are not building this interface for QEMU specifically, but
> > > even if we try to make the interface arch-agnostic, in reality the
> > > emulators won't be arch-agnostic.
> > 
> > This depends on your goal. If your goal is to write a standalone emulator
> > for a single device, then it is entirely possible to make it arch-agnostic.
> > 
> > Per above, this would even be possible if you were emulating a set of
> > devices.
> > 
> > What I want to avoid is requiring all the emulators to contain arch-specific
> > code just because it is easier to get QEMU working on Xen on Arm.
> > 
> > > If we send a port-mapped I/O request
> > > to qemu-system-aarch64 who knows what is going to happen: it is a code
> > > path that it is not explicitly tested.
> > 
> > Maybe, maybe not. To me this is mostly software issues that can easily be
> > mitigated if we do proper testing...
> 
> Could we please find common ground on whether the PIO handling needs to be
> implemented on Arm or not? At least for the current patch series.

Can you do a test on QEMU to verify which address space the PIO BARs are
using on ARM? I don't know if there is an easy way to test it but it
would be very useful for this conversation.


> Below are my thoughts:
> On one hand I agree that the emulator shouldn't contain any arch-specific
> code; it is hypervisor specific, but it should be arch-agnostic if possible.
> So the PIO case should be handled.
> On the other hand I tend to think that it might be possible to skip PIO
> handling for the current patch series (leave it x86-specific for now, as we
> do with handle_realmode_completion()).
> I think nothing will prevent us from adding PIO handling later on if there is
> a real need (use case) for that. Please correct me if I am wrong.
> 
> I would be absolutely OK with any options.
> 
> What do you think?

I agree that PIO handling is not the most critical thing right now given
that we have quite a few other important TODOs in the series. I'd be
fine reviewing another version of the series with this issue still
pending.


Of course, PIO needs to be handled. The key to me is that QEMU (or other
emulator) should *not* emulate in/out instructions on ARM. PIO ioreq
requests should not be satisfied by using address_space_io directly (the
PIO address space that requires special instructions to access it). In
QEMU the PIO reads/writes should be done via address_space_memory (the
normal memory mapped address space).

So either way of the following approaches should be OK:

1) Xen sends out PIO addresses as memory mapped addresses, QEMU simply
   reads/writes on them
2) Xen sends out PIO addresses as address_space_io, QEMU finds the
   mapping to address_space_memory, then reads/writes on
   address_space_memory

From an interface and implementation perspective, 1) means that
IOREQ_TYPE_PIO is unused on ARM, while 2) means that IOREQ_TYPE_PIO is
still used as part of the ioreq interface, even if QEMU doesn't directly
operate on those addresses.

My preference is 1) because it leads to a simpler solution.



Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op

2020-08-20 Thread Stefano Stabellini
On Tue, 18 Aug 2020, Julien Grall wrote:
> On 11/08/2020 23:48, Stefano Stabellini wrote:
> > On Tue, 11 Aug 2020, Julien Grall wrote:
> > > >   I vaguely
> > > > recall a bug 10+ years ago about this with QEMU on x86 and a line that
> > > > could be both active-high and active-low. So QEMU would raise the
> > > > interrupt but Xen would actually think that QEMU stopped the interrupt.
> > > > 
> > > > To do this right, we would have to introduce an interface between Xen
> > > > and QEMU to propagate the trigger type. Xen would have to tell QEMU when
> > > > the guest changed the configuration. That would work, but it would be
> > > > better if we can figure out a way to do without it to reduce complexity.
> > > Per above, I don't think this is necessary.
> > > 
> > > > 
> > > > Instead, given that QEMU and other emulators don't actually care about
> > > > active-high or active-low, if we have a Xen interface that just says
> > > > "fire the interrupt" we get away from this kind of troubles. It would
> > > > also be more efficient because the total number of hypercalls required
> > > > would be lower.
> > > 
> > > I read "fire the interrupt" as "Please generate an interrupt once".
> > > Is that the definition you expect?
> > 
> > Yes, that is the idea. It would have to take into account the edge/level
> > semantic difference: level would have a "start it" and a "stop it".
> 
> I am still struggling to see how this can work:
> - At the moment, QEMU is only providing us the line state. How can we
> deduce the type of the interrupt? Would it mean a major modification of the
> QEMU API?

Good question. 

I don't think we would need any major modifications of the QEMU APIs.
QEMU already uses two different function calls to trigger an edge
interrupt and to trigger a level interrupt.

Edge interrupts are triggered with qemu_irq_pulse; level interrupts with
qemu_irq_raise/qemu_irq_lower.

It is also possible for devices to call qemu_set_irq directly which
only has the state of the line represented by the "level" argument.
As far as I can tell all interrupts emulated in QEMU (at least the ones
we care about) are active-high.

We have a couple of choices in the implementation, like hooking into
qemu_irq_pulse, and/or checking if the interrupt is level or edge in the
xen interrupt injection function. The latter shouldn't require any
changes in QEMU common code.


FYI looking into the code there is something "strange" in virtio-mmio.c:
it only ever calls qemu_set_irq to start a notification. It doesn't look
like it ever calls qemu_set_irq to stop a notification at all. It is
possible that the state of the line is not accurately emulated for
virtio-mmio.c.


> - Can you provide a rough sketch how this could be implemented in Xen?

It would work similarly to other emulated interrupt injections on the
Xen side, calling vgic_inject_irq.  We have matching info about
level/edge and active-high/active-low in Xen too, so we could do more
precise emulation of the interrupt flow, although I am aware of the
current limitations of the vgic in that regard.

But I have the feeling I didn't address your concern :-)




Re: Xen 4.14.0 is busted on Dell 300x IoT Gateways

2020-08-20 Thread Stefano Stabellini
On Tue, 18 Aug 2020, Roman Shaposhnik wrote:
> Hi!
> first things first -- booting on those devices has always
> required efi=no-rs -- but it seems that Xen 4.14 is now 
> busted at a more fundamental level. I'm attaching two
> boot sequences (one with kernel 4.19.5 and one with 5.4.51)
> in the hopes that this may provide some clues right away.
> 
> Any help would be greatly appreciated!
> 
> Oh, and finally it appears that this is NOT a regression from
> Xen 4.13 -- it fails the same way. I haven't tried Xen versions
> earlier than that.

FYI Roman and I tracked down the issue and it is due to the gpio
controller driver (drivers/pinctrl/intel/pinctrl-baytrail.c) overwriting
the interrupt handler data used by Xen to store the irq_data structure.

I have a very small tentative workaround, see below. It allows the
kernel to boot successfully as dom0 and gpio writes work. I am still
thinking about how to fix the issue properly in an upstreamable way, but I
wanted to send this out to the list right away in case somebody else is
stuck on this problem.


diff --git a/drivers/pinctrl/intel/pinctrl-baytrail.c 
b/drivers/pinctrl/intel/pinctrl-baytrail.c
index f38d596efa05..acd28a9e6a8a 100644
--- a/drivers/pinctrl/intel/pinctrl-baytrail.c
+++ b/drivers/pinctrl/intel/pinctrl-baytrail.c
@@ -1604,8 +1604,8 @@ static struct irq_chip byt_irqchip = {
 static void byt_gpio_irq_handler(struct irq_desc *desc)
 {
struct irq_data *data = irq_desc_get_irq_data(desc);
-   struct byt_gpio *vg = gpiochip_get_data(
-   irq_desc_get_handler_data(desc));
+   struct gpio_chip *gc = irq_desc_get_chip_data(desc);
+   struct byt_gpio *vg = (struct byt_gpio *)gc;
struct irq_chip *chip = irq_data_get_irq_chip(data);
u32 base, pin;
void __iomem *reg;
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index a2b3d9de999c..b9551fb41ed1 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1003,7 +1003,8 @@ irq_set_chained_handler_and_data(unsigned int irq, 
irq_flow_handler_t handle,
if (!desc)
return;
 
-   desc->irq_common_data.handler_data = data;
+   if (!desc->irq_common_data.handler_data)
+   desc->irq_common_data.handler_data = data;
__irq_do_set_handler(desc, handle, 1, NULL);
 
irq_put_desc_busunlock(desc, flags);

[patch RFC 07/38] iommu/irq_remapping: Consolidate irq domain lookup

2020-08-20 Thread Thomas Gleixner
Now that the iommu implementations handle the X86_*_GET_PARENT_DOMAIN
types, consolidate the two getter functions. 

Signed-off-by: Thomas Gleixner 
Cc: Wei Liu 
Cc: Joerg Roedel 
Cc: linux-hyp...@vger.kernel.org
Cc: io...@lists.linux-foundation.org
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Jon Derrick 
Cc: Lu Baolu 
---
 arch/x86/include/asm/irq_remapping.h |8 
 arch/x86/kernel/apic/io_apic.c   |2 +-
 arch/x86/kernel/apic/msi.c   |2 +-
 drivers/iommu/amd/iommu.c|1 -
 drivers/iommu/hyperv-iommu.c |4 ++--
 drivers/iommu/intel/irq_remapping.c  |1 -
 drivers/iommu/irq_remapping.c|   23 +--
 drivers/iommu/irq_remapping.h|5 +
 8 files changed, 6 insertions(+), 40 deletions(-)

--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -45,8 +45,6 @@ extern int irq_remap_enable_fault_handli
 extern void panic_if_irq_remap(const char *msg);
 
 extern struct irq_domain *
-irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info);
-extern struct irq_domain *
 irq_remapping_get_irq_domain(struct irq_alloc_info *info);
 
 /* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
@@ -74,12 +72,6 @@ static inline void panic_if_irq_remap(co
 }
 
 static inline struct irq_domain *
-irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info)
-{
-   return NULL;
-}
-
-static inline struct irq_domain *
 irq_remapping_get_irq_domain(struct irq_alloc_info *info)
 {
return NULL;
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2298,7 +2298,7 @@ static int mp_irqdomain_create(int ioapi
init_irq_alloc_info(&info, NULL);
info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
info.ioapic_id = mpc_ioapic_id(ioapic);
-   parent = irq_remapping_get_ir_irq_domain(&info);
+   parent = irq_remapping_get_irq_domain(&info);
if (!parent)
parent = x86_vector_domain;
else
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -478,7 +478,7 @@ struct irq_domain *hpet_create_irq_domai
init_irq_alloc_info(&info, NULL);
info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
info.hpet_id = hpet_id;
-   parent = irq_remapping_get_ir_irq_domain(&info);
+   parent = irq_remapping_get_irq_domain(&info);
if (parent == NULL)
parent = x86_vector_domain;
else
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3561,7 +3561,6 @@ struct irq_remap_ops amd_iommu_irq_ops =
.disable= amd_iommu_disable,
.reenable   = amd_iommu_reenable,
.enable_faulting= amd_iommu_enable_faulting,
-   .get_ir_irq_domain  = get_irq_domain,
.get_irq_domain = get_irq_domain,
 };
 
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -182,7 +182,7 @@ static int __init hyperv_enable_irq_rema
return IRQ_REMAP_X2APIC_MODE;
 }
 
-static struct irq_domain *hyperv_get_ir_irq_domain(struct irq_alloc_info *info)
+static struct irq_domain *hyperv_get_irq_domain(struct irq_alloc_info *info)
 {
if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT)
return ioapic_ir_domain;
@@ -193,7 +193,7 @@ static struct irq_domain *hyperv_get_ir_
 struct irq_remap_ops hyperv_irq_remap_ops = {
.prepare= hyperv_prepare_irq_remapping,
.enable = hyperv_enable_irq_remapping,
-   .get_ir_irq_domain  = hyperv_get_ir_irq_domain,
+   .get_irq_domain = hyperv_get_irq_domain,
 };
 
 #endif
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1131,7 +1131,6 @@ struct irq_remap_ops intel_irq_remap_ops
.disable= disable_irq_remapping,
.reenable   = reenable_irq_remapping,
.enable_faulting= enable_drhd_fault_handling,
-   .get_ir_irq_domain  = intel_get_irq_domain,
.get_irq_domain = intel_get_irq_domain,
 };
 
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -160,33 +160,12 @@ void panic_if_irq_remap(const char *msg)
 }
 
 /**
- * irq_remapping_get_ir_irq_domain - Get the irqdomain associated with the 
IOMMU
- *  device serving request @info
- * @info: interrupt allocation information, used to identify the IOMMU device
- *
- * It's used to get parent irqdomain for HPET and IOAPIC irqdomains.
- * Returns pointer to IRQ domain, or NULL on failure.
- */
-struct irq_domain *
-irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info)
-{
-   if (!remap_ops || !remap_ops->get_ir_irq_domain)
-   return NULL;
-
-   return remap_ops->get_ir_irq_domain(info);
-}
-
-/**
  * irq_remapping_get_irq_domain - Get the irqdomain serving the request @info
  * @info: interrupt allocation information, 

[patch RFC 00/38] x86, PCI, XEN, genirq ...: Prepare for device MSI

2020-08-20 Thread Thomas Gleixner
First of all, sorry for the horrible long Cc list, which was
unfortunately unavoidable as this touches the world and some more.

This patch series aims to provide a base to support device MSI (non
PCI based) in a halfway architecture-independent way.

It's a mixed bag of bug fixes, cleanups and general improvements which
are worthwhile independent of the device MSI stuff. Unfortunately this
also comes with an evil abuse of the irqdomain system to coerce XEN on
x86 into compliance without rewriting XEN from scratch.

As discussed in length in this mail thread:

  https://lore.kernel.org/r/87h7tcgbs2@nanos.tec.linutronix.de

the initial attempt of piggybacking device MSI support on platform MSI
is doomed for various reasons, but creating independent interrupt
domains for these upcoming magic PCI subdevices which are not PCI, but
might be exposed as PCI devices is not as trivial as it seems.

The initially suggested and evaluated approach of extending platform
MSI turned out to be the completely wrong direction and in fact
platform MSI should be rewritten on top of device MSI or completely
replaced by it.

One of the main issues is that x86 does not support the concept of irq
domain associations stored in device::msi_domain and still relies on
the arch_*_msi_irqs() fallback implementations, which have their own
set of problems as outlined in

  https://lore.kernel.org/r/87bljg7u4f@nanos.tec.linutronix.de/

in the very same thread.

The main obstacle to storing that pointer is XEN, which has its own
historical notion of handling PCI MSI interrupts.

This series tries to address these issues in several steps:

 1) Accidental bug fixes
iommu/amd: Prevent NULL pointer dereference

 2) Janitoring
x86/init: Remove unused init ops

 3) Simplification of the x86 specific interrupt allocation mechanism

x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency
x86/irq: Add allocation type for parent domain retrieval
iommu/vt-d: Consolidate irq domain getter
iommu/amd: Consolidate irq domain getter
iommu/irq_remapping: Consolidate irq domain lookup

 4) Consolidation of the X86 specific interrupt allocation mechanism to be as
close as possible to the generic MSI allocation mechanism, which allows
getting rid of quite a bunch of x86'isms which are pointless

x86/irq: Prepare consolidation of irq_alloc_info
x86/msi: Consolidate HPET allocation
x86/ioapic: Consolidate IOAPIC allocation
x86/irq: Consolidate DMAR irq allocation
x86/irq: Consolidate UV domain allocation
PCI: MSI: Rework pci_msi_domain_calc_hwirq()
x86/msi: Consolidate MSI allocation
x86/msi: Use generic MSI domain ops

  5) x86 specific cleanups to remove the dependency on arch_*_msi_irqs()

x86/irq: Move apic_post_init() invocation to one place
x86/pci: Reduce #ifdeffery in PCI init code
x86/irq: Initialize PCI/MSI domain at PCI init time
irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
PCI: MSI: Provide pci_dev_has_special_msi_domain() helper
x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
x86/xen: Rework MSI teardown
x86/xen: Consolidate XEN-MSI init
irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
x86/xen: Wrap XEN MSI management into irqdomain
iommu/vt-d: Store irq domain in struct device
iommu/amd: Store irq domain in struct device
x86/pci: Set default irq domain in pcibios_add_device()
PCI/MSI: Allow to disable arch fallbacks
x86/irq: Cleanup the arch_*_msi_irqs() leftovers
x86/irq: Make most MSI ops XEN private

This one is paving the way to device MSI support, but it comes
with an ugly and evil hack. The ability of overriding the default
allocation/free functions of an MSI irq domain is useful in general as
(hopefully) demonstrated with the device MSI POC, but the abuse
in context of XEN is evil. OTOH without enough XENology and without
rewriting XEN from scratch wrapping XEN MSI handling into a pseudo
irq domain is a reasonable step forward for mere mortals with severely
limited XENology. One day the XEN folks might make it a real irq domain.
Perhaps when they have to support the same mess on other architectures.
Hope dies last...

At least the mechanism to override alloc/free turned out to be useful
for implementing the base infrastructure for device MSI. So it's not a
completely lost case.

  6) X86 specific preparation for device MSI

   x86/irq: Add DEV_MSI allocation type
   x86/msi: Let pci_msi_prepare() handle non-PCI MSI

  7) Generic device MSI infrastructure

   platform-msi: Provide default irq_chip:ack
   platform-msi: Add device MSI infrastructure

  8) Infrastructure for and a POC of an IMS (Interrupt Message

[patch RFC 02/38] x86/init: Remove unused init ops

2020-08-20 Thread Thomas Gleixner
Some past platform removal forgot to get rid of this unused ballast.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/mpspec.h   |   10 --
 arch/x86/include/asm/x86_init.h |   10 --
 arch/x86/kernel/mpparse.c   |   26 --
 arch/x86/kernel/x86_init.c  |4 
 4 files changed, 4 insertions(+), 46 deletions(-)

--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -67,21 +67,11 @@ static inline void find_smp_config(void)
 #ifdef CONFIG_X86_MPPARSE
 extern void e820__memblock_alloc_reserved_mpc_new(void);
 extern int enable_update_mptable;
-extern int default_mpc_apic_id(struct mpc_cpu *m);
-extern void default_smp_read_mpc_oem(struct mpc_table *mpc);
-# ifdef CONFIG_X86_IO_APIC
-extern void default_mpc_oem_bus_info(struct mpc_bus *m, char *str);
-# else
-#  define default_mpc_oem_bus_info NULL
-# endif
 extern void default_find_smp_config(void);
 extern void default_get_smp_config(unsigned int early);
 #else
 static inline void e820__memblock_alloc_reserved_mpc_new(void) { }
 #define enable_update_mptable 0
-#define default_mpc_apic_id NULL
-#define default_smp_read_mpc_oem NULL
-#define default_mpc_oem_bus_info NULL
 #define default_find_smp_config x86_init_noop
 #define default_get_smp_config x86_init_uint_noop
 #endif
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -11,22 +11,12 @@ struct cpuinfo_x86;
 
 /**
  * struct x86_init_mpparse - platform specific mpparse ops
- * @mpc_record:platform specific mpc record accounting
  * @setup_ioapic_ids:  platform specific ioapic id override
- * @mpc_apic_id:   platform specific mpc apic id assignment
- * @smp_read_mpc_oem:  platform specific oem mpc table setup
- * @mpc_oem_pci_bus:   platform specific pci bus setup (default NULL)
- * @mpc_oem_bus_info:  platform specific mpc bus info
  * @find_smp_config:   find the smp configuration
  * @get_smp_config:get the smp configuration
  */
 struct x86_init_mpparse {
-   void (*mpc_record)(unsigned int mode);
void (*setup_ioapic_ids)(void);
-   int (*mpc_apic_id)(struct mpc_cpu *m);
-   void (*smp_read_mpc_oem)(struct mpc_table *mpc);
-   void (*mpc_oem_pci_bus)(struct mpc_bus *m);
-   void (*mpc_oem_bus_info)(struct mpc_bus *m, char *name);
void (*find_smp_config)(void);
void (*get_smp_config)(unsigned int early);
 };
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -46,11 +46,6 @@ static int __init mpf_checksum(unsigned
return sum & 0xFF;
 }
 
-int __init default_mpc_apic_id(struct mpc_cpu *m)
-{
-   return m->apicid;
-}
-
 static void __init MP_processor_info(struct mpc_cpu *m)
 {
int apicid;
@@ -61,7 +56,7 @@ static void __init MP_processor_info(str
return;
}
 
-   apicid = x86_init.mpparse.mpc_apic_id(m);
+   apicid = m->apicid;
 
if (m->cpuflag & CPU_BOOTPROCESSOR) {
bootup_cpu = " (Bootup-CPU)";
@@ -73,7 +68,7 @@ static void __init MP_processor_info(str
 }
 
 #ifdef CONFIG_X86_IO_APIC
-void __init default_mpc_oem_bus_info(struct mpc_bus *m, char *str)
+static void __init mpc_oem_bus_info(struct mpc_bus *m, char *str)
 {
memcpy(str, m->bustype, 6);
str[6] = 0;
@@ -84,7 +79,7 @@ static void __init MP_bus_info(struct mp
 {
char str[7];
 
-   x86_init.mpparse.mpc_oem_bus_info(m, str);
+   mpc_oem_bus_info(m, str);
 
 #if MAX_MP_BUSSES < 256
if (m->busid >= MAX_MP_BUSSES) {
@@ -100,9 +95,6 @@ static void __init MP_bus_info(struct mp
mp_bus_id_to_type[m->busid] = MP_BUS_ISA;
 #endif
} else if (strncmp(str, BUSTYPE_PCI, sizeof(BUSTYPE_PCI) - 1) == 0) {
-   if (x86_init.mpparse.mpc_oem_pci_bus)
-   x86_init.mpparse.mpc_oem_pci_bus(m);
-
clear_bit(m->busid, mp_bus_not_pci);
 #ifdef CONFIG_EISA
mp_bus_id_to_type[m->busid] = MP_BUS_PCI;
@@ -198,8 +190,6 @@ static void __init smp_dump_mptable(stru
1, mpc, mpc->length, 1);
 }
 
-void __init default_smp_read_mpc_oem(struct mpc_table *mpc) { }
-
 static int __init smp_read_mpc(struct mpc_table *mpc, unsigned early)
 {
char str[16];
@@ -218,14 +208,7 @@ static int __init smp_read_mpc(struct mp
if (early)
return 1;
 
-   if (mpc->oemptr)
-   x86_init.mpparse.smp_read_mpc_oem(mpc);
-
-   /*
-*  Now process the configuration blocks.
-*/
-   x86_init.mpparse.mpc_record(0);
-
+   /* Now process the configuration blocks. */
while (count < mpc->length) {
switch (*mpt) {
case MP_PROCESSOR:
@@ -256,7 +239,6 @@ static int __init smp_read_mpc(struct mp
count = mpc->length;
break;
}
-   x86_init.mpparse.mpc_record(1);

[patch RFC 09/38] x86/msi: Consolidate HPET allocation

2020-08-20 Thread Thomas Gleixner
None of the magic HPET fields are required in any way.

Signed-off-by: Thomas Gleixner 
Cc: Joerg Roedel 
Cc: io...@lists.linux-foundation.org
Cc: Lu Baolu 
---
 arch/x86/include/asm/hw_irq.h   |7 ---
 arch/x86/kernel/apic/msi.c  |   14 +++---
 drivers/iommu/amd/iommu.c   |2 +-
 drivers/iommu/intel/irq_remapping.c |4 ++--
 4 files changed, 10 insertions(+), 17 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -65,13 +65,6 @@ struct irq_alloc_info {
 
union {
int unused;
-#ifdef CONFIG_HPET_TIMER
-   struct {
-   int hpet_id;
-   int hpet_index;
-   void*hpet_data;
-   };
-#endif
 #ifdef CONFIG_PCI_MSI
struct {
struct pci_dev  *msi_dev;
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -427,7 +427,7 @@ static struct irq_chip hpet_msi_controll
 static irq_hw_number_t hpet_msi_get_hwirq(struct msi_domain_info *info,
  msi_alloc_info_t *arg)
 {
-   return arg->hpet_index;
+   return arg->hwirq;
 }
 
 static int hpet_msi_init(struct irq_domain *domain,
@@ -435,8 +435,8 @@ static int hpet_msi_init(struct irq_doma
 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
 {
irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
-   irq_domain_set_info(domain, virq, arg->hpet_index, info->chip, NULL,
-   handle_edge_irq, arg->hpet_data, "edge");
+   irq_domain_set_info(domain, virq, arg->hwirq, info->chip, NULL,
+   handle_edge_irq, arg->data, "edge");
 
return 0;
 }
@@ -477,7 +477,7 @@ struct irq_domain *hpet_create_irq_domai
 
init_irq_alloc_info(&info, NULL);
info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
-   info.hpet_id = hpet_id;
+   info.devid = hpet_id;
parent = irq_remapping_get_irq_domain(&info);
if (parent == NULL)
parent = x86_vector_domain;
@@ -506,9 +506,9 @@ int hpet_assign_irq(struct irq_domain *d
 
init_irq_alloc_info(&info, NULL);
info.type = X86_IRQ_ALLOC_TYPE_HPET;
-   info.hpet_data = hc;
-   info.hpet_id = hpet_dev_id(domain);
-   info.hpet_index = dev_num;
+   info.data = hc;
+   info.devid = hpet_dev_id(domain);
+   info.hwirq = dev_num;
 
return irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
 }
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3511,7 +3511,7 @@ static int get_devid(struct irq_alloc_in
return get_ioapic_devid(info->ioapic_id);
case X86_IRQ_ALLOC_TYPE_HPET:
case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-   return get_hpet_devid(info->hpet_id);
+   return get_hpet_devid(info->devid);
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
return get_device_id(&info->msi_dev->dev);
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1115,7 +1115,7 @@ static struct irq_domain *intel_get_irq_
case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
return map_ioapic_to_ir(info->ioapic_id);
case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-   return map_hpet_to_ir(info->hpet_id);
+   return map_hpet_to_ir(info->devid);
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
return map_dev_to_ir(info->msi_dev);
@@ -1285,7 +1285,7 @@ static void intel_irq_remapping_prepare_
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
if (info->type == X86_IRQ_ALLOC_TYPE_HPET)
-   set_hpet_sid(irte, info->hpet_id);
+   set_hpet_sid(irte, info->devid);
else
set_msi_sid(irte, info->msi_dev);
 




[patch RFC 01/38] iommu/amd: Prevent NULL pointer dereference

2020-08-20 Thread Thomas Gleixner
Dereferencing irq_data before checking it for NULL is suboptimal.

Signed-off-by: Thomas Gleixner 
Cc: Joerg Roedel 
Cc: io...@lists.linux-foundation.org
---
 drivers/iommu/amd/iommu.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3717,8 +3717,8 @@ static int irq_remapping_alloc(struct ir
 
for (i = 0; i < nr_irqs; i++) {
irq_data = irq_domain_get_irq_data(domain, virq + i);
-   cfg = irqd_cfg(irq_data);
-   if (!irq_data || !cfg) {
+   cfg = irq_data ? irqd_cfg(irq_data) : NULL;
+   if (!cfg) {
ret = -EINVAL;
goto out_free_data;
}
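The ordering bug this one-liner fixes is easy to reproduce outside the kernel. The sketch below uses made-up stand-in types (`irq_data_s`, `cfg_of()` — not the kernel's real API) to show why the dependent call has to be guarded rather than performed first:

```c
#include <assert.h>
#include <stddef.h>

struct irq_data_s { int cfg; };

/* Stand-in for irqd_cfg(): dereferences its argument unconditionally. */
static int *cfg_of(struct irq_data_s *d)
{
	return &d->cfg;
}

/*
 * Buggy order would be: cfg = cfg_of(d); if (!d || !cfg) ... — the
 * dereference happens before the NULL check. Fixed order: make the
 * NULL check dominate the dereference.
 */
static int *safe_cfg_of(struct irq_data_s *d)
{
	return d ? cfg_of(d) : NULL;
}
```

The point is that `cfg_of(d)` touches `d` unconditionally, so the NULL check must come first; checking afterwards only hides the crash from static analysis, not from the CPU.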




[patch RFC 11/38] x86/irq: Consolidate DMAR irq allocation

2020-08-20 Thread Thomas Gleixner
None of the DMAR specific fields are required.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/hw_irq.h |6 --
 arch/x86/kernel/apic/msi.c|   10 +-
 2 files changed, 5 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -83,12 +83,6 @@ struct irq_alloc_info {
irq_hw_number_t msi_hwirq;
};
 #endif
-#ifdef CONFIG_DMAR_TABLE
-   struct {
-   int dmar_id;
-   void*dmar_data;
-   };
-#endif
 #ifdef CONFIG_X86_UV
struct {
int uv_limit;
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -329,15 +329,15 @@ static struct irq_chip dmar_msi_controll
 static irq_hw_number_t dmar_msi_get_hwirq(struct msi_domain_info *info,
  msi_alloc_info_t *arg)
 {
-   return arg->dmar_id;
+   return arg->hwirq;
 }
 
 static int dmar_msi_init(struct irq_domain *domain,
 struct msi_domain_info *info, unsigned int virq,
 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
 {
-   irq_domain_set_info(domain, virq, arg->dmar_id, info->chip, NULL,
-   handle_edge_irq, arg->dmar_data, "edge");
+   irq_domain_set_info(domain, virq, arg->devid, info->chip, NULL,
+   handle_edge_irq, arg->data, "edge");
 
return 0;
 }
@@ -384,8 +384,8 @@ int dmar_alloc_hwirq(int id, int node, v
 
init_irq_alloc_info(&info, NULL);
info.type = X86_IRQ_ALLOC_TYPE_DMAR;
-   info.dmar_id = id;
-   info.dmar_data = arg;
+   info.devid = id;
+   info.data = arg;
 
return irq_domain_alloc_irqs(domain, 1, node, &info);
 }




[patch RFC 18/38] x86/irq: Initialize PCI/MSI domain at PCI init time

2020-08-20 Thread Thomas Gleixner
No point in initializing the default PCI/MSI interrupt domain early, and no
point in creating it when XEN PV/HVM/DOM0 are active.

Move the initialization to pci_arch_init() and convert it to init ops so
that XEN can override it, as XEN has its own PCI/MSI management. The XEN
override comes in a later step.

Signed-off-by: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
---
 arch/x86/include/asm/irqdomain.h |6 --
 arch/x86/include/asm/x86_init.h  |3 +++
 arch/x86/kernel/apic/msi.c   |   26 --
 arch/x86/kernel/apic/vector.c|2 --
 arch/x86/kernel/x86_init.c   |3 ++-
 arch/x86/pci/init.c  |3 +++
 6 files changed, 28 insertions(+), 15 deletions(-)

--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -51,9 +51,11 @@ extern int mp_irqdomain_ioapic_idx(struc
 #endif /* CONFIG_X86_IO_APIC */
 
 #ifdef CONFIG_PCI_MSI
-extern void arch_init_msi_domain(struct irq_domain *domain);
+void x86_create_pci_msi_domain(void);
+struct irq_domain *native_create_pci_msi_domain(void);
 #else
-static inline void arch_init_msi_domain(struct irq_domain *domain) { }
+static inline void x86_create_pci_msi_domain(void) { }
+#define native_create_pci_msi_domain   NULL
 #endif
 
 #endif
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -8,6 +8,7 @@ struct mpc_bus;
 struct mpc_cpu;
 struct mpc_table;
 struct cpuinfo_x86;
+struct irq_domain;
 
 /**
  * struct x86_init_mpparse - platform specific mpparse ops
@@ -42,12 +43,14 @@ struct x86_init_resources {
  * @intr_init: interrupt init code
  * @intr_mode_select:  interrupt delivery mode selection
  * @intr_mode_init:interrupt delivery mode setup
+ * @create_pci_msi_domain: Create the PCI/MSI interrupt domain
  */
 struct x86_init_irqs {
void (*pre_vector_init)(void);
void (*intr_init)(void);
void (*intr_mode_select)(void);
void (*intr_mode_init)(void);
+   struct irq_domain *(*create_pci_msi_domain)(void);
 };
 
 /**
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -21,7 +21,7 @@
 #include 
 #include 
 
-static struct irq_domain *msi_default_domain;
+static struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
 static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 {
@@ -192,7 +192,7 @@ int native_setup_msi_irqs(struct pci_dev
 
domain = irq_remapping_get_irq_domain(&info);
if (domain == NULL)
-   domain = msi_default_domain;
+   domain = x86_pci_msi_default_domain;
if (domain == NULL)
return -ENOSYS;
 
@@ -243,25 +243,31 @@ static struct msi_domain_info pci_msi_do
.handler_name   = "edge",
 };
 
-void __init arch_init_msi_domain(struct irq_domain *parent)
+struct irq_domain * __init native_create_pci_msi_domain(void)
 {
struct fwnode_handle *fn;
+   struct irq_domain *d;
 
if (disable_apic)
-   return;
+   return NULL;
 
fn = irq_domain_alloc_named_fwnode("PCI-MSI");
if (fn) {
-   msi_default_domain =
-   pci_msi_create_irq_domain(fn, &pci_msi_domain_info,
- parent);
+   d = pci_msi_create_irq_domain(fn, &pci_msi_domain_info,
+ x86_vector_domain);
}
-   if (!msi_default_domain) {
+   if (!d) {
irq_domain_free_fwnode(fn);
-   pr_warn("failed to initialize irqdomain for MSI/MSI-x.\n");
+   pr_warn("Failed to initialize PCI-MSI irqdomain.\n");
} else {
-   msi_default_domain->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
+   d->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
}
+   return d;
+}
+
+void __init x86_create_pci_msi_domain(void)
+{
+   x86_pci_msi_default_domain = x86_init.irqs.create_pci_msi_domain();
 }
 
 #ifdef CONFIG_IRQ_REMAP
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -713,8 +713,6 @@ int __init arch_early_irq_init(void)
BUG_ON(x86_vector_domain == NULL);
irq_set_default_host(x86_vector_domain);
 
-   arch_init_msi_domain(x86_vector_domain);
-
BUG_ON(!alloc_cpumask_var(&vector_searchmask, GFP_KERNEL));
 
/*
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -76,7 +76,8 @@ struct x86_init_ops x86_init __initdata
.pre_vector_init= init_ISA_irqs,
.intr_init  = native_init_IRQ,
.intr_mode_select   = apic_intr_mode_select,
-   .intr_mode_init = apic_intr_mode_init
+   .intr_mode_init = apic_intr_mode_init,
+   .create_pci_msi_domain  = native_create_pci_msi_domain,
},
 
.oem = {
--- a/arch/x86/pci/init.c
+++ b/arch/x86/pci/init.c
@@ -3,6 +3,7 @@
 #include 
 #i
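The conversion above follows the usual x86_init pattern: a function pointer in an ops table defaults to the native implementation, and a platform (here XEN, in a later patch) may overwrite it before the single call site runs at PCI init time. A minimal userspace sketch of that pattern, with hypothetical names and `int` standing in for `struct irq_domain *`:

```c
#include <assert.h>

/* A table of overridable init ops, defaulting to the native variant. */
struct init_irq_ops {
	int (*create_msi_domain)(void);
};

static int native_create_msi_domain(void) { return 1; }	/* creates a domain */
static int xen_create_msi_domain(void)    { return 0; }	/* no default domain */

static struct init_irq_ops init_ops = {
	.create_msi_domain = native_create_msi_domain,
};

/* The single call site, invoked once at "PCI init" time. Whatever
 * platform code ran earlier may have replaced the op by then. */
static int setup_msi_domain(void)
{
	return init_ops.create_msi_domain();
}
```

The benefit over the old `arch_init_msi_domain()` call in `arch_early_irq_init()` is that the override point exists before the domain is ever created, so XEN never has to tear down or ignore a domain it does not want.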

[patch RFC 08/38] x86/irq: Prepare consolidation of irq_alloc_info

2020-08-20 Thread Thomas Gleixner
struct irq_alloc_info is a horrible zoo of unnamed structs in a union. Many
of the struct fields can be generic and don't have to be type specific like
hpet_id, ioapic_id...

Provide a generic set of members to prepare for the consolidation. The goal
is to give irq_alloc_info the same basic members as the generic
msi_alloc_info so that generic MSI domain ops can be reused, and yet more
mess can be avoided when (non-PCI) device MSI support comes along.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/hw_irq.h |   22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -44,10 +44,25 @@ enum irq_alloc_type {
X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
+/**
+ * irq_alloc_info - X86 specific interrupt allocation info
+ * @type:  X86 specific allocation type
+ * @flags: Flags for allocation tweaks
+ * @devid: Device ID for allocations
+ * @hwirq: Associated hw interrupt number in the domain
+ * @mask:  CPU mask for vector allocation
+ * @desc:  Pointer to msi descriptor
+ * @data:  Allocation specific data
+ */
 struct irq_alloc_info {
enum irq_alloc_type type;
u32 flags;
-   const struct cpumask*mask;  /* CPU mask for vector allocation */
+   u32 devid;
+   irq_hw_number_t hwirq;
+   const struct cpumask*mask;
+   struct msi_desc *desc;
+   void*data;
+
union {
int unused;
 #ifdef CONFIG_HPET_TIMER
@@ -88,11 +103,6 @@ struct irq_alloc_info {
char*uv_name;
};
 #endif
-#if IS_ENABLED(CONFIG_VMD)
-   struct {
-   struct msi_desc *desc;
-   };
-#endif
};
 };
 




[patch RFC 16/38] x86/irq: Move apic_post_init() invocation to one place

2020-08-20 Thread Thomas Gleixner
No point in calling it from both the 32-bit and 64-bit implementations of
default_setup_apic_routing(). Move it to the caller instead.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/apic/apic.c |3 +++
 arch/x86/kernel/apic/probe_32.c |3 ---
 arch/x86/kernel/apic/probe_64.c |3 ---
 3 files changed, 3 insertions(+), 6 deletions(-)

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1429,6 +1429,9 @@ void __init apic_intr_mode_init(void)
break;
}
 
+   if (x86_platform.apic_post_init)
+   x86_platform.apic_post_init();
+
apic_bsp_setup(upmode);
 }
 
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -170,9 +170,6 @@ void __init default_setup_apic_routing(v
 
if (apic->setup_apic_routing)
apic->setup_apic_routing();
-
-   if (x86_platform.apic_post_init)
-   x86_platform.apic_post_init();
 }
 
 void __init generic_apic_probe(void)
--- a/arch/x86/kernel/apic/probe_64.c
+++ b/arch/x86/kernel/apic/probe_64.c
@@ -32,9 +32,6 @@ void __init default_setup_apic_routing(v
break;
}
}
-
-   if (x86_platform.apic_post_init)
-   x86_platform.apic_post_init();
 }
 
 int __init default_acpi_madt_oem_check(char *oem_id, char *oem_table_id)




[patch RFC 04/38] x86/irq: Add allocation type for parent domain retrieval

2020-08-20 Thread Thomas Gleixner
irq_remapping_ir_irq_domain() is used to retrieve the remapping parent
domain for an allocation type. irq_remapping_irq_domain() is for retrieving
the actual device domain for allocating interrupts for a device.

The two functions are similar and can be unified by using explicit modes
for parent irq domain retrieval.

Add X86_IRQ_ALLOC_TYPE_IOAPIC/HPET_GET_PARENT and use it in the iommu
implementations. Drop the parent domain retrieval for PCI_MSI/X as that is
unused.

Signed-off-by: Thomas Gleixner 
Cc: Joerg Roedel 
Cc: x...@kernel.org
Cc: linux-hyp...@vger.kernel.org
Cc: io...@lists.linux-foundation.org
Cc: Haiyang Zhang 
Cc: Jon Derrick 
Cc: Lu Baolu 
---
 arch/x86/include/asm/hw_irq.h   |2 ++
 arch/x86/kernel/apic/io_apic.c  |2 +-
 arch/x86/kernel/apic/msi.c  |2 +-
 drivers/iommu/amd/iommu.c   |8 
 drivers/iommu/hyperv-iommu.c|2 +-
 drivers/iommu/intel/irq_remapping.c |8 ++--
 6 files changed, 15 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -40,6 +40,8 @@ enum irq_alloc_type {
X86_IRQ_ALLOC_TYPE_PCI_MSIX,
X86_IRQ_ALLOC_TYPE_DMAR,
X86_IRQ_ALLOC_TYPE_UV,
+   X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT,
+   X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
 struct irq_alloc_info {
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2296,7 +2296,7 @@ static int mp_irqdomain_create(int ioapi
return 0;
 
init_irq_alloc_info(&info, NULL);
-   info.type = X86_IRQ_ALLOC_TYPE_IOAPIC;
+   info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
info.ioapic_id = mpc_ioapic_id(ioapic);
parent = irq_remapping_get_ir_irq_domain(&info);
if (!parent)
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -476,7 +476,7 @@ struct irq_domain *hpet_create_irq_domai
domain_info->data = (void *)(long)hpet_id;
 
init_irq_alloc_info(&info, NULL);
-   info.type = X86_IRQ_ALLOC_TYPE_HPET;
+   info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
info.hpet_id = hpet_id;
parent = irq_remapping_get_ir_irq_domain(&info);
if (parent == NULL)
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3534,6 +3534,14 @@ static struct irq_domain *get_ir_irq_dom
if (!info)
return NULL;
 
+   switch (info->type) {
+   case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
+   case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
+   break;
+   default:
+   return NULL;
+   }
+
devid = get_devid(info);
if (devid >= 0) {
iommu = amd_iommu_rlookup_table[devid];
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -184,7 +184,7 @@ static int __init hyperv_enable_irq_rema
 
 static struct irq_domain *hyperv_get_ir_irq_domain(struct irq_alloc_info *info)
 {
-   if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC)
+   if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT)
return ioapic_ir_domain;
else
return NULL;
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1109,16 +1109,12 @@ static struct irq_domain *intel_get_ir_i
return NULL;
 
switch (info->type) {
-   case X86_IRQ_ALLOC_TYPE_IOAPIC:
+   case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
iommu = map_ioapic_to_ir(info->ioapic_id);
break;
-   case X86_IRQ_ALLOC_TYPE_HPET:
+   case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
iommu = map_hpet_to_ir(info->hpet_id);
break;
-   case X86_IRQ_ALLOC_TYPE_PCI_MSI:
-   case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-   iommu = map_dev_to_ir(info->msi_dev);
-   break;
default:
BUG_ON(1);
break;




[patch RFC 03/38] x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency

2020-08-20 Thread Thomas Gleixner
No functional change.

Signed-off-by: Thomas Gleixner 
Cc: Joerg Roedel 
Cc: io...@lists.linux-foundation.org
---
 arch/x86/include/asm/hw_irq.h   |4 ++--
 arch/x86/kernel/apic/msi.c  |6 +++---
 drivers/iommu/amd/iommu.c   |   24 
 drivers/iommu/intel/irq_remapping.c |   18 +-
 4 files changed, 26 insertions(+), 26 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -36,8 +36,8 @@ struct msi_desc;
 enum irq_alloc_type {
X86_IRQ_ALLOC_TYPE_IOAPIC = 1,
X86_IRQ_ALLOC_TYPE_HPET,
-   X86_IRQ_ALLOC_TYPE_MSI,
-   X86_IRQ_ALLOC_TYPE_MSIX,
+   X86_IRQ_ALLOC_TYPE_PCI_MSI,
+   X86_IRQ_ALLOC_TYPE_PCI_MSIX,
X86_IRQ_ALLOC_TYPE_DMAR,
X86_IRQ_ALLOC_TYPE_UV,
 };
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -188,7 +188,7 @@ int native_setup_msi_irqs(struct pci_dev
struct irq_alloc_info info;
 
init_irq_alloc_info(&info, NULL);
-   info.type = X86_IRQ_ALLOC_TYPE_MSI;
+   info.type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
info.msi_dev = dev;
 
domain = irq_remapping_get_irq_domain(&info);
@@ -220,9 +220,9 @@ int pci_msi_prepare(struct irq_domain *d
init_irq_alloc_info(arg, NULL);
arg->msi_dev = pdev;
if (desc->msi_attrib.is_msix) {
-   arg->type = X86_IRQ_ALLOC_TYPE_MSIX;
+   arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
} else {
-   arg->type = X86_IRQ_ALLOC_TYPE_MSI;
+   arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
}
 
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3514,8 +3514,8 @@ static int get_devid(struct irq_alloc_in
case X86_IRQ_ALLOC_TYPE_HPET:
devid = get_hpet_devid(info->hpet_id);
break;
-   case X86_IRQ_ALLOC_TYPE_MSI:
-   case X86_IRQ_ALLOC_TYPE_MSIX:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
devid = get_device_id(&info->msi_dev->dev);
break;
default:
@@ -3553,8 +3553,8 @@ static struct irq_domain *get_irq_domain
return NULL;
 
switch (info->type) {
-   case X86_IRQ_ALLOC_TYPE_MSI:
-   case X86_IRQ_ALLOC_TYPE_MSIX:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
devid = get_device_id(&info->msi_dev->dev);
if (devid < 0)
return NULL;
@@ -3615,8 +3615,8 @@ static void irq_remapping_prepare_irte(s
break;
 
case X86_IRQ_ALLOC_TYPE_HPET:
-   case X86_IRQ_ALLOC_TYPE_MSI:
-   case X86_IRQ_ALLOC_TYPE_MSIX:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
msg->address_hi = MSI_ADDR_BASE_HI;
msg->address_lo = MSI_ADDR_BASE_LO;
msg->data = irte_info->index;
@@ -3660,15 +3660,15 @@ static int irq_remapping_alloc(struct ir
 
if (!info)
return -EINVAL;
-   if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_MSI &&
-   info->type != X86_IRQ_ALLOC_TYPE_MSIX)
+   if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI &&
+   info->type != X86_IRQ_ALLOC_TYPE_PCI_MSIX)
return -EINVAL;
 
/*
 * With IRQ remapping enabled, don't need contiguous CPU vectors
 * to support multiple MSI interrupts.
 */
-   if (info->type == X86_IRQ_ALLOC_TYPE_MSI)
+   if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI)
info->flags &= ~X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
 
devid = get_devid(info);
@@ -3700,9 +3700,9 @@ static int irq_remapping_alloc(struct ir
} else {
index = -ENOMEM;
}
-   } else if (info->type == X86_IRQ_ALLOC_TYPE_MSI ||
-  info->type == X86_IRQ_ALLOC_TYPE_MSIX) {
-   bool align = (info->type == X86_IRQ_ALLOC_TYPE_MSI);
+   } else if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI ||
+  info->type == X86_IRQ_ALLOC_TYPE_PCI_MSIX) {
+   bool align = (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI);
 
index = alloc_irq_index(devid, nr_irqs, align, info->msi_dev);
} else {
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1115,8 +1115,8 @@ static struct irq_domain *intel_get_ir_i
case X86_IRQ_ALLOC_TYPE_HPET:
iommu = map_hpet_to_ir(info->hpet_id);
break;
-   case X86_IRQ_ALLOC_TYPE_MSI:
-   case X86_IRQ_ALLOC_TYPE_MSIX:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
iommu = map_dev_to_ir(info->msi_dev);
break;
default:
@@ -1135,8 +1135,8 @@ static struct irq_domain *intel_get_irq_
return N

[patch RFC 06/38] iommu/amd: Consolidate irq domain getter

2020-08-20 Thread Thomas Gleixner
The irq domain request mode is now indicated in irq_alloc_info::type.

Consolidate the two getter functions into one.

Signed-off-by: Thomas Gleixner 
Cc: Joerg Roedel 
Cc: io...@lists.linux-foundation.org
---
 drivers/iommu/amd/iommu.c |   65 ++
 1 file changed, 21 insertions(+), 44 deletions(-)

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3505,77 +3505,54 @@ static void irte_ga_clear_allocated(stru
 
 static int get_devid(struct irq_alloc_info *info)
 {
-   int devid = -1;
-
switch (info->type) {
case X86_IRQ_ALLOC_TYPE_IOAPIC:
-   devid = get_ioapic_devid(info->ioapic_id);
-   break;
+   case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
+   return get_ioapic_devid(info->ioapic_id);
case X86_IRQ_ALLOC_TYPE_HPET:
-   devid = get_hpet_devid(info->hpet_id);
-   break;
+   case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
+   return get_hpet_devid(info->hpet_id);
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-   devid = get_device_id(&info->msi_dev->dev);
-   break;
+   return get_device_id(&info->msi_dev->dev);
default:
-   BUG_ON(1);
-   break;
+   WARN_ON_ONCE(1);
+   return -1;
}
-
-   return devid;
 }
 
-static struct irq_domain *get_ir_irq_domain(struct irq_alloc_info *info)
+static struct irq_domain *get_irq_domain_for_devid(struct irq_alloc_info *info,
+  int devid)
 {
-   struct amd_iommu *iommu;
-   int devid;
+   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
-   if (!info)
+   if (!iommu)
return NULL;
 
switch (info->type) {
case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-   break;
+   return iommu->ir_domain;
+   case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
+   return iommu->msi_domain;
default:
+   WARN_ON_ONCE(1);
return NULL;
}
-
-   devid = get_devid(info);
-   if (devid >= 0) {
-   iommu = amd_iommu_rlookup_table[devid];
-   if (iommu)
-   return iommu->ir_domain;
-   }
-
-   return NULL;
 }
 
 static struct irq_domain *get_irq_domain(struct irq_alloc_info *info)
 {
-   struct amd_iommu *iommu;
int devid;
 
if (!info)
return NULL;
 
-   switch (info->type) {
-   case X86_IRQ_ALLOC_TYPE_PCI_MSI:
-   case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-   devid = get_device_id(&info->msi_dev->dev);
-   if (devid < 0)
-   return NULL;
-
-   iommu = amd_iommu_rlookup_table[devid];
-   if (iommu)
-   return iommu->msi_domain;
-   break;
-   default:
-   break;
-   }
-
-   return NULL;
+   devid = get_devid(info);
+   if (devid < 0)
+   return NULL;
+   return get_irq_domain_for_devid(info, devid);
 }
 
 struct irq_remap_ops amd_iommu_irq_ops = {
@@ -3584,7 +3561,7 @@ struct irq_remap_ops amd_iommu_irq_ops =
.disable= amd_iommu_disable,
.reenable   = amd_iommu_reenable,
.enable_faulting= amd_iommu_enable_faulting,
-   .get_ir_irq_domain  = get_ir_irq_domain,
+   .get_ir_irq_domain  = get_irq_domain,
.get_irq_domain = get_irq_domain,
 };
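The resulting getter shape — a single switch on the request type that returns NULL for unexpected modes instead of crashing via BUG_ON() — can be sketched as follows (hypothetical stand-in types, not the AMD IOMMU code):

```c
#include <assert.h>
#include <stddef.h>

enum req_type { GET_PARENT = 1, GET_DEVICE };

struct domain { const char *name; };

static struct domain ir_domain  = { "remapping-parent" };
static struct domain msi_domain = { "device-msi" };

/* One getter replaces two: the request mode travels in the type field,
 * so parent-domain and device-domain lookups share the dispatch. */
static struct domain *get_domain(enum req_type type)
{
	switch (type) {
	case GET_PARENT:
		return &ir_domain;
	case GET_DEVICE:
		return &msi_domain;
	default:
		/* Soft failure instead of BUG(): callers handle NULL. */
		return NULL;
	}
}
```

Degrading the default case from BUG_ON(1) to a NULL return matches the WARN_ON_ONCE() change in the diff above: a bogus allocation type becomes a failed allocation rather than a dead machine.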
 




[patch RFC 12/38] x86/irq: Consolidate UV domain allocation

2020-08-20 Thread Thomas Gleixner
Move the UV specific fields into their own struct for readability's sake. Get
rid of the #ifdeffery as it does not matter at all whether the alloc info
is a couple of bytes longer or not.

Signed-off-by: Thomas Gleixner 
Cc: Steve Wahl 
Cc: Dimitri Sivanich 
Cc: Russ Anderson 
---
 arch/x86/include/asm/hw_irq.h |   21 -
 arch/x86/platform/uv/uv_irq.c |   16 
 2 files changed, 20 insertions(+), 17 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -53,6 +53,14 @@ struct ioapic_alloc_info {
struct IO_APIC_route_entry  *entry;
 };
 
+struct uv_alloc_info {
+   int limit;
+   int blade;
+   unsigned long   offset;
+   char*name;
+
+};
+
 /**
  * irq_alloc_info - X86 specific interrupt allocation info
  * @type:  X86 specific allocation type
@@ -64,7 +72,8 @@ struct ioapic_alloc_info {
  * @data:  Allocation specific data
  *
  * @ioapic:IOAPIC specific allocation data
- */
+ * @uv:UV specific allocation data
+ */
 struct irq_alloc_info {
enum irq_alloc_type type;
u32 flags;
@@ -76,6 +85,8 @@ struct irq_alloc_info {
 
union {
struct ioapic_alloc_infoioapic;
+   struct uv_alloc_infouv;
+
int unused;
 #ifdef CONFIG_PCI_MSI
struct {
@@ -83,14 +94,6 @@ struct irq_alloc_info {
irq_hw_number_t msi_hwirq;
};
 #endif
-#ifdef CONFIG_X86_UV
-   struct {
-   int uv_limit;
-   int uv_blade;
-   unsigned long   uv_offset;
-   char*uv_name;
-   };
-#endif
};
 };
 
--- a/arch/x86/platform/uv/uv_irq.c
+++ b/arch/x86/platform/uv/uv_irq.c
@@ -90,15 +90,15 @@ static int uv_domain_alloc(struct irq_do
 
ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
if (ret >= 0) {
-   if (info->uv_limit == UV_AFFINITY_CPU)
+   if (info->uv.limit == UV_AFFINITY_CPU)
irq_set_status_flags(virq, IRQ_NO_BALANCING);
else
irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
 
-   chip_data->pnode = uv_blade_to_pnode(info->uv_blade);
-   chip_data->offset = info->uv_offset;
+   chip_data->pnode = uv_blade_to_pnode(info->uv.blade);
+   chip_data->offset = info->uv.offset;
irq_domain_set_info(domain, virq, virq, &uv_irq_chip, chip_data,
-   handle_percpu_irq, NULL, info->uv_name);
+   handle_percpu_irq, NULL, info->uv.name);
} else {
kfree(chip_data);
}
@@ -193,10 +193,10 @@ int uv_setup_irq(char *irq_name, int cpu
 
init_irq_alloc_info(&info, cpumask_of(cpu));
info.type = X86_IRQ_ALLOC_TYPE_UV;
-   info.uv_limit = limit;
-   info.uv_blade = mmr_blade;
-   info.uv_offset = mmr_offset;
-   info.uv_name = irq_name;
+   info.uv.limit = limit;
+   info.uv.blade = mmr_blade;
+   info.uv.offset = mmr_offset;
+   info.uv.name = irq_name;
 
return irq_domain_alloc_irqs(domain, 1,
 uv_blade_to_memory_nid(mmr_blade), &info);




[patch RFC 05/38] iommu/vt-d: Consolidate irq domain getter

2020-08-20 Thread Thomas Gleixner
The irq domain request mode is now indicated in irq_alloc_info::type.

Consolidate the two getter functions into one.

Signed-off-by: Thomas Gleixner 
Cc: Joerg Roedel 
Cc: io...@lists.linux-foundation.org
Cc: Lu Baolu 
---
 drivers/iommu/intel/irq_remapping.c |   67 
 1 file changed, 24 insertions(+), 43 deletions(-)

--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -204,35 +204,40 @@ static int modify_irte(struct irq_2_iomm
return rc;
 }
 
-static struct intel_iommu *map_hpet_to_ir(u8 hpet_id)
+static struct irq_domain *map_hpet_to_ir(u8 hpet_id)
 {
int i;
 
-   for (i = 0; i < MAX_HPET_TBS; i++)
+   for (i = 0; i < MAX_HPET_TBS; i++) {
if (ir_hpet[i].id == hpet_id && ir_hpet[i].iommu)
-   return ir_hpet[i].iommu;
+   return ir_hpet[i].iommu->ir_domain;
+   }
return NULL;
 }
 
-static struct intel_iommu *map_ioapic_to_ir(int apic)
+static struct intel_iommu *map_ioapic_to_iommu(int apic)
 {
int i;
 
-   for (i = 0; i < MAX_IO_APICS; i++)
+   for (i = 0; i < MAX_IO_APICS; i++) {
if (ir_ioapic[i].id == apic && ir_ioapic[i].iommu)
return ir_ioapic[i].iommu;
+   }
return NULL;
 }
 
-static struct intel_iommu *map_dev_to_ir(struct pci_dev *dev)
+static struct irq_domain *map_ioapic_to_ir(int apic)
 {
-   struct dmar_drhd_unit *drhd;
+   struct intel_iommu *iommu = map_ioapic_to_iommu(apic);
 
-   drhd = dmar_find_matched_drhd_unit(dev);
-   if (!drhd)
-   return NULL;
+   return iommu ? iommu->ir_domain : NULL;
+}
+
+static struct irq_domain *map_dev_to_ir(struct pci_dev *dev)
+{
+   struct dmar_drhd_unit *drhd = dmar_find_matched_drhd_unit(dev);
 
-   return drhd->iommu;
+   return drhd ? drhd->iommu->ir_msi_domain : NULL;
 }
 
 static int clear_entries(struct irq_2_iommu *irq_iommu)
@@ -996,7 +1001,7 @@ static int __init parse_ioapics_under_ir
 
for (ioapic_idx = 0; ioapic_idx < nr_ioapics; ioapic_idx++) {
int ioapic_id = mpc_ioapic_id(ioapic_idx);
-   if (!map_ioapic_to_ir(ioapic_id)) {
+   if (!map_ioapic_to_iommu(ioapic_id)) {
pr_err(FW_BUG "ioapic %d has no mapping iommu, "
   "interrupt remapping will be disabled\n",
   ioapic_id);
@@ -1101,47 +1106,23 @@ static void prepare_irte(struct irte *ir
irte->redir_hint = 1;
 }
 
-static struct irq_domain *intel_get_ir_irq_domain(struct irq_alloc_info *info)
+static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info)
 {
-   struct intel_iommu *iommu = NULL;
-
if (!info)
return NULL;
 
switch (info->type) {
case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-   iommu = map_ioapic_to_ir(info->ioapic_id);
-   break;
+   return map_ioapic_to_ir(info->ioapic_id);
case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-   iommu = map_hpet_to_ir(info->hpet_id);
-   break;
-   default:
-   BUG_ON(1);
-   break;
-   }
-
-   return iommu ? iommu->ir_domain : NULL;
-}
-
-static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info)
-{
-   struct intel_iommu *iommu;
-
-   if (!info)
-   return NULL;
-
-   switch (info->type) {
+   return map_hpet_to_ir(info->hpet_id);
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-   iommu = map_dev_to_ir(info->msi_dev);
-   if (iommu)
-   return iommu->ir_msi_domain;
-   break;
+   return map_dev_to_ir(info->msi_dev);
default:
-   break;
+   WARN_ON_ONCE(1);
+   return NULL;
}
-
-   return NULL;
 }
 
 struct irq_remap_ops intel_irq_remap_ops = {
@@ -1150,7 +1131,7 @@ struct irq_remap_ops intel_irq_remap_ops
.disable= disable_irq_remapping,
.reenable   = reenable_irq_remapping,
.enable_faulting= enable_drhd_fault_handling,
-   .get_ir_irq_domain  = intel_get_ir_irq_domain,
+   .get_ir_irq_domain  = intel_get_irq_domain,
.get_irq_domain = intel_get_irq_domain,
 };
 




[patch RFC 10/38] x86/ioapic: Consolidate IOAPIC allocation

2020-08-20 Thread Thomas Gleixner
Move the IOAPIC specific fields into their own struct and reuse the common
devid. Get rid of the #ifdeffery as it does not matter at all whether the
alloc info is a couple of bytes longer or not.

Signed-off-by: Thomas Gleixner 
Cc: Wei Liu 
Cc: "K. Y. Srinivasan" 
Cc: Stephen Hemminger 
Cc: Joerg Roedel 
Cc: linux-hyp...@vger.kernel.org
Cc: io...@lists.linux-foundation.org
Cc: Haiyang Zhang 
Cc: Jon Derrick 
Cc: Lu Baolu 
---
 arch/x86/include/asm/hw_irq.h   |   23 ++-
 arch/x86/kernel/apic/io_apic.c  |   70 ++--
 drivers/iommu/amd/iommu.c   |   14 +++
 drivers/iommu/hyperv-iommu.c|2 -
 drivers/iommu/intel/irq_remapping.c |   18 -
 5 files changed, 64 insertions(+), 63 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -44,6 +44,15 @@ enum irq_alloc_type {
X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
+struct ioapic_alloc_info {
+   int pin;
+   int node;
+   u32 trigger : 1;
+   u32 polarity : 1;
+   u32 valid : 1;
+   struct IO_APIC_route_entry  *entry;
+};
+
 /**
  * irq_alloc_info - X86 specific interrupt allocation info
  * @type:  X86 specific allocation type
@@ -53,6 +62,8 @@ enum irq_alloc_type {
  * @mask:  CPU mask for vector allocation
  * @desc:  Pointer to msi descriptor
  * @data:  Allocation specific data
+ *
+ * @ioapic:IOAPIC specific allocation data
  */
 struct irq_alloc_info {
enum irq_alloc_type type;
@@ -64,6 +75,7 @@ struct irq_alloc_info {
void*data;
 
union {
+   struct ioapic_alloc_infoioapic;
int unused;
 #ifdef CONFIG_PCI_MSI
struct {
@@ -71,17 +83,6 @@ struct irq_alloc_info {
irq_hw_number_t msi_hwirq;
};
 #endif
-#ifdef CONFIG_X86_IO_APIC
-   struct {
-   int ioapic_id;
-   int ioapic_pin;
-   int ioapic_node;
-   u32 ioapic_trigger : 1;
-   u32 ioapic_polarity : 1;
-   u32 ioapic_valid : 1;
-   struct IO_APIC_route_entry *ioapic_entry;
-   };
-#endif
 #ifdef CONFIG_DMAR_TABLE
struct {
int dmar_id;
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -860,10 +860,10 @@ void ioapic_set_alloc_attr(struct irq_al
 {
init_irq_alloc_info(info, NULL);
info->type = X86_IRQ_ALLOC_TYPE_IOAPIC;
-   info->ioapic_node = node;
-   info->ioapic_trigger = trigger;
-   info->ioapic_polarity = polarity;
-   info->ioapic_valid = 1;
+   info->ioapic.node = node;
+   info->ioapic.trigger = trigger;
+   info->ioapic.polarity = polarity;
+   info->ioapic.valid = 1;
 }
 
 #ifndef CONFIG_ACPI
@@ -878,32 +878,32 @@ static void ioapic_copy_alloc_attr(struc
 
copy_irq_alloc_info(dst, src);
dst->type = X86_IRQ_ALLOC_TYPE_IOAPIC;
-   dst->ioapic_id = mpc_ioapic_id(ioapic_idx);
-   dst->ioapic_pin = pin;
-   dst->ioapic_valid = 1;
-   if (src && src->ioapic_valid) {
-   dst->ioapic_node = src->ioapic_node;
-   dst->ioapic_trigger = src->ioapic_trigger;
-   dst->ioapic_polarity = src->ioapic_polarity;
+   dst->devid = mpc_ioapic_id(ioapic_idx);
+   dst->ioapic.pin = pin;
+   dst->ioapic.valid = 1;
+   if (src && src->ioapic.valid) {
+   dst->ioapic.node = src->ioapic.node;
+   dst->ioapic.trigger = src->ioapic.trigger;
+   dst->ioapic.polarity = src->ioapic.polarity;
} else {
-   dst->ioapic_node = NUMA_NO_NODE;
+   dst->ioapic.node = NUMA_NO_NODE;
if (acpi_get_override_irq(gsi, &trigger, &polarity) >= 0) {
-   dst->ioapic_trigger = trigger;
-   dst->ioapic_polarity = polarity;
+   dst->ioapic.trigger = trigger;
+   dst->ioapic.polarity = polarity;
} else {
/*
 * PCI interrupts are always active low level
 * triggered.
 */
-   dst->ioapic_trigger = IOAPIC_LEVEL;
-   dst->ioapic_polarity = IOAPIC_POL_LOW;
+   dst->ioapic.trigger = IOAPIC_LEVEL;
+   dst->ioapic.polarity = IOAPIC_POL_LOW;
}
}
 }
 
 static int ioapic_alloc_attr_node(struct irq_alloc_info *info)
 {
-   return (info && info->ioapic_valid) ? info->ioapic_node : NUMA_NO_NODE;
+  

[patch RFC 13/38] PCI: MSI: Rework pci_msi_domain_calc_hwirq()

2020-08-20 Thread Thomas Gleixner
Retrieve the PCI device from the msi descriptor instead of doing so at the
call sites.

Signed-off-by: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
---
 arch/x86/kernel/apic/msi.c |2 +-
 drivers/pci/msi.c  |   13 ++---
 include/linux/msi.h|3 +--
 3 files changed, 8 insertions(+), 10 deletions(-)

--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -232,7 +232,7 @@ EXPORT_SYMBOL_GPL(pci_msi_prepare);
 
 void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
 {
-   arg->msi_hwirq = pci_msi_domain_calc_hwirq(arg->msi_dev, desc);
+   arg->msi_hwirq = pci_msi_domain_calc_hwirq(desc);
 }
 EXPORT_SYMBOL_GPL(pci_msi_set_desc);
 
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1346,17 +1346,17 @@ void pci_msi_domain_write_msg(struct irq
 
 /**
  * pci_msi_domain_calc_hwirq - Generate a unique ID for an MSI source
- * @dev:   Pointer to the PCI device
  * @desc:  Pointer to the MSI descriptor
  *
  * The ID number is only used within the irqdomain.
  */
-irq_hw_number_t pci_msi_domain_calc_hwirq(struct pci_dev *dev,
- struct msi_desc *desc)
+irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc)
 {
+   struct pci_dev *pdev = msi_desc_to_pci_dev(desc);
+
return (irq_hw_number_t)desc->msi_attrib.entry_nr |
-   pci_dev_id(dev) << 11 |
-   (pci_domain_nr(dev->bus) & 0xFFFFFFFF) << 27;
+   pci_dev_id(pdev) << 11 |
+   (pci_domain_nr(pdev->bus) & 0xFFFFFFFF) << 27;
 }
 
 static inline bool pci_msi_desc_is_multi_msi(struct msi_desc *desc)
@@ -1406,8 +1406,7 @@ static void pci_msi_domain_set_desc(msi_
struct msi_desc *desc)
 {
arg->desc = desc;
-   arg->hwirq = pci_msi_domain_calc_hwirq(msi_desc_to_pci_dev(desc),
-  desc);
+   arg->hwirq = pci_msi_domain_calc_hwirq(desc);
 }
 #else
 #define pci_msi_domain_set_descNULL
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -369,8 +369,7 @@ void pci_msi_domain_write_msg(struct irq
 struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
 struct msi_domain_info *info,
 struct irq_domain *parent);
-irq_hw_number_t pci_msi_domain_calc_hwirq(struct pci_dev *dev,
- struct msi_desc *desc);
+irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc);
 int pci_msi_domain_check_cap(struct irq_domain *domain,
 struct msi_domain_info *info, struct device *dev);
u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);
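The ID packing in pci_msi_domain_calc_hwirq() above is easy to check in isolation. A user-space sketch in plain C (illustrative names; the field positions are the shifts shown in the diff):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the hwirq ID packing: the per-device MSI entry number
 * occupies the low bits, the PCI requester ID (bus/devfn) starts at
 * bit 11, and the PCI segment/domain number at bit 27. Plain C, not
 * kernel code; calc_hwirq() is an illustrative stand-in. */
static uint64_t calc_hwirq(uint32_t entry_nr, uint16_t requester_id,
			   uint32_t domain_nr)
{
	return (uint64_t)entry_nr |
	       (uint64_t)requester_id << 11 |
	       ((uint64_t)domain_nr & 0xFFFFFFFF) << 27;
}
```

As the kerneldoc above notes, the resulting number only has to be unique within the irqdomain, so this simple bit packing is sufficient.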




[patch RFC 15/38] x86/msi: Use generic MSI domain ops

2020-08-20 Thread Thomas Gleixner
pci_msi_get_hwirq() and pci_msi_set_desc() are no longer special. Enable the
generic MSI domain ops in the core and PCI MSI code unconditionally and get
rid of the x86-specific implementations in the x86 MSI code and in the
hyperv PCI driver.

Signed-off-by: Thomas Gleixner 
Cc: Wei Liu 
Cc: Stephen Hemminger 
Cc: Haiyang Zhang 
Cc: linux-...@vger.kernel.org
Cc: linux-hyp...@vger.kernel.org
---
 arch/x86/include/asm/msi.h  |2 --
 arch/x86/kernel/apic/msi.c  |   15 ---
 drivers/pci/controller/pci-hyperv.c |8 
 drivers/pci/msi.c   |4 
 kernel/irq/msi.c|6 --
 5 files changed, 35 deletions(-)

--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -9,6 +9,4 @@ typedef struct irq_alloc_info msi_alloc_
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
msi_alloc_info_t *arg);
 
-void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc);
-
 #endif /* _ASM_X86_MSI_H */
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -204,12 +204,6 @@ void native_teardown_msi_irq(unsigned in
irq_domain_free_irqs(irq, 1);
 }
 
-static irq_hw_number_t pci_msi_get_hwirq(struct msi_domain_info *info,
-msi_alloc_info_t *arg)
-{
-   return arg->hwirq;
-}
-
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
msi_alloc_info_t *arg)
 {
@@ -228,17 +222,8 @@ int pci_msi_prepare(struct irq_domain *d
 }
 EXPORT_SYMBOL_GPL(pci_msi_prepare);
 
-void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
-{
-   arg->desc = desc;
-   arg->hwirq = pci_msi_domain_calc_hwirq(desc);
-}
-EXPORT_SYMBOL_GPL(pci_msi_set_desc);
-
 static struct msi_domain_ops pci_msi_domain_ops = {
-   .get_hwirq  = pci_msi_get_hwirq,
.msi_prepare= pci_msi_prepare,
-   .set_desc   = pci_msi_set_desc,
 };
 
 static struct msi_domain_info pci_msi_domain_info = {
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1531,16 +1531,8 @@ static struct irq_chip hv_msi_irq_chip =
.irq_unmask = hv_irq_unmask,
 };
 
-static irq_hw_number_t hv_msi_domain_ops_get_hwirq(struct msi_domain_info *info,
-  msi_alloc_info_t *arg)
-{
-   return arg->hwirq;
-}
-
 static struct msi_domain_ops hv_msi_ops = {
-   .get_hwirq  = hv_msi_domain_ops_get_hwirq,
.msi_prepare= pci_msi_prepare,
-   .set_desc   = pci_msi_set_desc,
.msi_free   = hv_msi_free,
 };
 
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1401,16 +1401,12 @@ static int pci_msi_domain_handle_error(s
return error;
 }
 
-#ifdef GENERIC_MSI_DOMAIN_OPS
 static void pci_msi_domain_set_desc(msi_alloc_info_t *arg,
struct msi_desc *desc)
 {
arg->desc = desc;
arg->hwirq = pci_msi_domain_calc_hwirq(desc);
 }
-#else
-#define pci_msi_domain_set_descNULL
-#endif
 
 static struct msi_domain_ops pci_msi_domain_ops_default = {
.set_desc   = pci_msi_domain_set_desc,
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -187,7 +187,6 @@ static const struct irq_domain_ops msi_d
.deactivate = msi_domain_deactivate,
 };
 
-#ifdef GENERIC_MSI_DOMAIN_OPS
 static irq_hw_number_t msi_domain_ops_get_hwirq(struct msi_domain_info *info,
msi_alloc_info_t *arg)
 {
@@ -206,11 +205,6 @@ static void msi_domain_ops_set_desc(msi_
 {
arg->desc = desc;
 }
-#else
-#define msi_domain_ops_get_hwirq   NULL
-#define msi_domain_ops_prepare NULL
-#define msi_domain_ops_set_descNULL
-#endif /* !GENERIC_MSI_DOMAIN_OPS */
 
 static int msi_domain_ops_init(struct irq_domain *domain,
   struct msi_domain_info *info,




[patch RFC 30/38] PCI/MSI: Allow to disable arch fallbacks

2020-08-20 Thread Thomas Gleixner
If an architecture does not require the MSI setup/teardown fallback
functions, then allow them to be replaced by stub functions which emit a
warning.

Signed-off-by: Thomas Gleixner 
Cc: Bjorn Helgaas 
Cc: linux-...@vger.kernel.org
---
 drivers/pci/Kconfig |3 +++
 drivers/pci/msi.c   |3 ++-
 include/linux/msi.h |   31 ++-
 3 files changed, 31 insertions(+), 6 deletions(-)

--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -56,6 +56,9 @@ config PCI_MSI_IRQ_DOMAIN
depends on PCI_MSI
select GENERIC_MSI_IRQ_DOMAIN
 
+config PCI_MSI_DISABLE_ARCH_FALLBACKS
+   bool
+
 config PCI_QUIRKS
default y
bool "Enable PCI quirk workarounds" if EXPERT
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -58,8 +58,8 @@ static void pci_msi_teardown_msi_irqs(st
 #define pci_msi_teardown_msi_irqs  arch_teardown_msi_irqs
 #endif
 
+#ifndef CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS
 /* Arch hooks */
-
 int __weak arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
 {
struct msi_controller *chip = dev->bus->msi;
@@ -132,6 +132,7 @@ void __weak arch_teardown_msi_irqs(struc
 {
return default_teardown_msi_irqs(dev);
 }
+#endif /* !CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS */
 
 static void default_restore_msi_irq(struct pci_dev *dev, int irq)
 {
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -193,17 +193,38 @@ void pci_msi_mask_irq(struct irq_data *d
 void pci_msi_unmask_irq(struct irq_data *data);
 
 /*
- * The arch hooks to setup up msi irqs. Those functions are
- * implemented as weak symbols so that they /can/ be overriden by
- * architecture specific code if needed.
+ * The arch hooks to setup up msi irqs. Default functions are implemented
+ * as weak symbols so that they /can/ be overriden by architecture specific
+ * code if needed.
+ *
+ * They can be replaced by stubs with warnings via
+ * CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS when the architecture fully
+ * utilizes direct irqdomain based setup.
  */
+#ifndef CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS
 int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc);
 void arch_teardown_msi_irq(unsigned int irq);
 int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
 void arch_teardown_msi_irqs(struct pci_dev *dev);
-void arch_restore_msi_irqs(struct pci_dev *dev);
-
 void default_teardown_msi_irqs(struct pci_dev *dev);
+#else
+static inline int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+{
+   WARN_ON_ONCE(1);
+   return -ENODEV;
+}
+
+static inline void arch_teardown_msi_irqs(struct pci_dev *dev)
+{
+   WARN_ON_ONCE(1);
+}
+#endif
+
+/*
+ * The restore hooks are still available as they are useful even
+ * for fully irq domain based setups. Courtesy to XEN/X86.
+ */
+void arch_restore_msi_irqs(struct pci_dev *dev);
 void default_restore_msi_irqs(struct pci_dev *dev);
 
 struct msi_controller {
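The arch fallbacks that this patch allows to be compiled out are provided as weak symbols, so an architecture overrides them simply by defining a strong symbol of the same name. A minimal user-space illustration of that linker behavior (illustrative function name, not a kernel API):

```c
#include <assert.h>

/* A weak definition is used only if no strong definition of the same
 * symbol exists anywhere in the final link. Here no override is
 * linked in, so calling the function hits the weak fallback. */
int __attribute__((weak)) arch_setup_fallback(void)
{
	return -1;	/* generic fallback: report "not implemented" */
}
```

Linking another translation unit that defines a non-weak `arch_setup_fallback()` would silently replace this body, which is exactly how `arch_setup_msi_irqs()` and friends are overridden by architectures today.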




[patch RFC 24/38] x86/xen: Consolidate XEN-MSI init

2020-08-20 Thread Thomas Gleixner
X86 cannot store the irq domain pointer in struct device without breaking
XEN because the irq domain pointer takes precedence over arch_*_msi_irqs()
fallbacks.

To achieve this, XEN MSI interrupt management needs to be wrapped into an
irq domain.

Move the x86_msi ops setup into a single function to prepare for this.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/pci/xen.c |   51 ---
 1 file changed, 32 insertions(+), 19 deletions(-)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -371,7 +371,10 @@ static void xen_initdom_restore_msi_irqs
WARN(ret && ret != -ENOSYS, "restore_msi -> %d\n", ret);
}
 }
-#endif
+#else /* CONFIG_XEN_DOM0 */
+#define xen_initdom_setup_msi_irqs NULL
+#define xen_initdom_restore_msi_irqs   NULL
+#endif /* !CONFIG_XEN_DOM0 */
 
 static void xen_teardown_msi_irqs(struct pci_dev *dev)
 {
@@ -403,7 +406,31 @@ static void xen_teardown_msi_irq(unsigne
WARN_ON_ONCE(1);
 }
 
-#endif
+static __init void xen_setup_pci_msi(void)
+{
+   if (xen_initial_domain()) {
+   x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
+   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+   x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
+   pci_msi_ignore_mask = 1;
+   } else if (xen_pv_domain()) {
+   x86_msi.setup_msi_irqs = xen_setup_msi_irqs;
+   x86_msi.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
+   pci_msi_ignore_mask = 1;
+   } else if (xen_hvm_domain()) {
+   x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs;
+   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+   } else {
+   WARN_ON_ONCE(1);
+   return;
+   }
+
+   x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+}
+
+#else /* CONFIG_PCI_MSI */
+static inline void xen_setup_pci_msi(void) { }
+#endif /* CONFIG_PCI_MSI */
 
 int __init pci_xen_init(void)
 {
@@ -420,12 +447,7 @@ int __init pci_xen_init(void)
/* Keep ACPI out of the picture */
acpi_noirq_set();
 
-#ifdef CONFIG_PCI_MSI
-   x86_msi.setup_msi_irqs = xen_setup_msi_irqs;
-   x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
-   x86_msi.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
-   pci_msi_ignore_mask = 1;
-#endif
+   xen_setup_pci_msi();
return 0;
 }
 
@@ -445,10 +467,7 @@ static void __init xen_hvm_msi_init(void
((eax & XEN_HVM_CPUID_APIC_ACCESS_VIRT) && boot_cpu_has(X86_FEATURE_APIC)))
return;
}
-
-   x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs;
-   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
-   x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+   xen_setup_pci_msi();
 }
 #endif
 
@@ -481,13 +500,7 @@ int __init pci_xen_initial_domain(void)
 {
int irq;
 
-#ifdef CONFIG_PCI_MSI
-   x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
-   x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
-   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
-   x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
-   pci_msi_ignore_mask = 1;
-#endif
+   xen_setup_pci_msi();
__acpi_register_gsi = acpi_register_gsi_xen;
__acpi_unregister_gsi = NULL;
/*




[patch RFC 14/38] x86/msi: Consolidate MSI allocation

2020-08-20 Thread Thomas Gleixner
Convert the interrupt remap drivers to retrieve the pci device from the msi
descriptor and use info::hwirq.

This is the first step to prepare x86 for using the generic MSI domain ops.

Signed-off-by: Thomas Gleixner 
Cc: Wei Liu 
Cc: Stephen Hemminger 
Cc: Joerg Roedel 
Cc: linux-...@vger.kernel.org
Cc: linux-hyp...@vger.kernel.org
Cc: io...@lists.linux-foundation.org
Cc: Haiyang Zhang 
Cc: Lu Baolu 
---
 arch/x86/include/asm/hw_irq.h   |8 
 arch/x86/kernel/apic/msi.c  |7 +++
 drivers/iommu/amd/iommu.c   |5 +++--
 drivers/iommu/intel/irq_remapping.c |4 ++--
 drivers/pci/controller/pci-hyperv.c |2 +-
 5 files changed, 9 insertions(+), 17 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -85,14 +85,6 @@ struct irq_alloc_info {
union {
struct ioapic_alloc_infoioapic;
struct uv_alloc_infouv;
-
-   int unused;
-#ifdef CONFIG_PCI_MSI
-   struct {
-   struct pci_dev  *msi_dev;
-   irq_hw_number_t msi_hwirq;
-   };
-#endif
};
 };
 
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -189,7 +189,6 @@ int native_setup_msi_irqs(struct pci_dev
 
init_irq_alloc_info(&info, NULL);
info.type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
-   info.msi_dev = dev;
 
domain = irq_remapping_get_irq_domain(&info);
if (domain == NULL)
@@ -208,7 +207,7 @@ void native_teardown_msi_irq(unsigned in
 static irq_hw_number_t pci_msi_get_hwirq(struct msi_domain_info *info,
 msi_alloc_info_t *arg)
 {
-   return arg->msi_hwirq;
+   return arg->hwirq;
 }
 
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
@@ -218,7 +217,6 @@ int pci_msi_prepare(struct irq_domain *d
struct msi_desc *desc = first_pci_msi_entry(pdev);
 
init_irq_alloc_info(arg, NULL);
-   arg->msi_dev = pdev;
if (desc->msi_attrib.is_msix) {
arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
} else {
@@ -232,7 +230,8 @@ EXPORT_SYMBOL_GPL(pci_msi_prepare);
 
 void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
 {
-   arg->msi_hwirq = pci_msi_domain_calc_hwirq(desc);
+   arg->desc = desc;
+   arg->hwirq = pci_msi_domain_calc_hwirq(desc);
 }
 EXPORT_SYMBOL_GPL(pci_msi_set_desc);
 
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3514,7 +3514,7 @@ static int get_devid(struct irq_alloc_in
return get_hpet_devid(info->devid);
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-   return get_device_id(&info->msi_dev->dev);
+   return get_device_id(msi_desc_to_dev(info->desc));
default:
WARN_ON_ONCE(1);
return -1;
@@ -3688,7 +3688,8 @@ static int irq_remapping_alloc(struct ir
   info->type == X86_IRQ_ALLOC_TYPE_PCI_MSIX) {
bool align = (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI);
 
-   index = alloc_irq_index(devid, nr_irqs, align, info->msi_dev);
+   index = alloc_irq_index(devid, nr_irqs, align,
+   msi_desc_to_pci_dev(info->desc));
} else {
index = alloc_irq_index(devid, nr_irqs, false, NULL);
}
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1118,7 +1118,7 @@ static struct irq_domain *intel_get_irq_
return map_hpet_to_ir(info->devid);
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-   return map_dev_to_ir(info->msi_dev);
+   return map_dev_to_ir(msi_desc_to_pci_dev(info->desc));
default:
WARN_ON_ONCE(1);
return NULL;
@@ -1287,7 +1287,7 @@ static void intel_irq_remapping_prepare_
if (info->type == X86_IRQ_ALLOC_TYPE_HPET)
set_hpet_sid(irte, info->devid);
else
-   set_msi_sid(irte, info->msi_dev);
+   set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
 
msg->address_hi = MSI_ADDR_BASE_HI;
msg->data = sub_handle;
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1534,7 +1534,7 @@ static struct irq_chip hv_msi_irq_chip =
 static irq_hw_number_t hv_msi_domain_ops_get_hwirq(struct msi_domain_info *info,
   msi_alloc_info_t *arg)
 {
-   return arg->msi_hwirq;
+   return arg->hwirq;
 }
 
 static struct msi_domain_ops hv_msi_ops = {




[patch RFC 23/38] x86/xen: Rework MSI teardown

2020-08-20 Thread Thomas Gleixner
X86 cannot store the irq domain pointer in struct device without breaking
XEN because the irq domain pointer takes precedence over arch_*_msi_irqs()
fallbacks.

XEN's MSI teardown relies on default_teardown_msi_irqs(), which invokes
arch_teardown_msi_irq(). default_teardown_msi_irqs() is a trivial iterator
over the msi entries associated with a device.

Implement this loop in xen_teardown_msi_irqs() to prepare for removal of
the fallbacks for X86.

This is a preparatory step to wrap XEN MSI alloc/free into an irq domain,
which in turn allows storing the irq domain pointer in struct device and
using the irq domain functions directly.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/pci/xen.c |   23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -376,20 +376,31 @@ static void xen_initdom_restore_msi_irqs
 static void xen_teardown_msi_irqs(struct pci_dev *dev)
 {
struct msi_desc *msidesc;
+   int i;
+
+   for_each_pci_msi_entry(msidesc, dev) {
+   if (msidesc->irq) {
+   for (i = 0; i < msidesc->nvec_used; i++)
+   xen_destroy_irq(msidesc->irq + i);
+   }
+   }
+}
+
+static void xen_pv_teardown_msi_irqs(struct pci_dev *dev)
+{
+   struct msi_desc *msidesc = first_pci_msi_entry(dev);
 
-   msidesc = first_pci_msi_entry(dev);
if (msidesc->msi_attrib.is_msix)
xen_pci_frontend_disable_msix(dev);
else
xen_pci_frontend_disable_msi(dev);
 
-   /* Free the IRQ's and the msidesc using the generic code. */
-   default_teardown_msi_irqs(dev);
+   xen_teardown_msi_irqs(dev);
 }
 
 static void xen_teardown_msi_irq(unsigned int irq)
 {
-   xen_destroy_irq(irq);
+   WARN_ON_ONCE(1);
 }
 
 #endif
@@ -412,7 +423,7 @@ int __init pci_xen_init(void)
 #ifdef CONFIG_PCI_MSI
x86_msi.setup_msi_irqs = xen_setup_msi_irqs;
x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
-   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+   x86_msi.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
pci_msi_ignore_mask = 1;
 #endif
return 0;
@@ -436,6 +447,7 @@ static void __init xen_hvm_msi_init(void
}
 
x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs;
+   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
 }
 #endif
@@ -472,6 +484,7 @@ int __init pci_xen_initial_domain(void)
 #ifdef CONFIG_PCI_MSI
x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
pci_msi_ignore_mask = 1;
 #endif
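The new xen_teardown_msi_irqs() above walks every MSI descriptor of the device and destroys each of its nvec_used consecutive irqs. The shape of that loop, in a self-contained user-space sketch (made-up types; destroy() stands in for xen_destroy_irq()):

```c
#include <assert.h>

/* Each MSI descriptor may cover several consecutive Linux irq numbers
 * (nvec_used, e.g. for multi-MSI); each one is destroyed individually.
 * Plain C with illustrative types, not kernel code. */
struct msi_entry { int irq; int nvec_used; };

static int destroyed;
static void destroy(int irq) { (void)irq; destroyed++; }

static void teardown(struct msi_entry *entries, int n)
{
	for (int e = 0; e < n; e++) {
		if (!entries[e].irq)	/* never allocated: skip */
			continue;
		for (int i = 0; i < entries[e].nvec_used; i++)
			destroy(entries[e].irq + i);
	}
}
```

The `if (!irq)` guard mirrors the `if (msidesc->irq)` check in the patch: descriptors that never got an irq assigned are skipped.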




[patch RFC 35/38] platform-msi: Provide default irq_chip::ack

2020-08-20 Thread Thomas Gleixner
For the upcoming device MSI support it's required to have a default
irq_chip::ack implementation (irq_chip_ack_parent) so that drivers do not
need to provide one.

Signed-off-by: Thomas Gleixner 
Cc: Greg Kroah-Hartman 
---
 drivers/base/platform-msi.c |2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -95,6 +95,8 @@ static void platform_msi_update_chip_ops
chip->irq_mask = irq_chip_mask_parent;
if (!chip->irq_unmask)
chip->irq_unmask = irq_chip_unmask_parent;
+   if (!chip->irq_ack)
+   chip->irq_ack = irq_chip_ack_parent;
if (!chip->irq_eoi)
chip->irq_eoi = irq_chip_eoi_parent;
if (!chip->irq_set_affinity)
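platform_msi_update_chip_ops() fills in only the callbacks a driver left NULL, so a driver-provided hook always wins over the `irq_chip_*_parent` default. The pattern in a self-contained user-space sketch (made-up types and defaults, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

/* "Fill only unset callbacks": a NULL hook gets a default that would
 * forward to the parent domain; hooks the driver set explicitly are
 * left alone. Illustrative stand-ins for struct irq_chip and the
 * irq_chip_*_parent helpers. */
struct chip { void (*ack)(void); void (*mask)(void); };

static void default_ack(void)  { /* would forward to parent */ }
static void default_mask(void) { /* would forward to parent */ }

static void update_chip_ops(struct chip *c)
{
	if (!c->ack)
		c->ack = default_ack;
	if (!c->mask)
		c->mask = default_mask;
}
```

This is why the two-line addition in the patch is enough: any driver that cares keeps its own `irq_ack`, and everyone else inherits `irq_chip_ack_parent` for free.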




[patch RFC 33/38] x86/irq: Add DEV_MSI allocation type

2020-08-20 Thread Thomas Gleixner
For the upcoming device MSI support a new allocation type is
required.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/hw_irq.h |1 +
 1 file changed, 1 insertion(+)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -40,6 +40,7 @@ enum irq_alloc_type {
X86_IRQ_ALLOC_TYPE_PCI_MSIX,
X86_IRQ_ALLOC_TYPE_DMAR,
X86_IRQ_ALLOC_TYPE_UV,
+   X86_IRQ_ALLOC_TYPE_DEV_MSI,
X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT,
X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };




[patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING

2020-08-20 Thread Thomas Gleixner
A generic IMS irq chip and irq domain implementation for IMS-based devices
which utilize an MSI message store array on chip.

Allows IMS devices with an MSI message store array to reuse this code for
different array sizes.

Allocation and freeing of interrupts happens via the generic
msi_domain_alloc/free_irqs() interface. No special purpose IMS magic
required as long as the interrupt domain is stored in the underlying device
struct.

Completely untested of course and mostly for illustration and educational
purposes. This should of course be a modular irq chip, but adding that
support is left as an exercise for the people who care about this deeply.

Signed-off-by: Thomas Gleixner 
Cc: Marc Zyngier 
Cc: Megha Dey 
Cc: Jason Gunthorpe 
Cc: Dave Jiang 
Cc: Alex Williamson 
Cc: Jacob Pan 
Cc: Baolu Lu 
Cc: Kevin Tian 
Cc: Dan Williams 
---
 drivers/irqchip/Kconfig |8 +
 drivers/irqchip/Makefile|1 
 drivers/irqchip/irq-ims-msi.c   |  169 
 include/linux/irqchip/irq-ims-msi.h |   41 
 4 files changed, 219 insertions(+)

--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -571,4 +571,12 @@ config LOONGSON_PCH_MSI
help
  Support for the Loongson PCH MSI Controller.
 
+config IMS_MSI
+   bool "IMS Interrupt Message Store MSI controller"
+   depends on PCI
+   select DEVICE_MSI
+   help
+ Support for IMS Interrupt Message Store MSI controller
+ with IMS slot storage in a slot array
+
 endmenu
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -111,3 +111,4 @@ obj-$(CONFIG_LOONGSON_HTPIC)+= irq-loo
 obj-$(CONFIG_LOONGSON_HTVEC)   += irq-loongson-htvec.o
 obj-$(CONFIG_LOONGSON_PCH_PIC) += irq-loongson-pch-pic.o
 obj-$(CONFIG_LOONGSON_PCH_MSI) += irq-loongson-pch-msi.o
+obj-$(CONFIG_IMS_MSI)  += irq-ims-msi.o
--- /dev/null
+++ b/drivers/irqchip/irq-ims-msi.c
@@ -0,0 +1,169 @@
+// SPDX-License-Identifier: GPL-2.0
+// (C) Copyright 2020 Thomas Gleixner 
+/*
+ * Shared interrupt chip and irq domain for Intel IMS devices
+ */
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+struct ims_data {
+   struct ims_array_info   info;
+   unsigned long   map[0];
+};
+
+static void ims_mask_irq(struct irq_data *data)
+{
+   struct msi_desc *desc = irq_data_get_msi_desc(data);
+   struct ims_array_slot __iomem *slot = desc->device_msi.priv_iomem;
+   u32 __iomem *ctrl = &slot->ctrl;
+
+   iowrite32(ioread32(ctrl) & ~IMS_VECTOR_CTRL_UNMASK, ctrl);
+}
+
+static void ims_unmask_irq(struct irq_data *data)
+{
+   struct msi_desc *desc = irq_data_get_msi_desc(data);
+   struct ims_array_slot __iomem *slot = desc->device_msi.priv_iomem;
+   u32 __iomem *ctrl = &slot->ctrl;
+
+   iowrite32(ioread32(ctrl) | IMS_VECTOR_CTRL_UNMASK, ctrl);
+}
+
+static void ims_write_msi_msg(struct irq_data *data, struct msi_msg *msg)
+{
+   struct msi_desc *desc = irq_data_get_msi_desc(data);
+   struct ims_array_slot __iomem *slot = desc->device_msi.priv_iomem;
+
+   iowrite32(msg->address_lo, &slot->address_lo);
+   iowrite32(msg->address_hi, &slot->address_hi);
+   iowrite32(msg->data, &slot->data);
+}
+
+static const struct irq_chip ims_msi_controller = {
+   .name   = "IMS",
+   .irq_mask   = ims_mask_irq,
+   .irq_unmask = ims_unmask_irq,
+   .irq_write_msi_msg  = ims_write_msi_msg,
+   .irq_retrigger  = irq_chip_retrigger_hierarchy,
+   .flags  = IRQCHIP_SKIP_SET_WAKE,
+};
+
+static void ims_reset_slot(struct ims_array_slot __iomem *slot)
+{
+   iowrite32(0, &slot->address_lo);
+   iowrite32(0, &slot->address_hi);
+   iowrite32(0, &slot->data);
+   iowrite32(0, &slot->ctrl);
+}
+
+static void ims_free_msi_store(struct irq_domain *domain, struct device *dev)
+{
+   struct msi_domain_info *info = domain->host_data;
+   struct ims_data *ims = info->data;
+   struct msi_desc *entry;
+
+   for_each_msi_entry(entry, dev) {
+   if (entry->device_msi.priv_iomem) {
+   clear_bit(entry->device_msi.hwirq, ims->map);
+   ims_reset_slot(entry->device_msi.priv_iomem);
+   entry->device_msi.priv_iomem = NULL;
+   entry->device_msi.hwirq = 0;
+   }
+   }
+}
+
+static int ims_alloc_msi_store(struct irq_domain *domain, struct device *dev,
+  int nvec)
+{
+   struct msi_domain_info *info = domain->host_data;
+   struct ims_data *ims = info->data;
+   struct msi_desc *entry;
+
+   for_each_msi_entry(entry, dev) {
+   unsigned int idx;
+
+   idx = find_first_zero_bit(ims->map, ims->info.max_slots);
+   if (idx >= ims->info.max_slots)
+   goto fail;
+   set_
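The allocation loop above (truncated in this archive) claims one free slot per MSI entry from a bitmap via find_first_zero_bit()/set_bit(), failing when the array is exhausted. A user-space sketch of that slot allocator, using an arbitrary 64-slot array for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Bitmap slot allocation: scan for the first clear bit, claim it,
 * and report failure when no slot is left. A plain uint64_t stands
 * in for the kernel's unsigned long bitmap and the find_first_zero_bit
 * helper; 64 slots is an illustrative size. */
#define MAX_SLOTS 64

static int alloc_slot(uint64_t *map)
{
	for (int idx = 0; idx < MAX_SLOTS; idx++) {
		if (!(*map & (1ULL << idx))) {
			*map |= 1ULL << idx;	/* claim the slot */
			return idx;
		}
	}
	return -1;	/* array exhausted: allocation fails */
}
```

On failure the real code jumps to a label that frees every slot claimed so far (ims_free_msi_store()), so a partial allocation never leaks slots.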

[patch RFC 22/38] x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()

2020-08-20 Thread Thomas Gleixner
The only user is in the same file and the name is too generic because this
function is only ever used for HVM domains.

Signed-off-by: Thomas Gleixner 
Cc: Konrad Rzeszutek Wilk 
Cc: linux-...@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Cc: Juergen Gross 
Cc: Boris Ostrovsky 
Cc: Stefano Stabellini 

---
 arch/x86/pci/xen.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -419,7 +419,7 @@ int __init pci_xen_init(void)
 }
 
 #ifdef CONFIG_PCI_MSI
-void __init xen_msi_init(void)
+static void __init xen_hvm_msi_init(void)
 {
if (!disable_apic) {
/*
@@ -459,7 +459,7 @@ int __init pci_xen_hvm_init(void)
 * We need to wait until after x2apic is initialized
 * before we can set MSI IRQ ops.
 */
-   x86_platform.apic_post_init = xen_msi_init;
+   x86_platform.apic_post_init = xen_hvm_msi_init;
 #endif
return 0;
 }




[patch RFC 36/38] platform-msi: Add device MSI infrastructure

2020-08-20 Thread Thomas Gleixner
Add device specific MSI domain infrastructure for devices which have their
own resource management and interrupt chip. These devices are not related
to PCI and contrary to platform MSI they do not share a common resource and
interrupt chip. They provide their own domain specific resource management
and interrupt chip.

This utilizes the new alloc/free override in a non-evil way which avoids
having yet another set of specialized alloc/free functions. Just using
msi_domain_alloc/free_irqs() is sufficient.

While initially it was suggested and tried to piggyback device MSI on
platform MSI, the better variant is to reimplement platform MSI on top of
device MSI.

Signed-off-by: Thomas Gleixner 
Cc: Greg Kroah-Hartman 
Cc: Marc Zyngier 
Cc: "Rafael J. Wysocki" 
---
 drivers/base/platform-msi.c |  129 
 include/linux/irqdomain.h   |1 
 include/linux/msi.h |   24 
 kernel/irq/Kconfig  |4 +
 4 files changed, 158 insertions(+)

--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -412,3 +412,132 @@ int platform_msi_domain_alloc(struct irq
 
return err;
 }
+
+#ifdef CONFIG_DEVICE_MSI
+/*
+ * Device specific MSI domain infrastructure for devices which have their
+ * own resource management and interrupt chip. These devices are not
+ * related to PCI and contrary to platform MSI they do not share a common
+ * resource and interrupt chip. They provide their own domain specific
+ * resource management and interrupt chip.
+ */
+
+static void device_msi_free_msi_entries(struct device *dev)
+{
+   struct list_head *msi_list = dev_to_msi_list(dev);
+   struct msi_desc *entry, *tmp;
+
+   list_for_each_entry_safe(entry, tmp, msi_list, list) {
+   list_del(&entry->list);
+   free_msi_entry(entry);
+   }
+}
+
+/**
+ * device_msi_free_irqs - Free MSI interrupts assigned to a device
+ * @domain:	The interrupt domain used to allocate the interrupts
+ * @dev:	Pointer to the device
+ *
+ * Frees the interrupts and the MSI descriptors.
+ */
+static void device_msi_free_irqs(struct irq_domain *domain, struct device *dev)
+{
+   __msi_domain_free_irqs(domain, dev);
+   device_msi_free_msi_entries(dev);
+}
+
+/**
+ * device_msi_alloc_irqs - Allocate MSI interrupts for a device
+ * @domain:	The interrupt domain used for the allocation
+ * @dev:	Pointer to the device
+ * @nvec:	Number of vectors
+ *
+ * Allocates the required number of MSI descriptors and the corresponding
+ * interrupt descriptors.
+ */
+static int device_msi_alloc_irqs(struct irq_domain *domain, struct device *dev, int nvec)
+{
+   int i, ret = -ENOMEM;
+
+   for (i = 0; i < nvec; i++) {
+   struct msi_desc *entry = alloc_msi_entry(dev, 1, NULL);
+
+   if (!entry)
+   goto fail;
+   list_add_tail(&entry->list, dev_to_msi_list(dev));
+   }
+
+   ret = __msi_domain_alloc_irqs(domain, dev, nvec);
+   if (!ret)
+   return 0;
+fail:
+   device_msi_free_msi_entries(dev);
+   return ret;
+}
+
+static void device_msi_update_dom_ops(struct msi_domain_info *info)
+{
+   if (!info->ops->domain_alloc_irqs)
+   info->ops->domain_alloc_irqs = device_msi_alloc_irqs;
+   if (!info->ops->domain_free_irqs)
+   info->ops->domain_free_irqs = device_msi_free_irqs;
+   if (!info->ops->msi_prepare)
+   info->ops->msi_prepare = arch_msi_prepare;
+}
+
+/**
+ * device_msi_create_irq_domain - Create an irq domain for devices
+ * @fn:		Firmware node of the interrupt controller
+ * @info:	MSI domain info to configure the new domain
+ * @parent:	Parent domain
+ */
+struct irq_domain *device_msi_create_irq_domain(struct fwnode_handle *fn,
+   struct msi_domain_info *info,
+   struct irq_domain *parent)
+{
+   struct irq_domain *domain;
+
+   if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
+   platform_msi_update_chip_ops(info);
+
+   if (info->flags & MSI_FLAG_USE_DEF_DOM_OPS)
+   device_msi_update_dom_ops(info);
+
+   domain = msi_create_irq_domain(fn, info, parent);
+   if (domain)
+   irq_domain_update_bus_token(domain, DOMAIN_BUS_DEVICE_MSI);
+   return domain;
+}
+
+#ifdef CONFIG_PCI
+#include 
+
+/**
+ * pci_subdevice_msi_create_irq_domain - Create an irq domain for subdevices
+ * @pdev:  Pointer to PCI device for which the subdevice domain is created
+ * @info:  MSI domain info to configure the new domain
+ */
+struct irq_domain *pci_subdevice_msi_create_irq_domain(struct pci_dev *pdev,
+						       struct msi_domain_info *info)
+{
+   struct irq_domain *domain, *pdev_msi;
+   struct fwnode_handle *fn;
+
+   /*
+* Retrieve the parent domain of the underlying PCI device's MSI
+* domain. This is going to be the parent of the new subdevice
+* domain as well.
+

[patch RFC 17/38] x86/pci: Reduce #ifdeffery in PCI init code

2020-08-20 Thread Thomas Gleixner
Adding a function call before the first #ifdef in pci_arch_init() triggers
a 'mixed declarations and code' warning if PCI_DIRECT is enabled.

Use stub functions and move the #ifdeffery to the header file where it is
not in the way.

Signed-off-by: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
---
 arch/x86/include/asm/pci_x86.h |   11 +++
 arch/x86/pci/init.c|   10 +++---
 2 files changed, 14 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -114,9 +114,20 @@ extern const struct pci_raw_ops pci_dire
 extern bool port_cf9_safe;
 
 /* arch_initcall level */
+#ifdef CONFIG_PCI_DIRECT
 extern int pci_direct_probe(void);
 extern void pci_direct_init(int type);
+#else
+static inline int pci_direct_probe(void) { return -1; }
+static inline void pci_direct_init(int type) { }
+#endif
+
+#ifdef CONFIG_PCI_BIOS
 extern void pci_pcbios_init(void);
+#else
+static inline void pci_pcbios_init(void) { }
+#endif
+
 extern void __init dmi_check_pciprobe(void);
 extern void __init dmi_check_skip_isa_align(void);
 
--- a/arch/x86/pci/init.c
+++ b/arch/x86/pci/init.c
@@ -8,11 +8,9 @@
in the right sequence from here. */
 static __init int pci_arch_init(void)
 {
-#ifdef CONFIG_PCI_DIRECT
-   int type = 0;
+   int type;
 
type = pci_direct_probe();
-#endif
 
if (!(pci_probe & PCI_PROBE_NOEARLY))
pci_mmcfg_early_init();
@@ -20,18 +18,16 @@ static __init int pci_arch_init(void)
if (x86_init.pci.arch_init && !x86_init.pci.arch_init())
return 0;
 
-#ifdef CONFIG_PCI_BIOS
pci_pcbios_init();
-#endif
+
/*
 * don't check for raw_pci_ops here because we want pcbios as last
 * fallback, yet it's needed to run first to set pcibios_last_bus
 * in case legacy PCI probing is used. otherwise detecting peer busses
 * fails.
 */
-#ifdef CONFIG_PCI_DIRECT
pci_direct_init(type);
-#endif
+
if (!raw_pci_ops && !raw_pci_ext_ops)
printk(KERN_ERR
"PCI: Fatal: No config space access function found\n");




[patch RFC 20/38] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI

2020-08-20 Thread Thomas Gleixner
Devices on the VMD bus use their own MSI irq domain, but it is not
distinguishable from regular PCI/MSI irq domains. This is required
to exclude VMD devices from getting the irq domain pointer set by
interrupt remapping.

Override the default bus token.

Signed-off-by: Thomas Gleixner 
Cc: Bjorn Helgaas 
Cc: Lorenzo Pieralisi 
Cc: Jonathan Derrick 
Cc: linux-...@vger.kernel.org
---
 drivers/pci/controller/vmd.c |6 ++
 1 file changed, 6 insertions(+)

--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -579,6 +579,12 @@ static int vmd_enable_domain(struct vmd_
return -ENODEV;
}
 
+   /*
+* Override the irq domain bus token so the domain can be distinguished
+* from a regular PCI/MSI domain.
+*/
+   irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
+
pci_add_resource(&resources, &vmd->resources[0]);
pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]);
pci_add_resource_offset(&resources, &vmd->resources[2], offset[1]);




[patch RFC 32/38] x86/irq: Make most MSI ops XEN private

2020-08-20 Thread Thomas Gleixner
Nothing except XEN uses the setup/teardown ops. Hide them there.

Signed-off-by: Thomas Gleixner 
Cc: xen-devel@lists.xenproject.org
Cc: linux-...@vger.kernel.org
---
 arch/x86/include/asm/x86_init.h |2 --
 arch/x86/pci/xen.c  |   23 +++
 2 files changed, 15 insertions(+), 10 deletions(-)

--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -276,8 +276,6 @@ struct x86_platform_ops {
 struct pci_dev;
 
 struct x86_msi_ops {
-   int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
-   void (*teardown_msi_irqs)(struct pci_dev *dev);
void (*restore_msi_irqs)(struct pci_dev *dev);
 };
 
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -156,6 +156,13 @@ static int acpi_register_gsi_xen(struct
 struct xen_pci_frontend_ops *xen_pci_frontend;
 EXPORT_SYMBOL_GPL(xen_pci_frontend);
 
+struct xen_msi_ops {
+   int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
+   void (*teardown_msi_irqs)(struct pci_dev *dev);
+};
+
+static struct xen_msi_ops xen_msi_ops __ro_after_init;
+
 static int xen_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 {
int irq, ret, i;
@@ -414,7 +421,7 @@ static int xen_msi_domain_alloc_irqs(str
else
type = PCI_CAP_ID_MSI;
 
-   return x86_msi.setup_msi_irqs(to_pci_dev(dev), nvec, type);
+   return xen_msi_ops.setup_msi_irqs(to_pci_dev(dev), nvec, type);
 }
 
 static void xen_msi_domain_free_irqs(struct irq_domain *domain,
@@ -423,7 +430,7 @@ static void xen_msi_domain_free_irqs(str
if (WARN_ON_ONCE(!dev_is_pci(dev)))
return;
 
-   x86_msi.teardown_msi_irqs(to_pci_dev(dev));
+   xen_msi_ops.teardown_msi_irqs(to_pci_dev(dev));
 }
 
 static struct msi_domain_ops xen_pci_msi_domain_ops = {
@@ -461,17 +468,17 @@ static __init struct irq_domain *xen_cre
 static __init void xen_setup_pci_msi(void)
 {
if (xen_initial_domain()) {
-   x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
-   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+   xen_msi_ops.setup_msi_irqs = xen_initdom_setup_msi_irqs;
+   xen_msi_ops.teardown_msi_irqs = xen_teardown_msi_irqs;
x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
pci_msi_ignore_mask = 1;
} else if (xen_pv_domain()) {
-   x86_msi.setup_msi_irqs = xen_setup_msi_irqs;
-   x86_msi.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
+   xen_msi_ops.setup_msi_irqs = xen_setup_msi_irqs;
+   xen_msi_ops.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
pci_msi_ignore_mask = 1;
} else if (xen_hvm_domain()) {
-   x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs;
-   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+   xen_msi_ops.setup_msi_irqs = xen_hvm_setup_msi_irqs;
+   xen_msi_ops.teardown_msi_irqs = xen_teardown_msi_irqs;
} else {
WARN_ON_ONCE(1);
return;




[patch RFC 26/38] x86/xen: Wrap XEN MSI management into irqdomain

2020-08-20 Thread Thomas Gleixner
To allow utilizing the irq domain pointer in struct device it is necessary
to make XEN/MSI irq domain compatible.

While the right solution would be to truly convert XEN to irq domains, this
is an exercise which is not possible for mere mortals with limited XENology.

Provide a plain irqdomain wrapper around XEN. While this is a blatant
violation of the irqdomain design, it's the only solution for a XEN ignorant
person to make progress on the issue which triggered this change.

Signed-off-by: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
---
Note: This is completely untested, but it compiles so it must be perfect.
---
 arch/x86/pci/xen.c |   63 +
 1 file changed, 63 insertions(+)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -406,6 +406,63 @@ static void xen_teardown_msi_irq(unsigne
WARN_ON_ONCE(1);
 }
 
+static int xen_msi_domain_alloc_irqs(struct irq_domain *domain,
+struct device *dev,  int nvec)
+{
+   int type;
+
+   if (WARN_ON_ONCE(!dev_is_pci(dev)))
+   return -EINVAL;
+
+   if (first_msi_entry(dev)->msi_attrib.is_msix)
+   type = PCI_CAP_ID_MSIX;
+   else
+   type = PCI_CAP_ID_MSI;
+
+   return x86_msi.setup_msi_irqs(to_pci_dev(dev), nvec, type);
+}
+
+static void xen_msi_domain_free_irqs(struct irq_domain *domain,
+struct device *dev)
+{
+   if (WARN_ON_ONCE(!dev_is_pci(dev)))
+   return;
+
+   x86_msi.teardown_msi_irqs(to_pci_dev(dev));
+}
+
+static struct msi_domain_ops xen_pci_msi_domain_ops = {
+   .domain_alloc_irqs  = xen_msi_domain_alloc_irqs,
+   .domain_free_irqs   = xen_msi_domain_free_irqs,
+};
+
+static struct msi_domain_info xen_pci_msi_domain_info = {
+   .ops= &xen_pci_msi_domain_ops,
+};
+
+/*
+ * This irq domain is a blatant violation of the irq domain design, but
+ * disentangling XEN into real irq domains is not a job for mere mortals
+ * with limited XENology. But it's the least dangerous way for a mere
+ * mortal to get rid of the arch_*_msi_irqs() hackery in order to store
+ * the irq domain pointer in struct device. This irq domain wrappery
+ * allows doing that without breaking XEN terminally.
+ */
+static __init struct irq_domain *xen_create_pci_msi_domain(void)
+{
+   struct irq_domain *d = NULL;
+   struct fwnode_handle *fn;
+
+   fn = irq_domain_alloc_named_fwnode("XEN-MSI");
+   if (fn)
+   d = msi_create_irq_domain(fn, &xen_pci_msi_domain_info, NULL);
+
+   /* FIXME: No idea how to survive if this fails */
+   BUG_ON(!d);
+
+   return d;
+}
+
 static __init void xen_setup_pci_msi(void)
 {
if (xen_initial_domain()) {
@@ -426,6 +483,12 @@ static __init void xen_setup_pci_msi(voi
}
 
x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+
+   /*
+* Override the PCI/MSI irq domain init function. No point
+* in allocating the native domain and never use it.
+*/
+   x86_init.irqs.create_pci_msi_domain = xen_create_pci_msi_domain;
 }
 
 #else /* CONFIG_PCI_MSI */




[patch RFC 34/38] x86/msi: Let pci_msi_prepare() handle non-PCI MSI

2020-08-20 Thread Thomas Gleixner
Rename it to x86_msi_prepare() and handle the allocation type setup
depending on the device type.

Add a new arch_msi_prepare define which will be utilized by the upcoming
device MSI support. Define it to NULL if not provided by an architecture in
the generic MSI header.

One arch specific function for MSI support is truly enough.

Signed-off-by: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
Cc: linux-hyp...@vger.kernel.org
---
 arch/x86/include/asm/msi.h  |4 +++-
 arch/x86/kernel/apic/msi.c  |   27 ---
 drivers/pci/controller/pci-hyperv.c |2 +-
 include/linux/msi.h |4 
 4 files changed, 28 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -6,7 +6,9 @@
 
 typedef struct irq_alloc_info msi_alloc_info_t;
 
-int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
+int x86_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
msi_alloc_info_t *arg);
 
+#define arch_msi_prepare   x86_msi_prepare
+
 #endif /* _ASM_X86_MSI_H */
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -182,26 +182,39 @@ static struct irq_chip pci_msi_controlle
.flags  = IRQCHIP_SKIP_SET_WAKE,
 };
 
-int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
-   msi_alloc_info_t *arg)
+static void pci_msi_prepare(struct device *dev, msi_alloc_info_t *arg)
 {
-   struct pci_dev *pdev = to_pci_dev(dev);
-   struct msi_desc *desc = first_pci_msi_entry(pdev);
+   struct msi_desc *desc = first_msi_entry(dev);
 
-   init_irq_alloc_info(arg, NULL);
if (desc->msi_attrib.is_msix) {
arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
} else {
arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
}
+}
+
+static void dev_msi_prepare(struct device *dev, msi_alloc_info_t *arg)
+{
+   arg->type = X86_IRQ_ALLOC_TYPE_DEV_MSI;
+}
+
+int x86_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
+   msi_alloc_info_t *arg)
+{
+   init_irq_alloc_info(arg, NULL);
+
+   if (dev_is_pci(dev))
+   pci_msi_prepare(dev, arg);
+   else
+   dev_msi_prepare(dev, arg);
 
return 0;
 }
-EXPORT_SYMBOL_GPL(pci_msi_prepare);
+EXPORT_SYMBOL_GPL(x86_msi_prepare);
 
 static struct msi_domain_ops pci_msi_domain_ops = {
-   .msi_prepare= pci_msi_prepare,
+   .msi_prepare= x86_msi_prepare,
 };
 
 static struct msi_domain_info pci_msi_domain_info = {
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1532,7 +1532,7 @@ static struct irq_chip hv_msi_irq_chip =
 };
 
 static struct msi_domain_ops hv_msi_ops = {
-   .msi_prepare= pci_msi_prepare,
+   .msi_prepare= arch_msi_prepare,
.msi_free   = hv_msi_free,
 };
 
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -430,4 +430,8 @@ static inline struct irq_domain *pci_msi
 }
 #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
 
+#ifndef arch_msi_prepare
+# define arch_msi_prepare  NULL
+#endif
+
 #endif /* LINUX_MSI_H */




[patch RFC 29/38] x86/pci: Set default irq domain in pcibios_add_device()

2020-08-20 Thread Thomas Gleixner
Now that interrupt remapping sets the irqdomain pointer when a PCI device
is added it's possible to store the default irq domain in the device struct
in pcibios_add_device().

If the bus to which a device is connected has an irq domain associated then
this domain is used otherwise the default domain (PCI/MSI native or XEN
PCI/MSI) is used. Using the bus domain ensures that special MSI bus domains
like VMD work.

This makes XEN and the non-remapped native case work solely based on the
irq domain pointer in struct device for PCI/MSI and allows to remove the
arch fallback and make most of the x86_msi ops private to XEN in the next
steps.

Signed-off-by: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
---
 arch/x86/include/asm/irqdomain.h |2 ++
 arch/x86/kernel/apic/msi.c   |2 +-
 arch/x86/pci/common.c|   18 +-
 3 files changed, 20 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -53,9 +53,11 @@ extern int mp_irqdomain_ioapic_idx(struc
 #ifdef CONFIG_PCI_MSI
 void x86_create_pci_msi_domain(void);
 struct irq_domain *native_create_pci_msi_domain(void);
+extern struct irq_domain *x86_pci_msi_default_domain;
 #else
 static inline void x86_create_pci_msi_domain(void) { }
 #define native_create_pci_msi_domain   NULL
+#define x86_pci_msi_default_domain NULL
 #endif
 
 #endif
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -21,7 +21,7 @@
 #include 
 #include 
 
-static struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
+struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
 static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 {
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 unsigned int pci_probe = PCI_PROBE_BIOS | PCI_PROBE_CONF1 | PCI_PROBE_CONF2 |
PCI_PROBE_MMCONF;
@@ -633,8 +634,9 @@ static void set_dev_domain_options(struc
 
 int pcibios_add_device(struct pci_dev *dev)
 {
-   struct setup_data *data;
struct pci_setup_rom *rom;
+   struct irq_domain *msidom;
+   struct setup_data *data;
u64 pa_data;
 
pa_data = boot_params.hdr.setup_data;
@@ -661,6 +663,20 @@ int pcibios_add_device(struct pci_dev *d
memunmap(data);
}
set_dev_domain_options(dev);
+
+   /*
+* Setup the initial MSI domain of the device. If the underlying
+* bus has a PCI/MSI irqdomain associated use the bus domain,
+* otherwise set the default domain. This ensures that special irq
+* domains e.g. VMD are preserved. The default ensures initial
+* operation if irq remapping is not active. If irq remapping is
+* active it will overwrite the domain pointer when the device is
+* associated to a remapping domain.
+*/
+   msidom = dev_get_msi_domain(&dev->bus->dev);
+   if (!msidom)
+   msidom = x86_pci_msi_default_domain;
+   dev_set_msi_domain(&dev->dev, msidom);
return 0;
 }
 




[patch RFC 37/38] irqdomain/msi: Provide msi_alloc/free_store() callbacks

2020-08-20 Thread Thomas Gleixner
For devices which don't have a standard storage for MSI messages like the
upcoming IMS (Interrupt Message Store) it's required to allocate storage
space before allocating interrupts and to release it after freeing them.

This could be achieved with the existing callbacks, but that would be
awkward because they operate on msi_alloc_info_t which is not uniform
across architectures. Also these callbacks are invoked per interrupt but
the allocation might have bulk requirements depending on the device.

As such devices can operate on different architectures it is simpler to
have separate callbacks which operate on struct device. The resulting
storage information has to be stored in struct msi_desc so the underlying
irq chip implementation can retrieve it for the relevant operations.

Signed-off-by: Thomas Gleixner 
Cc: Marc Zyngier 
---
 include/linux/msi.h |8 
 kernel/irq/msi.c|   11 +++
 2 files changed, 19 insertions(+)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -279,6 +279,10 @@ struct msi_domain_info;
  * function.
  * @domain_free_irqs:  Optional function to override the default free
  * function.
+ * @msi_alloc_store:   Optional callback to allocate storage in a device
+ * specific non-standard MSI store
+ * @msi_free_store:	Optional callback to free storage in a device
+ *			specific non-standard MSI store
  *
  * @get_hwirq, @msi_init and @msi_free are callbacks used by
  * msi_create_irq_domain() and related interfaces
@@ -328,6 +332,10 @@ struct msi_domain_ops {
 struct device *dev, int nvec);
void(*domain_free_irqs)(struct irq_domain *domain,
struct device *dev);
+   int (*msi_alloc_store)(struct irq_domain *domain,
+  struct device *dev, int nvec);
+   void(*msi_free_store)(struct irq_domain *domain,
+   struct device *dev);
 };
 
 /**
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -410,6 +410,12 @@ int __msi_domain_alloc_irqs(struct irq_d
if (ret)
return ret;
 
+   if (ops->msi_alloc_store) {
+   ret = ops->msi_alloc_store(domain, dev, nvec);
+   if (ret)
+   return ret;
+   }
+
for_each_msi_entry(desc, dev) {
ops->set_desc(&arg, desc);
 
@@ -509,6 +515,8 @@ int msi_domain_alloc_irqs(struct irq_dom
 
 void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
 {
+   struct msi_domain_info *info = domain->host_data;
+   struct msi_domain_ops *ops = info->ops;
struct msi_desc *desc;
 
for_each_msi_entry(desc, dev) {
@@ -522,6 +530,9 @@ void __msi_domain_free_irqs(struct irq_d
desc->irq = 0;
}
}
+
+   if (ops->msi_free_store)
+   ops->msi_free_store(domain, dev);
 }
 
 /**




[patch RFC 19/38] irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI

2020-08-20 Thread Thomas Gleixner
PCI devices behind a VMD bus are not subject to interrupt remapping, but
the irq domain for VMD MSI cannot be distinguished from a regular PCI/MSI
irq domain.

Add a new domain bus token and allow it in the bus token check in
msi_check_reservation_mode() to keep the functionality the same once VMD
uses this token.

Signed-off-by: Thomas Gleixner 
Cc: Jon Derrick 
---
 include/linux/irqdomain.h |1 +
 kernel/irq/msi.c  |7 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -84,6 +84,7 @@ enum irq_domain_bus_token {
DOMAIN_BUS_FSL_MC_MSI,
DOMAIN_BUS_TI_SCI_INTA_MSI,
DOMAIN_BUS_WAKEUP,
+   DOMAIN_BUS_VMD_MSI,
 };
 
 /**
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -370,8 +370,13 @@ static bool msi_check_reservation_mode(s
 {
struct msi_desc *desc;
 
-   if (domain->bus_token != DOMAIN_BUS_PCI_MSI)
+   switch (domain->bus_token) {
+   case DOMAIN_BUS_PCI_MSI:
+   case DOMAIN_BUS_VMD_MSI:
+   break;
+   default:
return false;
+   }
 
if (!(info->flags & MSI_FLAG_MUST_REACTIVATE))
return false;




[patch RFC 27/38] iommu/vt-d: Store irq domain in struct device

2020-08-20 Thread Thomas Gleixner
As a first step to make X86 utilize the direct MSI irq domain operations
store the irq domain pointer in the device struct when a device is probed.

This is done from dmar_pci_bus_add_dev() because it has to work even when
DMA remapping is disabled. It only overrides the irqdomain of devices which
are handled by a regular PCI/MSI irq domain which protects PCI devices
behind special busses like VMD which have their own irq domain.

No functional change. It just avoids the redirection through
arch_*_msi_irqs() and allows the PCI/MSI core to directly invoke the irq
domain alloc/free functions instead of having to look up the irq domain for
every single MSI interrupt.

Signed-off-by: Thomas Gleixner 
Cc: Joerg Roedel 
Cc: io...@lists.linux-foundation.org
Cc: Lu Baolu 
---
 drivers/iommu/intel/dmar.c  |3 +++
 drivers/iommu/intel/irq_remapping.c |   16 
 include/linux/intel-iommu.h |5 +
 3 files changed, 24 insertions(+)

--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -316,6 +316,9 @@ static int dmar_pci_bus_add_dev(struct d
if (ret < 0 && dmar_dev_scope_status == 0)
dmar_dev_scope_status = ret;
 
+   if (ret >= 0)
+   intel_irq_remap_add_device(info);
+
return ret;
 }
 
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1086,6 +1086,22 @@ static int reenable_irq_remapping(int ei
return -1;
 }
 
+/*
+ * Store the MSI remapping domain pointer in the device if enabled.
+ *
+ * This is called from dmar_pci_bus_add_dev() so it works even when DMA
+ * remapping is disabled. Only update the pointer if the device is not
+ * already handled by a non default PCI/MSI interrupt domain. This protects
+ * e.g. VMD devices.
+ */
+void intel_irq_remap_add_device(struct dmar_pci_notify_info *info)
+{
+   if (!irq_remapping_enabled || pci_dev_has_special_msi_domain(info->dev))
+   return;
+
+   dev_set_msi_domain(&info->dev->dev, map_dev_to_ir(info->dev));
+}
+
 static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 {
memset(irte, 0, sizeof(*irte));
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -439,6 +439,11 @@ struct ir_table {
struct irte *base;
unsigned long *bitmap;
 };
+
+void intel_irq_remap_add_device(struct dmar_pci_notify_info *info);
+#else
+static inline void
+intel_irq_remap_add_device(struct dmar_pci_notify_info *info) { }
 #endif
 
 struct iommu_flush {



