[Xen-devel] 3.18 xen-pcifront regression?

2015-03-24 Thread Michael D Labriola
I'm having problems booting a 3.18 or newer domU w/ PCI devices passed
through.  It only seems to be the domU kernel that's upset (i.e., behavior
is identical whether I'm running 3.19 or 3.13 dom0).  I'm running a 32bit
dom0 (3.13.11) w/ 64bit 4.4.0 hypervisor and 32bit domU.  I get the
following Oops when trying to boot my domU with a couple PCI cards passed
through:

BUG: unable to handle kernel paging request at 0030303e
IP: [] acpi_ns_validate_handle+0x12/0x1a
*pdpt = 019f1027 *pde =  
Oops:  [#1] PREEMPT SMP 
Modules linked in: xen_pcifront(+) pcspkr xen_blkfront loop
CPU: 0 PID: 18 Comm: xenwatch Not tainted 3.17.0-test+ #6
task: cb869950 ti: cb8ae000 task.ti: cb8ae000
EIP: 0061:[] EFLAGS: 00010246 CPU: 0
EIP is at acpi_ns_validate_handle+0x12/0x1a
EAX:  EBX: cb895dc0 ECX:  EDX: 0030303a
ESI: c0a6bccd EDI:  EBP: 0004 ESP: cb8afd00
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0069
CR0: 8005003b CR2: 0030303e CR3: 0a68e000 CR4: 00040660
Stack:
 c06eda4d  c096a21d   6462  c0c102c0
 0030303a 00040004 0030303a cb8afd94 cb8afdec cb8afd60 c06b78e1 cb8afd60
 0061 0246 c0407bc7 c0c102c0  cb8afda0 cb8afda8 cb8afdb0
Call Trace:
 [] ? acpi_evaluate_object+0x31/0x1fc
 [] ? resume_kernel+0x5/0x7
 [] ? pci_get_hp_params+0x111/0x4e0
 [] ? xen_force_evtchn_callback+0x17/0x30
 [] ? xen_restore_fl_direct_reloc+0x4/0x4
 [] ? pci_device_add+0x24/0x450
 [] ? pci_bus_read_config_word+0x6e/0x80
 [] ? pci_scan_single_device+0x8d/0xb0
 [] ? pci_scan_slot+0x3c/0xf0
 [] ? pci_scan_child_bus+0x1c/0x90
 [] ? pci_scan_bus_parented+0x6a/0x90
 [] ? pcifront_scan_root+0x91/0x130 [xen_pcifront]
 [] ? pcifront_backend_changed+0x4af/0x654 [xen_pcifront]
 [] ? xenbus_gather+0x5f/0x90
 [] ? xenbus_gather+0x5f/0x90
 [] ? xenbus_read_driver_state+0x33/0x50
 [] ? xenbus_otherend_changed+0x95/0xa0
 [] ? backend_changed+0xf/0x20
 [] ? xenwatch_thread+0x72/0x110
 [] ? bit_waitqueue+0x50/0x50
 [] ? join+0x70/0x70
 [] ? kthread+0xab/0xd0
 [] ? ret_from_kernel_thread+0x21/0x30
 [] ? flush_kthread_worker+0xa0/0xa0
Code: 03 10 00 00 eb 0e 46 83 c2 04 4b 85 db 75 b9 c6 02 00 31 c0 5b 5e 5f 
5d c3 89 c2 8d 40 ff 83 f8 fd 76 06 a1 2c 32 c1 c0 c3 31 c0 <80> 7a 04 0f 
0f 44 c2 c3 83 ec 10 83 f8 1d 76 24 89 44 24 0c c7
EIP: [] acpi_ns_validate_handle+0x12/0x1a SS:ESP 0069:cb8afd00
CR2: 0030303e
---[ end trace d4ddeb038cbcbdf7 ]---
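
One detail in the register dump above can be checked by hand. The bytes starting at the <80> marker in the Code: line are `80 7a 04 0f`, which decode as `cmpb $0xf,0x4(%edx)`, so the faulting address in CR2 should equal EDX + 4. Below is a minimal Python sketch of that arithmetic; it also renders the bytes of EDX as ASCII. The helper name `decode_fault` is made up for illustration, and reading a register value as text is only a heuristic.

```python
import struct


def decode_fault(edx: int, cr2: int, disp: int = 4):
    """Check CR2 == EDX + disp and render EDX's bytes as ASCII.

    disp=4 comes from the ModRM/disp8 bytes "7a 04" in the Code: line.
    """
    # Lay the 32-bit value out the way it would sit in x86 (little-endian) memory.
    raw = struct.pack("<I", edx)
    # Drop trailing NUL bytes and show whatever is left as text.
    text = raw.rstrip(b"\x00").decode("ascii", errors="replace")
    return (edx + disp == cr2, text)


edx = 0x0030303A  # EDX from the register dump
cr2 = 0x0030303E  # faulting address reported in CR2

matches, text = decode_fault(edx, cr2)
print(matches, repr(text))
```

This prints `True ':00'`. The offset matches, and the "handle" in EDX reads as the text ":00", which resembles a fragment of a PCI address string such as 0000:00:00.0 rather than a pointer. That reading is a guess from the dump alone, not something established in this thread.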


I've bisected down to the following commit in 3.18, which breaks my 
system.

6cd33649fa83d97ba7b66f1d871a360e867c5220 is the first bad commit
commit 6cd33649fa83d97ba7b66f1d871a360e867c5220
Author: Bjorn Helgaas 
Date:   Wed Aug 27 14:29:47 2014 -0600

PCI: Add pci_configure_device() during enumeration

Some platforms can tell the OS how to configure PCI devices, e.g., how to
set cache line size, error reporting enables, etc.  ACPI defines _HPP and
_HPX methods for this purpose.

This configuration was previously done by some of the hotplug drivers using
pci_configure_slot().  But not all hotplug drivers did this, and per the
spec (ACPI rev 5.0, sec 6.2.7), we can also do it for "devices not
configured by the BIOS at system boot."

Move this configuration into the PCI core by adding pci_configure_device()
and calling it from pci_device_add(), so we do this for all devices as we
enumerate them.

This is based on pci_configure_slot(), which is used by hotplug drivers.
I omitted:

  - pcie_bus_configure_settings() because it configures MPS and MRRS, which
    requires global knowledge of the fabric and must be done later, and

  - configuration of subordinate devices; that will happen when we call
    pci_device_add() for those devices.

Because pci_configure_slot() was only done by hotplug drivers, this initial
version of pci_configure_device() only configures hot-added devices,
ignoring anything added during boot.
 
Signed-off-by: Bjorn Helgaas 
Acked-by: Yinghai Lu 

:040000 040000 4fadbe1e5f8f18daa6be7bdb7c9c1d6def0a2615 9aef037aa35ca156ac46553f7fc4c5b1b3980c19 M	drivers


I've reverted that commit on top of 3.19, which feels incredibly wrong, 
but does fix the problem on my system.  This is a little over my head, 
though...  ;-)

Thoughts?

---
Michael D Labriola
Electric Boat
mlabr...@gdeb.com
401-848-8871 (desk)
401-848-8513 (lab)
401-316-9844 (cell)


 



Re: [Xen-devel] 3.18 xen-pcifront regression?

2015-03-24 Thread Michael D Labriola
Bjorn Helgaas  wrote on 03/24/2015 01:27:02 PM:

> From: Bjorn Helgaas 
> To: Konrad Rzeszutek Wilk , 
> Cc: Michael D Labriola , xen-
> de...@lists.xenproject.org, Stuart Wehrly , 
> michael.d.labri...@gmail.com, Jayson A Dyke , "Rafael 
> J. Wysocki" , linux-...@vger.kernel.org, linux-
> a...@vger.kernel.org
> Date: 03/24/2015 01:29 PM
> Subject: Re: [Xen-devel] 3.18 xen-pcifront regression?
> 
> [+cc Rafael, linux-pci, linux-acpi]
> 
> On Tue, Mar 24, 2015 at 11:28:06AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Tue, Mar 24, 2015 at 11:14:29AM -0400, Michael D Labriola wrote:
> > > I'm having problems booting a 3.18 or newer domU w/ PCI devices passed
> > > through.  It only seems to be the domU kernel that's upset (i.e., behavior
> > > is identical whether I'm running 3.19 or 3.13 dom0).  I'm running a 32bit
> > > dom0 (3.13.11) w/ 64bit 4.4.0 hypervisor and 32bit domU.  I get the
> > > following Oops when trying to boot my domU with a couple PCI cards passed
> > > through:
> > > 
> > > BUG: unable to handle kernel paging request at 0030303e
> > > IP: [] acpi_ns_validate_handle+0x12/0x1a
> > > *pdpt = 019f1027 *pde =  
> > > Oops:  [#1] PREEMPT SMP 
> > > Modules linked in: xen_pcifront(+) pcspkr xen_blkfront loop
> > > CPU: 0 PID: 18 Comm: xenwatch Not tainted 3.17.0-test+ #6
> > > task: cb869950 ti: cb8ae000 task.ti: cb8ae000
> > > EIP: 0061:[] EFLAGS: 00010246 CPU: 0
> > > EIP is at acpi_ns_validate_handle+0x12/0x1a
> > > EAX:  EBX: cb895dc0 ECX:  EDX: 0030303a
> > > ESI: c0a6bccd EDI:  EBP: 0004 ESP: cb8afd00
> > >  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0069
> > > CR0: 8005003b CR2: 0030303e CR3: 0a68e000 CR4: 00040660
> > > Stack:
> > >  c06eda4d  c096a21d   6462  c0c102c0
> > >  0030303a 00040004 0030303a cb8afd94 cb8afdec cb8afd60 c06b78e1 cb8afd60
> > >  0061 0246 c0407bc7 c0c102c0  cb8afda0 cb8afda8 cb8afdb0
> > > Call Trace:
> > >  [] ? acpi_evaluate_object+0x31/0x1fc
> > 
> > We should not be calling into any ACPI code in PV domU guests.
> > 
> > We actually disable it (acpi=0) to make sure we don't call it, as
> > there is no ACPI AML data at all in the guest.
> > 
> > CC-ing Bjorn.
> > >  [] ? resume_kernel+0x5/0x7
> > >  [] ? pci_get_hp_params+0x111/0x4e0
> > >  [] ? xen_force_evtchn_callback+0x17/0x30
> > >  [] ? xen_restore_fl_direct_reloc+0x4/0x4
> > >  [] ? pci_device_add+0x24/0x450
> > >  [] ? pci_bus_read_config_word+0x6e/0x80
> > >  [] ? pci_scan_single_device+0x8d/0xb0
> > >  [] ? pci_scan_slot+0x3c/0xf0
> > >  [] ? pci_scan_child_bus+0x1c/0x90
> > >  [] ? pci_scan_bus_parented+0x6a/0x90
> > >  [] ? pcifront_scan_root+0x91/0x130 [xen_pcifront]
> > >  [] ? pcifront_backend_changed+0x4af/0x654 [xen_pcifront]
> > >  [] ? xenbus_gather+0x5f/0x90
> > >  [] ? xenbus_gather+0x5f/0x90
> > >  [] ? xenbus_read_driver_state+0x33/0x50
> > >  [] ? xenbus_otherend_changed+0x95/0xa0
> > >  [] ? backend_changed+0xf/0x20
> > >  [] ? xenwatch_thread+0x72/0x110
> > >  [] ? bit_waitqueue+0x50/0x50
> > >  [] ? join+0x70/0x70
> > >  [] ? kthread+0xab/0xd0
> > >  [] ? ret_from_kernel_thread+0x21/0x30
> > >  [] ? flush_kthread_worker+0xa0/0xa0
> > > Code: 03 10 00 00 eb 0e 46 83 c2 04 4b 85 db 75 b9 c6 02 00 31 c0 5b 5e 5f
> > > 5d c3 89 c2 8d 40 ff 83 f8 fd 76 06 a1 2c 32 c1 c0 c3 31 c0 <80> 7a 04 0f
> > > 0f 44 c2 c3 83 ec 10 83 f8 1d 76 24 89 44 24 0c c7
> > > EIP: [] acpi_ns_validate_handle+0x12/0x1a SS:ESP 0069:cb8afd00
> > > CR2: 0030303e
> > > ---[ end trace d4ddeb038cbcbdf7 ]---
> > > 
> > > 
> > > I've bisected down to the following commit in 3.18, which breaks my 
> > > system.
> > > 
> > > 6cd33649fa83d97ba7b66f1d871a360e867c5220 is the first bad commit
> > > commit 6cd33649fa83d97ba7b66f1d871a360e867c5220
> > > Author: Bjorn Helgaas 
> > > Date:   Wed Aug 27 14:29:47 2014 -0600
> > > 
> > > PCI: Add pci_configure_device() during enumeration
> > > 
> > > Some platforms can tell the OS how to configure PCI devices, e.g., how to
> > > set cache line size, error reporting enables, etc.  ACPI defines _HPP and
> > > _HPX methods for this purpose.
> > > 
> 

Re: [Xen-devel] 3.18 xen-pcifront regression?

2015-03-25 Thread Michael D Labriola
Konrad Rzeszutek Wilk wrote on 03/25/2015 04:27:00 PM:

> From: Konrad Rzeszutek Wilk 
> To: Bjorn Helgaas , 
> Cc: Michael D Labriola , xen-
> de...@lists.xenproject.org, Stuart Wehrly , 
> michael.d.labri...@gmail.com, Jayson A Dyke , "Rafael 
> J. Wysocki" , linux-...@vger.kernel.org, linux-
> a...@vger.kernel.org
> Date: 03/25/2015 04:27 PM
> Subject: Re: [Xen-devel] 3.18 xen-pcifront regression?
> 
> On Tue, Mar 24, 2015 at 12:27:02PM -0500, Bjorn Helgaas wrote:
> > [+cc Rafael, linux-pci, linux-acpi]
> > 
> > On Tue, Mar 24, 2015 at 11:28:06AM -0400, Konrad Rzeszutek Wilk wrote:
> > > On Tue, Mar 24, 2015 at 11:14:29AM -0400, Michael D Labriola wrote:
> > > > I'm having problems booting a 3.18 or newer domU w/ PCI devices passed
> > > > through.  It only seems to be the domU kernel that's upset (i.e., behavior
> > > > is identical whether I'm running 3.19 or 3.13 dom0).  I'm running a 32bit
> > > > dom0 (3.13.11) w/ 64bit 4.4.0 hypervisor and 32bit domU.  I get the
> > > > following Oops when trying to boot my domU with a couple PCI cards passed
> > > > through:
> > > > 
> > > > BUG: unable to handle kernel paging request at 0030303e
> > > > IP: [] acpi_ns_validate_handle+0x12/0x1a
> > > > *pdpt = 019f1027 *pde =  
> > > > Oops:  [#1] PREEMPT SMP 
> > > > Modules linked in: xen_pcifront(+) pcspkr xen_blkfront loop
> > > > CPU: 0 PID: 18 Comm: xenwatch Not tainted 3.17.0-test+ #6
> > > > task: cb869950 ti: cb8ae000 task.ti: cb8ae000
> > > > EIP: 0061:[] EFLAGS: 00010246 CPU: 0
> > > > EIP is at acpi_ns_validate_handle+0x12/0x1a
> > > > EAX:  EBX: cb895dc0 ECX:  EDX: 0030303a
> > > > ESI: c0a6bccd EDI:  EBP: 0004 ESP: cb8afd00
> > > >  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0069
> > > > CR0: 8005003b CR2: 0030303e CR3: 0a68e000 CR4: 00040660
> > > > Stack:
> > > >  c06eda4d  c096a21d   6462  c0c102c0
> > > >  0030303a 00040004 0030303a cb8afd94 cb8afdec cb8afd60 c06b78e1 cb8afd60
> > > >  0061 0246 c0407bc7 c0c102c0  cb8afda0 cb8afda8 cb8afdb0
> > > > Call Trace:
> > > >  [] ? acpi_evaluate_object+0x31/0x1fc
> > > 
> > > We should not be calling into any ACPI code in PV domU guests.
> > > 
> > > We actually disable it (acpi=0) to make sure we don't call it, as
> > > there is no ACPI AML data at all in the guest.
> > > 
> > > CC-ing Bjorn.
> > > >  [] ? resume_kernel+0x5/0x7
> > > >  [] ? pci_get_hp_params+0x111/0x4e0
> > > >  [] ? xen_force_evtchn_callback+0x17/0x30
> > > >  [] ? xen_restore_fl_direct_reloc+0x4/0x4
> > > >  [] ? pci_device_add+0x24/0x450
> > > >  [] ? pci_bus_read_config_word+0x6e/0x80
> > > >  [] ? pci_scan_single_device+0x8d/0xb0
> > > >  [] ? pci_scan_slot+0x3c/0xf0
> > > >  [] ? pci_scan_child_bus+0x1c/0x90
> > > >  [] ? pci_scan_bus_parented+0x6a/0x90
> > > >  [] ? pcifront_scan_root+0x91/0x130 [xen_pcifront]
> > > >  [] ? pcifront_backend_changed+0x4af/0x654 [xen_pcifront]
> > > >  [] ? xenbus_gather+0x5f/0x90
> > > >  [] ? xenbus_gather+0x5f/0x90
> > > >  [] ? xenbus_read_driver_state+0x33/0x50
> > > >  [] ? xenbus_otherend_changed+0x95/0xa0
> > > >  [] ? backend_changed+0xf/0x20
> > > >  [] ? xenwatch_thread+0x72/0x110
> > > >  [] ? bit_waitqueue+0x50/0x50
> > > >  [] ? join+0x70/0x70
> > > >  [] ? kthread+0xab/0xd0
> > > >  [] ? ret_from_kernel_thread+0x21/0x30
> > > >  [] ? flush_kthread_worker+0xa0/0xa0
> > > > Code: 03 10 00 00 eb 0e 46 83 c2 04 4b 85 db 75 b9 c6 02 00 31 c0 5b 5e 5f
> > > > 5d c3 89 c2 8d 40 ff 83 f8 fd 76 06 a1 2c 32 c1 c0 c3 31 c0 <80> 7a 04 0f
> > > > 0f 44 c2 c3 83 ec 10 83 f8 1d 76 24 89 44 24 0c c7
> > > > EIP: [] acpi_ns_validate_handle+0x12/0x1a SS:ESP 0069:cb8afd00
> > > > CR2: 0030303e
> > > > ---[ end trace d4ddeb038cbcbdf7 ]---
> > > > 
> > > > 
> > > > I've bisected down to the following commit in 3.18, which breaks my
> > > > system.
> > > > 
> > > > 6cd33649fa83d97ba7b66f1d871a360e867c5220 is the first bad commit
> > > > commit 6cd33649fa83d97ba7b66f1d871a360e867c5220
> > > > Author: Bjor