On 11/18/2015 10:18 AM, Joerg Roedel wrote:
Hello Laine,
On Thu, Nov 12, 2015 at 12:33:53PM -0500, Laine Stump wrote:
After a crash course in kernel building from Alex, I bisected down
to commit aafd8ba - a kernel built without this commit succeeds in
setting up all the devices mentioned, adding it causes failure (and
a very long delay during boot). Joerg, do you have any ideas for
debugging the problem further to see what in the commit causes this
problem? (note that 2 other people with the same chipset but
slightly different hardware plugged into it report no failure - see
the other replies to the parent of this message for more detail).
I'm happy to build a kernel with any suggested patches and report
results...
commit aafd8ba0ca74894b9397e412bbd7f8ea2662ead8
Author: Joerg Roedel <jroe...@suse.de>
Date: Thu May 28 18:41:39 2015 +0200
iommu/amd: Implement add_device and remove_device
Implement these two iommu-ops call-backs to make use of the
initialization and notifier features of the iommu core.
Signed-off-by: Joerg Roedel <jroe...@suse.de>
I have no idea yet how this patch causes your regression. You certainly
already posted it, but since I was not on Cc, can you please give me an
overview about the problem you are seeing with this patch?
Sure. Sorry it took so long to get back to you. (My to-do list keeps
getting longer instead of shorter, and I'm thrashing a bit).
Here's my original description, along with some questions from Alex
and my responses:
On 11/05/2015 02:05 PM, Laine Stump wrote:
On 11/04/2015 04:08 PM, Alex Williamson wrote:
On Wed, 2015-11-04 at 12:24 -0500, Laine Stump wrote:
Last week I upgraded my Fedora 22 AMD 990FX system from kernel 4.1.10
to 4.2.3 (standard Fedora builds) and multiple devices stopped working:
* 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
Azalia (Intel HDA) (rev 40)
* 02:00.[01] Ethernet controller: Intel Corporation 82576 Gigabit
Network Connection
* 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar
HDMI Audio [Radeon HD 5400/6300 Series]
(The 1st is integrated on the motherboard, the 2nd & 3rd are behind
an AMD RD890 pci-pci bridge. There may be other devices failing,
but these are the ones immediately obvious.)
Whatever is the source of the failure, it ends up that the drivers
for these devices aren't loaded.
At Alex Williamson's suggestion, I tried disabling IOMMU in the BIOS,
and magically all the devices resumed normal operation (except that
I can't do vfio device assignment because the IOMMU is disabled).
Reverting to kernel 4.1.10 very definitely eliminates the problem. I've
also tried kernel 4.2.5 and it has the same problem as 4.2.3 (these
three are the only pre-built kernels for F22). I can provide dmesg /
lspci output from each of these, or any other debug info anyone
might like me to gather.
I built a 4.2.3 kernel for my 990fx system and can't seem to
reproduce it. Does 'lspci -k' for those devices show any driver?
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
Azalia (Intel HDA) (rev 40)
Subsystem: Gigabyte Technology Co., Ltd Device a132
Kernel modules: snd_hda_intel
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI
Audio [Radeon HD 5400/6300 Series]
Subsystem: Gigabyte Technology Co., Ltd Device aa68
Kernel modules: snd_hda_intel
02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network
Connection (rev 01)
Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
Kernel driver in use: igb
Kernel modules: igb
02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network
Connection (rev 01)
Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
Kernel modules: igb
/sys/devices/pci0000:00/0000:00:04.0/0000:02:00.0 does show a link
from driver to ........drivers/igb, but .......:02:00.1 doesn't
have a link, and neither of them shows up in /sys/class/net.
Similarly for 01:00.[01] (which are behind the PCI to PCI bridge at
00:02.0), the .0 device does have a link to the radeon driver, but
the .1 device (which is the sound device on the radeon video card)
has no driver link.
And 00:14.2 (the motherboard integrated sound device) shows no driver
link in sysfs either.
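A minimal sketch of the sysfs check described above, assuming the usual layout where a bound device has a "driver" symlink under /sys/bus/pci/devices/<address>/. The helper name bound_driver and the sysfs_root parameter are hypothetical, added only so the check can be pointed at a test directory:

```python
import os


def bound_driver(pci_addr, sysfs_root="/sys"):
    """Return the name of the driver bound to a PCI device, or None.

    pci_addr is the full address including the domain, e.g. "0000:02:00.1".
    sysfs_root is a hypothetical parameter, used here only for testing.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "driver")
    # A device with no bound driver simply has no "driver" symlink.
    if not os.path.islink(link):
        return None
    # The link target ends in .../drivers/<name>; keep only the name.
    return os.path.basename(os.readlink(link))
```

On the system described here, calling bound_driver("0000:02:00.0") would be expected to return "igb", while "0000:02:00.1", "0000:01:00.1" and "0000:00:14.2" would return None.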
Does 'lsmod' show the drivers loaded, igb and snd_hda_intel? If not,
does manually modprobe'ing either of those drivers change anything?
Both of those drivers show up in lsmod output.
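For reference, the same check can be scripted. lsmod formats the contents of /proc/modules, where the first whitespace-separated field of each line is the module name. This is only a sketch; the function name loaded_modules is hypothetical, and it takes the file contents as a string so it can be tried on sample text:

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first field of each line is the module name.
            names.add(fields[0])
    return names


def missing_modules(wanted, proc_modules_text):
    """Return the names in 'wanted' that are not currently loaded, sorted."""
    return sorted(set(wanted) - loaded_modules(proc_modules_text))
```

On a live system the text would come from open("/proc/modules").read(), and missing_modules(["igb", "snd_hda_intel"], text) returning an empty list corresponds to both drivers showing up in lsmod.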
You haven't installed a script that writes to driver_override or set up
a configuration where those devices are claimed by pci-stub and
forgotten about it, have you? (it's happened to me)
Not that I'm aware of. /etc/modules.d/local.conf had a few stray very
old items that I'd forgotten about, but I removed those and the
results are the same.
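A quick way to rule out a stray override is to read each device's driver_override attribute in sysfs. The sketch below assumes the attribute lives at /sys/bus/pci/devices/<address>/driver_override and reads as empty or "(null)" when nothing is set. The function name and the sysfs_root parameter are hypothetical:

```python
import os


def driver_override(pci_addr, sysfs_root="/sys"):
    """Return the driver_override set for a PCI device, or None if unset.

    sysfs_root is a hypothetical parameter, used here only for testing.
    """
    path = os.path.join(
        sysfs_root, "bus", "pci", "devices", pci_addr, "driver_override"
    )
    try:
        with open(path) as f:
            value = f.read().strip()
    except OSError:
        # Attribute missing (older kernel) or device not present.
        return None
    # Assumption: an unset override reads back as "(null)" or an empty string.
    if value in ("", "(null)"):
        return None
    return value
```

A device that a script had handed to pci-stub would be expected to report "pci-stub" here. A result of None for all of the affected devices is consistent with no override being in place.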
Otherwise, dmesg is probably a good place to start.
On 11/08/2015 11:52 AM, Laine Stump wrote:
Here is the dmesg with IOMMU enabled in the BIOS (i.e. the devices
*don't* work):
http://fpaste.org/296772/14490851/
and here it is when IOMMU has been *disabled* in the BIOS (the
devices *do* work):
http://fpaste.org/296774/44908550/
(I refreshed those links since they were almost a month old).
It was after getting the above dmesg's that I bisected kernel builds
down to aafd8ba. If it would help, I can provide dmesg from just
before/after that commit, with any sort of extra debugging you'd
like turned on, or if you have a patch you'd like tested (or just
something to add extra debugging) I'm happy to do that too. Since
this is my main test machine for vfio device assignment, I'm open to
doing just about anything to help figure out the problem, but don't
really have the knowledge to figure it out myself. :-)
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu