Hi,

On 30-09-16 17:33, Laszlo Ersek wrote:
> On 09/30/16 16:59, Hans de Goede wrote:
>> Hi,
>>
>> On 30-09-16 16:51, Laszlo Ersek wrote:
>>> On 09/30/16 12:35, Hans de Goede wrote:
>>>
>>>> Attached are 2 patches against the xserver which should fix this,
>>>> please give them a try.
>>>
>>> Sorry about the delay.
>>>
>>> The patches don't seem to fix the issue for me. Please see the Xorg log
>>> attached.
>>>
>>> I tested the patches as follows. Given that my bisection had been done
>>> in a Fedora 24 guest, using
>>>
>>> xorg-x11-server-1.18.4-4.fc24
>>> http://koji.fedoraproject.org/koji/buildinfo?buildID=794494
>>>
>>> I now rebuilt the guest kernel exactly at the failing commit (a325725
>>> "drm: Lobotomize set_busid nonsense for !pci drivers"), and first
>>> reproduced the issue with the above X server.
>>>
>>> Then, I ported your patches to "xorg-server-1.18.4" (using the upstream
>>> xserver tree), and rebuilt the Fedora package with the backport. For the
>>> backport, I had to cherry-pick the following two patches from master
>>> first:
>>>
>>> 1 ca8d88e50310 xfree86: recognize primary BUS_PCI device in
>>> xf86IsPrimaryPlatform()
>>> 2 ea91db4b8331 config: fix GPUDevice fail when AutoAddGPU off + BusID
>>>
>>> This way your patches applied cleanly. (Cherry pick #1 above is actually
>>> necessary for semantics, while cherry pick #2 is needed for a clean
>>> context only, and has no impact for this test.)
>>>
>>> That is, in total, I added the following four patches to the Fedora 24
>>> package:
>>>
>>> 1 xfree86: recognize primary BUS_PCI device in xf86IsPrimaryPlatform()
>>> 2 config: fix GPUDevice fail when AutoAddGPU off + BusID
>>> 3 xfree86: Make adding unclaimed devices as GPU devices a separate step
>>> 4 xfree86: Try harder to find atleast 1 non GPU Screen
>>>
>>> You can find the scratch build that I used for testing here:
>>>
>>> xorg-x11-server-1.18.4-4.hans_bz1366842_2.fc24
>>> http://koji.fedoraproject.org/koji/taskinfo?taskID=15875087
>>>
>>> Another reason I used F24's X server as basis, rather than upstream
>>> HEAD, is that Fedora 24 is pretty young, and it's already on kernel
>>> 4.7.4, and I believe it will soon move to kernel 4.8, without
>>> (necessarily) rebasing its X server package to upstream. IOW the kernel
>>> upgrade to 4.8 will break X in Fedora 24 too, and then I expect the
>>> Fedora X maintainers would have to cherry pick those two patches as
>>> dependencies just the same.
>>>
>>> To summarize, the patches don't seem to help. I shall nonetheless thank
>>> you for spending your Friday on this!
>>
>> Hmm, do you have a xorg.conf file lying around somewhere, the message
>> about the xserver not being able to find an entry for screen 0 does
>> not make sense ...
>
> Good catch, I actually had two files under "/etc/X11/xorg.conf.d/":
>
> * "00-keyboard.conf", from package "systemd-229-13.fc24.x86_64", with
> contents
>
> ------------
> # Read and parsed by systemd-localed. It's probably wise not to edit
> this file
> # manually too freely.
> Section "InputClass"
>     Identifier "system-keyboard"
>     MatchIsKeyboard "on"
>     Option "XkbLayout" "us"
> EndSection
> ------------
>
> * "01-resolution.conf", which I had created, in order to set the
> preferred display resolution:
>
> ------------
> Section "Screen"
>     Identifier "Default Screen"
>     Device "Default Device"
>     Monitor "Default Monitor"
> EndSection
>
> Section "Device"
>     Identifier "Default Device"
>     Driver "modesetting"
> EndSection
>
> Section "Monitor"
>     Identifier "Default Monitor"
>     Option "PreferredMode" "640x480"
> #   Option "PreferredMode" "1440x900"
> EndSection
> ------------
>
> I removed these files now, and repeated the test. Again, the X server
> wouldn't start, but I think the log file looks a bit different now.
> Attached.

Ah, ok, so it seems that my initial analysis was wrong: the problem is not
a recurrence of the device getting identified as a GPU screen. libdrm
sort of depends on bus IDs, and the lack of one is causing the server to
misbehave. I guess that even with an xorg.conf things will fail with the
troublesome kernel version (might be worth trying).

Emil's analysis seems to be spot on. This does not seem easily fixable in
userspace, and it does seem like a real regression, as it even breaks
things when specifying the device through xorg.conf (or so I believe),
which is something that used to work ...

I made the mistake of thinking the kernel change was re-triggering the
old problem Laszlo fixed, but that does not seem to be the case.

Regards,

Hans