Please provide your dmesg output and xorg log. Alex
On Thu, May 21, 2020 at 3:03 PM Javad Karabi <karabija...@gmail.com> wrote: > > Alex, > yea, youre totally right i was overcomplicating it lol > so i was able to get the radeon to run super fast, by doing as you > suggested and blacklisting i915. > (had to use module_blacklist= though because modprobe.blacklist still > allows i915, if a dependency wants to load it) > but with one caveat: > using the amdgpu driver, there was some error saying something about > telling me that i need to add BusID to my device or something. > maybe amdgpu wasnt able to find the card or something, i dont > remember. so i used modesetting instead and it seemed to work. > i will try going back to amdgpu and seeing what that error message was. > i recall you saying that modesetting doesnt have some features that > amdgpu provides. > what are some examples of that? > is the direction that graphics drivers are going, to be simply used as > "modesetting" via xorg? > > On Wed, May 20, 2020 at 10:12 PM Alex Deucher <alexdeuc...@gmail.com> wrote: > > > > I think you are overcomplicating things. Just try and get X running > > on just the AMD GPU on bare metal. Introducing virtualization is just > > adding more uncertainty. If you can't configure X to not use the > > integrated GPU, just blacklist the i915 driver (append > > modprobe.blacklist=i915 to the kernel command line in grub) and X > > should come up on the dGPU. > > > > Alex > > > > On Wed, May 20, 2020 at 6:05 PM Javad Karabi <karabija...@gmail.com> wrote: > > > > > > Thanks Alex, > > > Here's my plan: > > > > > > since my laptop's os is pretty customized, e.g. compiling my own kernel, > > > building latest xorg, latest xorg-driver-amdgpu, etc etc, > > > im going to use the intel iommu and pass through my rx 5600 into a > > > virtual machine, which will be a 100% stock ubuntu installation. > > > then, inside that vm, i will continue to debug > > > > > > does that sound like it would make sense for testing? for example, with > > > that scenario, it adds the iommu into the mix, so who knows if that > > > causes performance issues. but i think its worth a shot, to see if a > > > stock kernel will handle it better > > > > > > also, quick question: > > > from what i understand, a thunderbolt 3 pci express connection should > > > handle 8 GT/s x4, however, along the chain of bridges to my device, i > > > notice that the bridge closest to the graphics card is at 2.5 GT/s x4, > > > and it also says "downgraded" (this is via the lspci output) > > > > > > now, when i boot into windows, it _also_ says 2.5 GT/s x4, and it runs > > > extremely well. no issues at all. > > > > > > so my question is: the fact that the bridge is at 2.5 GT/s x4, and not at > > > its theoretical "full link speed" of 8 GT/s x4, do you suppose that > > > _could_ be an issue? > > > i do not think so, because, like i said, in windows it also reports that > > > link speed. > > > i would assume that you would want the fastest link speed possible, > > > because i would assume that of _all_ tb3 pci express devices, a GPU would > > > be the #1 most demanding on the link > > > > > > just curious if you think 2.5 GT/s could be the bottleneck > > > > > > i will pass through the device into a ubuntu vm and let you know how it > > > goes. thanks > > > > > > > > > > > > On Tue, May 19, 2020 at 9:29 PM Alex Deucher <alexdeuc...@gmail.com> > > > wrote: > > >> > > >> On Tue, May 19, 2020 at 9:16 PM Javad Karabi <karabija...@gmail.com> > > >> wrote: > > >> > > > >> > thanks for the answers alex. > > >> > > > >> > so, i went ahead and got a displayport cable to see if that changes > > >> > anything. and now, when i run monitor only, and the monitor connected > > >> > to the card, it has no issues like before! so i am thinking that > > >> > somethings up with either the hdmi cable, or some hdmi related setting > > >> > in my system? who knows, but im just gonna roll with only using > > >> > displayport cables now. > > >> > the previous hdmi cable was actually pretty long, because i was > > >> > extending it with an hdmi extension cable, so maybe the signal was > > >> > really bad or something :/ > > >> > > > >> > but yea, i guess the only real issue now is maybe something simple > > >> > related to some sysfs entry about enabling some powermode, voltage, > > >> > clock frequency, or something, so that glxgears will give me more than > > >> > 300 fps. but atleast now i can use a single monitor configuration with > > >> > the monitor displayported up to the card. > > >> > > > >> > > >> The GPU dynamically adjusts the clocks and voltages based on load. No > > >> manual configuration is required. > > >> > > >> At this point, we probably need to see you xorg log and dmesg output > > >> to try and figure out exactly what is going on. I still suspect there > > >> is some interaction going on with both GPUs and the integrated GPU > > >> being the primary, so as I mentioned before, you should try and run X > > >> on just the amdgpu rather than trying to use both of them. > > >> > > >> Alex > > >> > > >> > > >> > also, one other thing i think you might be interested in, that was > > >> > happening before. > > >> > > > >> > so, previously, with laptop -tb3-> egpu-hdmi> monitor, there was a > > >> > funny thing happening which i never could figure out. > > >> > when i would look at the X logs, i would see that "modesetting" (for > > >> > the intel integrated graphics) was reporting that MonitorA was used > > >> > with "eDP-1", which is correct and what i expected. > > >> > when i scrolled further down, i then saw that "HDMI-A-1-2" was being > > >> > used for another MonitorB, which also is what i expected (albeit i > > >> > have no idea why its saying A-1-2) > > >> > but amdgpu was _also_ saying that DisplayPort-1-2 (a port on the > > >> > radeon card) was being used for MonitorA, which is the same Monitor > > >> > that the modesetting driver had claimed to be using with eDP-1! > > >> > > > >> > so the point is that amdgpu was "using" Monitor0 with DisplayPort-1-2, > > >> > although that is what modesetting was using for eDP-1. > > >> > > > >> > anyway, thats a little aside, i doubt it was related to the terrible > > >> > hdmi experience i was getting, since its about display port and stuff, > > >> > but i thought id let you know about that. > > >> > > > >> > if you think that is a possible issue, im more than happy to plug the > > >> > hdmi setup back in and create an issue on gitlab with the logs and > > >> > everything > > >> > > > >> > On Tue, May 19, 2020 at 4:42 PM Alex Deucher <alexdeuc...@gmail.com> > > >> > wrote: > > >> > > > > >> > > On Tue, May 19, 2020 at 5:22 PM Javad Karabi <karabija...@gmail.com> > > >> > > wrote: > > >> > > > > > >> > > > lol youre quick! > > >> > > > > > >> > > > "Windows has supported peer to peer DMA for years so it already > > >> > > > has a > > >> > > > numbers of optimizations that are only now becoming possible on > > >> > > > Linux" > > >> > > > > > >> > > > whoa, i figured linux would be ahead of windows when it comes to > > >> > > > things like that. but peer-to-peer dma is something that is only > > >> > > > recently possible on linux, but has been possible on windows? what > > >> > > > changed recently that allows for peer to peer dma in linux? > > >> > > > > > >> > > > > >> > > A few things that made this more complicated on Linux: > > >> > > 1. Linux uses IOMMUs more extensively than windows so you can't just > > >> > > pass around physical bus addresses. > > >> > > 2. Linux supports lots of strange architectures that have a lot of > > >> > > limitations with respect to peer to peer transactions > > >> > > > > >> > > It just took years to get all the necessary bits in place in Linux > > >> > > and > > >> > > make everyone happy. > > >> > > > > >> > > > also, in the context of a game running opengl on some gpu, is the > > >> > > > "peer-to-peer" dma transfer something like: the game draw's to some > > >> > > > memory it has allocated, then a DMA transfer gets that and moves it > > >> > > > into the graphics card output? > > >> > > > > >> > > Peer to peer DMA just lets devices access another devices local > > >> > > memory > > >> > > directly. So if you have a buffer in vram on one device, you can > > >> > > share that directly with another device rather than having to copy it > > >> > > to system memory first. For example, if you have two GPUs, you can > > >> > > have one of them copy it's content directly to a buffer in the other > > >> > > GPU's vram rather than having to go through system memory first. > > >> > > > > >> > > > > > >> > > > also, i know it can be super annoying trying to debug an issue like > > >> > > > this, with someone like me who has all types of differences from a > > >> > > > normal setup (e.g. using it via egpu, using a kernel with custom > > >> > > > configs and stuff) so as a token of my appreciation i donated 50$ > > >> > > > to > > >> > > > the red cross' corona virus outbreak charity thing, on behalf of > > >> > > > amd-gfx. > > >> > > > > >> > > Thanks, > > >> > > > > >> > > Alex > > >> > > > > >> > > > > > >> > > > On Tue, May 19, 2020 at 4:13 PM Alex Deucher > > >> > > > <alexdeuc...@gmail.com> wrote: > > >> > > > > > > >> > > > > On Tue, May 19, 2020 at 3:44 PM Javad Karabi > > >> > > > > <karabija...@gmail.com> wrote: > > >> > > > > > > > >> > > > > > just a couple more questions: > > >> > > > > > > > >> > > > > > - based on what you are aware of, the technical details such as > > >> > > > > > "shared buffers go through system memory", and all that, do > > >> > > > > > you see > > >> > > > > > any issues that might exist that i might be missing in my > > >> > > > > > setup? i > > >> > > > > > cant imagine this being the case because the card works great > > >> > > > > > in > > >> > > > > > windows, unless the windows driver does something different? > > >> > > > > > > > >> > > > > > > >> > > > > Windows has supported peer to peer DMA for years so it already > > >> > > > > has a > > >> > > > > numbers of optimizations that are only now becoming possible on > > >> > > > > Linux. > > >> > > > > > > >> > > > > > - as far as kernel config, is there anything in particular > > >> > > > > > which > > >> > > > > > _should_ or _should not_ be enabled/disabled? > > >> > > > > > > >> > > > > You'll need the GPU drivers for your devices and dma-buf support. > > >> > > > > > > >> > > > > > > > >> > > > > > - does the vendor matter? for instance, this is an xfx card. > > >> > > > > > when it > > >> > > > > > comes to different vendors, are there interface changes that > > >> > > > > > might > > >> > > > > > make one vendor work better for linux than another? i dont > > >> > > > > > really > > >> > > > > > understand the differences in vendors, but i imagine that the > > >> > > > > > vbios > > >> > > > > > differs between vendors, and as such, the linux compatibility > > >> > > > > > would > > >> > > > > > maybe change? > > >> > > > > > > >> > > > > board vendor shouldn't matter. > > >> > > > > > > >> > > > > > > > >> > > > > > - is the pcie bandwidth possible an issue? the pcie_bw file > > >> > > > > > changes > > >> > > > > > between values like this: > > >> > > > > > 18446683600662707640 18446744071581623085 128 > > >> > > > > > and sometimes i see this: > > >> > > > > > 4096 0 128 > > >> > > > > > as you can see, the second value seems significantly lower. is > > >> > > > > > that > > >> > > > > > possibly an issue? possibly due to aspm? > > >> > > > > > > >> > > > > pcie_bw is not implemented for navi yet so you are just seeing > > >> > > > > uninitialized data. This patch set should clear that up. > > >> > > > > https://patchwork.freedesktop.org/patch/366262/ > > >> > > > > > > >> > > > > Alex > > >> > > > > > > >> > > > > > > > >> > > > > > On Tue, May 19, 2020 at 2:20 PM Javad Karabi > > >> > > > > > <karabija...@gmail.com> wrote: > > >> > > > > > > > > >> > > > > > > im using Driver "amdgpu" in my xorg conf > > >> > > > > > > > > >> > > > > > > how does one verify which gpu is the primary? im assuming my > > >> > > > > > > intel > > >> > > > > > > card is the primary, since i have not done anything to > > >> > > > > > > change that. > > >> > > > > > > > > >> > > > > > > also, if all shared buffers have to go through system > > >> > > > > > > memory, then > > >> > > > > > > that means an eGPU amdgpu wont work very well in general > > >> > > > > > > right? > > >> > > > > > > because going through system memory for the egpu means going > > >> > > > > > > over the > > >> > > > > > > thunderbolt connection > > >> > > > > > > > > >> > > > > > > and what are the shared buffers youre referring to? for > > >> > > > > > > example, if an > > >> > > > > > > application is drawing to a buffer, is that an example of a > > >> > > > > > > shared > > >> > > > > > > buffer that has to go through system memory? if so, thats > > >> > > > > > > fine, right? > > >> > > > > > > because the application's memory is in system memory, so > > >> > > > > > > that copy > > >> > > > > > > wouldnt be an issue. > > >> > > > > > > > > >> > > > > > > in general, do you think the "copy buffer across system > > >> > > > > > > memory might > > >> > > > > > > be a hindrance for thunderbolt? im trying to figure out which > > >> > > > > > > directions to go to debug and im totally lost, so maybe i > > >> > > > > > > can do some > > >> > > > > > > testing that direction? > > >> > > > > > > > > >> > > > > > > and for what its worth, when i turn the display "off" via > > >> > > > > > > the gnome > > >> > > > > > > display settings, its the same issue as when the laptop lid > > >> > > > > > > is closed, > > >> > > > > > > so unless the motherboard reads the "closed lid" the same as > > >> > > > > > > "display > > >> > > > > > > off", then im not sure if its thermal issues. > > >> > > > > > > > > >> > > > > > > On Tue, May 19, 2020 at 2:14 PM Alex Deucher > > >> > > > > > > <alexdeuc...@gmail.com> wrote: > > >> > > > > > > > > > >> > > > > > > > On Tue, May 19, 2020 at 2:59 PM Javad Karabi > > >> > > > > > > > <karabija...@gmail.com> wrote: > > >> > > > > > > > > > > >> > > > > > > > > given this setup: > > >> > > > > > > > > laptop -thunderbolt-> razer core x -> xfx rx 5600 xt raw > > >> > > > > > > > > 2 -hdmi-> monitor > > >> > > > > > > > > DRI_PRIME=1 glxgears gears gives me ~300fps > > >> > > > > > > > > > > >> > > > > > > > > given this setup: > > >> > > > > > > > > laptop -thunderbolt-> razer core x -> xfx rx 5600 xt raw > > >> > > > > > > > > 2 > > >> > > > > > > > > laptop -hdmi-> monitor > > >> > > > > > > > > > > >> > > > > > > > > glx gears gives me ~1800fps > > >> > > > > > > > > > > >> > > > > > > > > this doesnt make sense to me because i thought that > > >> > > > > > > > > having the monitor > > >> > > > > > > > > plugged directly into the card should give best > > >> > > > > > > > > performance. > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > Do you have displays connected to both GPUs? If you are > > >> > > > > > > > using X which > > >> > > > > > > > ddx are you using? xf86-video-modesetting or > > >> > > > > > > > xf86-video-amdgpu? > > >> > > > > > > > IIRC, xf86-video-amdgpu has some optimizations for prime > > >> > > > > > > > which are not > > >> > > > > > > > yet in xf86-video-modesetting. Which GPU is set up as the > > >> > > > > > > > primary? > > >> > > > > > > > Note that the GPU which does the rendering is not > > >> > > > > > > > necessarily the one > > >> > > > > > > > that the displays are attached to. The render GPU renders > > >> > > > > > > > to it's > > >> > > > > > > > render buffer and then that data may end up being copied > > >> > > > > > > > other GPUs > > >> > > > > > > > for display. Also, at this point, all shared buffers have > > >> > > > > > > > to go > > >> > > > > > > > through system memory (this will be changing eventually > > >> > > > > > > > now that we > > >> > > > > > > > support device memory via dma-buf), so there is often an > > >> > > > > > > > extra copy > > >> > > > > > > > involved. > > >> > > > > > > > > > >> > > > > > > > > theres another really weird issue... > > >> > > > > > > > > > > >> > > > > > > > > given setup 1, where the monitor is plugged in to the > > >> > > > > > > > > card: > > >> > > > > > > > > when i close the laptop lid, my monitor is "active" and > > >> > > > > > > > > whatnot, and i > > >> > > > > > > > > can "use it" in a sense > > >> > > > > > > > > > > >> > > > > > > > > however, heres the weirdness: > > >> > > > > > > > > the mouse cursor will move along the monitor perfectly > > >> > > > > > > > > smooth and > > >> > > > > > > > > fine, but all the other updates to the screen are > > >> > > > > > > > > delayed by about 2 > > >> > > > > > > > > or 3 seconds. > > >> > > > > > > > > that is to say, its as if the laptop is doing everything > > >> > > > > > > > > (e.g. if i > > >> > > > > > > > > open a terminal, the terminal will open, but it will > > >> > > > > > > > > take 2 seconds > > >> > > > > > > > > for me to see it) > > >> > > > > > > > > > > >> > > > > > > > > its almost as if all the frames and everything are being > > >> > > > > > > > > drawn, and > > >> > > > > > > > > the laptop is running fine and everything, but i simply > > >> > > > > > > > > just dont get > > >> > > > > > > > > to see it on the monitor, except for one time every 2 > > >> > > > > > > > > seconds. > > >> > > > > > > > > > > >> > > > > > > > > its hard to articulate, because its so bizarre. its not > > >> > > > > > > > > like, a "low > > >> > > > > > > > > fps" per se, because the cursor is totally smooth. but > > >> > > > > > > > > its that > > >> > > > > > > > > _everything else_ is only updated once every couple > > >> > > > > > > > > seconds. > > >> > > > > > > > > > >> > > > > > > > This might also be related to which GPU is the primary. > > >> > > > > > > > It still may > > >> > > > > > > > be the integrated GPU since that is what is attached to > > >> > > > > > > > the laptop > > >> > > > > > > > panel. Also the platform and some drivers may do certain > > >> > > > > > > > things when > > >> > > > > > > > the lid is closed. E.g., for thermal reasons, the > > >> > > > > > > > integrated GPU or > > >> > > > > > > > CPU may have a more limited TDP because the laptop cannot > > >> > > > > > > > cool as > > >> > > > > > > > efficiently. > > >> > > > > > > > > > >> > > > > > > > Alex _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx