On 08/17/2016 06:27 PM, Christian König wrote:
>> AMD uses copy swaps because radeon/amdgpu kms can't switch the
>> scanout mode from tiled to linear on the fly during flips.
> Well I'm not an expert on this, but as far as I know the bigger problem
> is that the dedicated AMD hardware generations you are targeting usually
> can't reliably scan out from system memory without a rather complicated
> setup.
>
> So that is a complete NAK to the radeon changes.
Hi Christian,

thanks for the feedback, but I think that's a misunderstanding. The patches don't make the hardware scan out from system memory; they just enforce a fresh copy from RAM/GTT -> VRAM before a buffer is scanned out again. I just assume there is a more elegant/clean way than this "fake" pin/unpin to GTT to tell the driver that its current VRAM content is stale and needs a refresh from the up-to-date dmabuf in system RAM.

Btw. I'll be offline for the next few hours, just wanted to get this out now.

thanks,
-mario

>
> Regards,
> Christian.
>
> On 17.08.2016 at 18:12, Mario Kleiner wrote:
>> Hi,
>>
>> I spent some time playing with DRI3/Present + PRIME for testing
>> how well it works for Optimus/Enduro style setups wrt. page flipping
>> on the current kernel/mesa/xorg. I want page flipping because
>> neuroscience/medical applications need the reliable timing/timestamping
>> and tear-free presentation that we currently can only get via page
>> flipping, not via the copyswap path.
>>
>> Intel as display gpu + nouveau for render offload worked nicely
>> on intel-ddx with page flipping, proper timing, dmabuf fence sync
>> and all.
>>
>> AMD uses copy swaps because radeon/amdgpu kms can't switch the
>> scanout mode from tiled to linear on the fly during flips. That's
>> a todo in itself. For the moment I used the ati-ddx with Option
>> "ColorTiling/ColorTiling2D" "off" to force my pair of old Radeon
>> HD-5770s into linear mode so page flipping can be used for
>> prime. The current modesetting-ddx will use page flipping in
>> any case as it doesn't detect the tiling format mismatch.
>>
>> nouveau uses page flips.
>>
>> It turns out that prime + page flipping currently doesn't work
>> on nouveau and AMD. The first offload-rendered images from
>> the imported dmabufs show up properly, but then the display
>> is stuck alternating between the first two or three rendered
>> frames.
>>
>> The problem is that during the pageflip ioctl we pin the
>> dmabuf into VRAM in preparation for scanout, then unpin it
>> when we are done with it at the next flip, but the buffer stays
>> in the VRAM memory domain. The next time we flip to the buffer,
>> the driver skips the DMA copy from GTT to VRAM during
>> pinning, because the buffer's content apparently already resides
>> in VRAM. Therefore it doesn't update the VRAM copy with the updated
>> dmabuf content in system RAM, so freshly rendered frames from the
>> prime export/render offload gpu never reach the display gpu and one
>> only sees stale images.
>>
>> The attached patches for nouveau and radeon kms seem to work
>> pretty well: page flipping works, the display updates tear-free,
>> dmabuf fence sync works, and onset timing/timestamping is correct.
>> They simply pin the buffer back into GTT, then unpin, to force
>> a move of the buffer into the GTT domain, and thereby force the
>> following pin to do a new copy from GTT -> VRAM. The code tries
>> to avoid a useless copy from VRAM -> GTT during the pin op.
>>
>> However, the approach feels very much like a hack, so I assume
>> this is not the proper way of doing it? I looked at what ttm has
>> to offer, but couldn't find anything elegant and obvious. Maybe
>> there is a way to evict a bo without actually copying data back
>> to RAM? Or to invalidate the VRAM copy as stale? Maybe I just
>> missed something, as I'm not very familiar with ttm.
>>
>> Thoughts or suggestions?
>>
>> Another insight from my hacks so far is that nouveau seems to
>> be fast as prime exporter/render offload, but rather slow as
>> display gpu/prime importer, as tested on a 2008 or 2009
>> MacBookPro dual-Nvidia laptop.
>>
>> AMD, as tested with dual Radeon HD-5770s, seems to be fast as prime
>> importer/display gpu, but very slow as prime exporter/render offload,
>> e.g., taking 16 msecs to get a 1920x1080 framebuffer into RAM. It seems
>> that Mesa's blitImage function is the slow bit here. On r600 it seems
>> to draw a textured triangle strip to detile the gpu renderbuffer and
>> copy it into GTT. As drawing a textured fullscreen quad is normally
>> much faster, something special seems to be going on there wrt. DMA?
>> However, I don't have a realistic Enduro test setup with AMD
>> iGPU + dGPU, only these cobbled-together dual HD-5770s in a MacPro,
>> so this could be wrong.
>>
>> thanks,
>> -mario
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel