On 17.08.2016 at 18:35, Mario Kleiner wrote:
> On 08/17/2016 06:27 PM, Christian König wrote:
>>> AMD uses copy swaps because radeon/amdgpu kms can't switch the
>>> scanout mode from tiled to linear on the fly during flips.
>> Well, I'm not an expert on this, but as far as I know the bigger problem
>> is that the dedicated AMD hardware generations you are targeting usually
>> can't reliably scan out from system memory without a rather complicated
>> setup.
>>
>> So that is a complete NAK to the radeon changes.
>
> Hi Christian,
>
> thanks for the feedback, but i think that's a misunderstanding. The
> patches don't make them scan out from system memory, they just enforce
> a fresh copy from RAM/GTT -> VRAM before scanning out a buffer again.
> I just assume there is a more elegant/clean way than this "fake"
> pin/unpin to GTT to essentially tell the driver that its current VRAM
> content is stale and needs a refresh from the up-to-date dmabuf in
> system RAM.

I was already wondering how the heck you got that working.

What do you mean by a fresh copy from GTT to VRAM? A buffer exported
by DMA-buf should never move as long as it is exported, same for a
buffer pinned to VRAM.

So using a DMA-buf for scanout is impossible and actually not valuable,
because it shouldn't matter if we copy from GTT to VRAM because of a
buffer migration or because of a copy triggered by the DDX.

What are you actually trying to do here?

Regards,
Christian.

>
> Btw. i'll be offline for the next few hours, just wanted to get this
> out now.
>
> thanks,
> -mario
>
>>
>> Regards,
>> Christian.
>>
>> On 17.08.2016 at 18:12, Mario Kleiner wrote:
>>> Hi,
>>>
>>> i spent some time playing with DRI3/Present + PRIME for testing
>>> how well it works for Optimus/Enduro style setups wrt. page flipping
>>> on the current kernel/mesa/xorg. I want page flipping, because
>>> neuroscience/medical applications need the reliable timing/timestamping
>>> and tear-free presentation we can currently only get via page
>>> flipping, but not the copyswap path.
>>>
>>> Intel as display gpu + nouveau for render offload worked nicely
>>> on intel-ddx with page flipping, proper timing, dmabuf fence sync
>>> and all.
>>>
>>> AMD uses copy swaps because radeon/amdgpu kms can't switch the
>>> scanout mode from tiled to linear on the fly during flips. That's
>>> a todo in itself. For the moment i used the ati-ddx with Option
>>> "ColorTiling/ColorTiling2D" "off" to force my pair of old Radeon
>>> HD-5770's into linear mode so page flipping can be used for
>>> prime. The current modesetting-ddx will use page flipping in
>>> any case as it doesn't detect the tiling format mismatch.
>>>
>>> nouveau uses page flips.
>>>
>>> Turns out that prime + page flipping currently doesn't work
>>> on nouveau and amd. The first offload rendered images from
>>> the imported dmabufs show up properly, but then the display
>>> is stuck alternating between the first two or three rendered
>>> frames.
>>>
>>> The problem is that during the pageflip ioctl we pin the
>>> dmabuf into VRAM in preparation for scanout, then unpin it
>>> when we are done with it at next flip, but the buffer stays
>>> in the VRAM memory domain. Next time we flip to the buffer
>>> again, the driver skips the DMA copy from GTT to VRAM during
>>> pinning, because the buffer's content apparently already resides
>>> in VRAM. Therefore it doesn't update the VRAM copy with the updated
>>> dmabuf content in system RAM, so freshly rendered frames from the
>>> prime export/render offload gpu never reach the display gpu and one
>>> only sees stale images.
>>>
>>> The attached patches for nouveau and radeon kms seem to work
>>> pretty ok, page flipping works, display updates, tear-free,
>>> dmabuf fence sync works, onset timing/timestamping is correct.
>>> They simply pin the buffer back into GTT, then unpin, to force
>>> a move of the buffer into the GTT domain, and thereby force the
>>> following pin to do a new copy from GTT -> VRAM. The code tries
>>> to avoid a useless copy from VRAM -> GTT during the pin op.
>>>
>>> However, the approach feels very much like a hack, so i assume
>>> this is not the proper way of doing it? I looked at what ttm has
>>> to offer, but couldn't find anything elegant and obvious. Maybe
>>> there is a way to evict a bo without actually copying data back
>>> to RAM? Or to invalidate the VRAM copy as stale? Maybe i just
>>> missed something, as i'm not very familiar with ttm.
>>>
>>> Thoughts or suggestions?
>>>
>>> Another insight from my hacks so far is that nouveau seems to
>>> be fast as prime exporter/renderoffload, but rather slow as
>>> display gpu/prime importer, as tested on a 2008 or 2009
>>> MacBookPro dual-Nvidia laptop.
>>>
>>> AMD, as tested with dual Radeon HD-5770, seems to be fast as prime
>>> importer/display gpu, but very slow as prime exporter/render offload,
>>> e.g., taking 16 msecs to get a 1920x1080 framebuffer into RAM.
>>> Seems that Mesa's blitImage function is the slow bit here. On r600
>>> it seems to draw a textured triangle strip to detile the gpu
>>> renderbuffer and copy it into GTT. As drawing a textured fullscreen
>>> quad is normally much faster, something special seems to be going on
>>> there wrt. DMA? However, i don't have a realistic Enduro test setup
>>> with AMD iGPU + dGPU, only this cobbled-together pair of HD-5770's
>>> in a MacPro, so this could be wrong.
>>>
>>> thanks,
>>> -mario
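
For readers following the thread, the stale-frame problem Mario describes can be illustrated with a toy model. This is only a sketch of the logic as described in the mail, not driver code: the names ToyBO, pin_vram, flip and run are hypothetical, one integer stands in for a whole frame, and a single buffer is used instead of the two or three that a real flip chain rotates through. The force_gtt flag stands in for the pin-to-GTT/unpin hack from the patches by resetting the recorded domain without copying anything back.

```python
from dataclasses import dataclass

# Hypothetical domain labels for this toy model, not real TTM placement flags.
GTT, VRAM = "GTT", "VRAM"


@dataclass
class ToyBO:
    """Toy buffer object; one integer stands in for a whole frame."""
    domain: str = GTT    # where the driver believes the current data lives
    gtt_frame: int = 0   # dmabuf content in system RAM (written by the render GPU)
    vram_frame: int = 0  # copy that the display engine scans out
    copies: int = 0      # number of GTT -> VRAM copies performed


def pin_vram(bo):
    """Pin for scanout; the copy is skipped when the bo already sits in VRAM."""
    if bo.domain == VRAM:
        return
    bo.vram_frame = bo.gtt_frame
    bo.copies += 1
    bo.domain = VRAM


def flip(bo, frame, force_gtt):
    """Render GPU writes `frame` to the dmabuf, then the display GPU flips to it."""
    bo.gtt_frame = frame
    if force_gtt:
        # Stand-in for the pin-to-GTT/unpin hack: change the domain, copy nothing back.
        bo.domain = GTT
    pin_vram(bo)
    # Unpinning after the flip leaves the domain unchanged, so it is not modeled.
    return bo.vram_frame


def run(force_gtt, frames=(1, 2, 3)):
    """Return the frame that is scanned out after each flip."""
    bo = ToyBO()
    return [flip(bo, f, force_gtt) for f in frames]


if __name__ == "__main__":
    print("without workaround:", run(False))
    print("with workaround:   ", run(True))
```

Without the workaround only the first flip triggers a copy, so every later flip shows the first frame again; with it, each flip copies the current dmabuf content into VRAM.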