On 08/31/2012 08:00 PM, Dave Airlie wrote:
>> object interface, so that Optimus-based laptops can use our driver to
>> drive the discrete GPU and display on the integrated GPU. The good
>> news is that I've got a proof of concept working.
>
> Don't suppose you'll be interested in adding the other method at some
> point as well? Since saving power is probably important to a lot of
> people.
That's milestone 2. I'm focusing on display offload to start because
it's easier to implement and lays the groundwork for the kernel pieces.
I have to emphasize that I'm just doing a feasibility study right now
and I can't promise that we're going to officially support this stuff.

>> During a review of the current code, we came up with a few concerns:
>>
>> 1. The output source is responsible for allocating the shared memory
>>
>> Right now, the X server calls CreatePixmap on the output source
>> screen and then expects the output sink screen to be able to display
>> from whatever memory the source allocates. The source has no
>> mechanism for asking the sink what its requirements are for the
>> surface. I'm using our own internal pitch alignment requirements and
>> that seems to be good enough for the Intel device to scan out, but
>> that could be pure luck.
>
> Well, in theory it might be nice, but it would have been premature,
> since so far the only interactions for PRIME are combinations of
> Intel, NVIDIA, and AMD, and I think everyone has fairly similar pitch
> alignment requirements. I'd be interested in adding such an
> interface, but I don't think it's something I personally would be
> working on.

Okay. Hopefully that won't be too painful to add if we ever need it in
the future.

>> other, or is it sufficient to just define a lowest common
>> denominator format, and if your hardware can't deal with that
>> format, you just don't get to share buffers?
>
> At the moment I'm happy to just go with something as a base standard
> (linear, minimum pitch alignment 64 or 256 for us), but yeah, I'm
> happy for it to work either way; I just don't have enough evidence
> it's worth it yet. I've not looked at ARM stuff, so patches welcome
> if people consider they need to use this stuff for SoC devices.

We can always hack it to whatever is necessary if we see that the sink
side driver is Tegra, but I was hoping for something more general.

>> 2. There's no fallback mechanism if sharing can't be negotiated
>>
>> If RandR fails to share a pixmap with the output sink screen, the
>> whole modeset fails. This means you'll end up not seeing anything on
>> the screen and you'll probably think your computer locked up. Should
>> there be some sort of software copy fallback to ensure that
>> something at least shows up on the display?
>
> Ugh, it would be fairly slow and unusable; I'd rather they saw
> nothing. But again, I'm open to suggestions on how to make this work,
> since it might fail for other reasons, and in that case there is
> still nothing a software copy can do. What happens if the slave Intel
> device just fails to allocate a pixmap? But yeah, I'm willing to
> think about it a bit more when we have some reference
> implementations.

Just rolling back the modeset operation to whatever was working before
would be a good start. It's worse than that on my current laptop,
though, since our driver sees a phantom CRT output and we happily
start driving pixels to it that end up going nowhere. I'll need to
think about what the right behavior is there, since I don't know if we
want to rely on an X client to make that configuration work.

>> 3. How should the memory be allocated?
>>
>> In the prototype I threw together, I'm allocating the shared memory
>> using shm_open and then exporting that as a dma-buf file descriptor
>> using an ioctl I added to the kernel, and then importing that memory
>> back into our driver through dma_buf_attach &
>> dma_buf_map_attachment. Does it make sense for user-space programs
>> to be able to export shmfs files like that? Should that interface go
>> in DRM / GEM / PRIME instead? Something else? I'm pretty unfamiliar
>> with this kernel code so any suggestions would be appreciated.
>
> Your kernel driver should in theory be doing it all: if you allocate
> shared pixmaps in GTT-accessible memory, then you need an ioctl to
> tell your kernel driver to export the dma-buf to an fd handle
> (assuming we get rid of the _GPL, which people have mentioned they
> are open to doing). We have handle->fd and fd->handle interfaces on
> DRM; you'd need something similar on the nvidia kernel driver
> interface.

Okay, I can do that. We already have a mechanism for importing buffers
allocated elsewhere, so reusing that for shmfs and/or dma-buf seemed
like a natural extension. I don't think adding a separate ioctl for
exporting our own allocations will add too much extra code.
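
For my own notes as much as anything, the round trip through the
existing DRM PRIME interfaces looks roughly like this from userspace,
as I understand it. This uses the libdrm wrappers; the two helper
functions are just mine for illustration, and error handling is
trimmed:

#include <stdint.h>
#include <xf86drm.h>

/* Export a GEM handle from the source device as a dma-buf fd.
 * 'drm_fd' is an open DRM device node; 'handle' is a GEM handle. */
int export_buffer(int drm_fd, uint32_t handle)
{
    int prime_fd = -1;

    /* handle -> fd: wraps DRM_IOCTL_PRIME_HANDLE_TO_FD */
    if (drmPrimeHandleToFD(drm_fd, handle, DRM_CLOEXEC, &prime_fd) < 0)
        return -1;

    return prime_fd;
}

/* Import that fd on the sink device, getting back a GEM handle local
 * to the sink. fd -> handle: wraps DRM_IOCTL_PRIME_FD_TO_HANDLE. */
int import_buffer(int drm_fd, int prime_fd, uint32_t *handle)
{
    return drmPrimeFdToHandle(drm_fd, prime_fd, handle);
}

An export ioctl on our driver would play the same role as the
handle->fd half of that.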
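
And on the kernel side, the import path in my prototype is just the
standard dma-buf importer sequence, something like the sketch below.
nv_import_dmabuf and the surrounding plumbing are hypothetical; the
dma_buf_* calls are the real kernel interfaces:

#include <linux/dma-buf.h>
#include <linux/err.h>

/* Hypothetical import entry point: 'dev' is our struct device, 'fd'
 * is a dma-buf file descriptor handed to us by the exporter. */
static int nv_import_dmabuf(struct device *dev, int fd)
{
    struct dma_buf *buf;
    struct dma_buf_attachment *attach;
    struct sg_table *sgt;

    buf = dma_buf_get(fd);             /* take a ref from the fd */
    if (IS_ERR(buf))
        return PTR_ERR(buf);

    attach = dma_buf_attach(buf, dev); /* register as an importer */
    if (IS_ERR(attach)) {
        dma_buf_put(buf);
        return PTR_ERR(attach);
    }

    /* Map into our device; this yields a scatter/gather table that
     * the GPU's MMU or display hardware can be pointed at. */
    sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
    if (IS_ERR(sgt)) {
        dma_buf_detach(buf, attach);
        dma_buf_put(buf);
        return PTR_ERR(sgt);
    }

    /* ... hand sgt off to the rest of the driver ... */
    return 0;
}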
> Yes, for 4, some sort of fencing is being worked on by Maarten for
> other stuff, but it would be a prerequisite for doing this. Also,
> some devices don't want full-screen updates, like USB, so doing
> flipped updates would have to be optional or negotiated. It makes
> sense for us as well, since things like gnome-shell can do
> full-screen pageflips and we have to do full-screen dirty updates.

Right now my implementation has two sources of tearing:

1. The dGPU reads the vidmem primary surface asynchronously from its
   own rendering to it.

2. The iGPU fetches the shared surface for display asynchronously from
   the dGPU writing into it.

#1 I can fix within our driver. For #2, I don't want to rely on the
dGPU being able to push complete frames over the bus during vblank in
response to an iGPU fence trigger, so I was thinking we would want
double-buffering all the time.

Also, I was hoping to set up a proper flip chain between the dGPU, the
dGPU's DMA engine, and the Intel display engine so that for
full-screen applications, glXSwapBuffers is stalled properly without
relying on the CPU to schedule things. Maybe that's overly ambitious
for now?
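
To make the double-buffering idea concrete, the update loop I'm
picturing looks something like this. Every name here is hypothetical;
the point is just where the fence and the flip sit relative to the
copy over the bus:

struct buffer;

/* Hypothetical driver entry points standing in for the real work. */
extern void dgpu_blit_primary_to(struct buffer *dst);
extern void dgpu_wait_for_blit_complete(void);
extern void igpu_queue_page_flip(struct buffer *buf); /* at vblank */

struct shared_surface {
    struct buffer *bufs[2]; /* two copies of the shared surface */
    int front;              /* index the iGPU is scanning out */
};

static void present_frame(struct shared_surface *s)
{
    int back = s->front ^ 1;

    /* dGPU copies its vidmem primary into the shared back buffer */
    dgpu_blit_primary_to(s->bufs[back]);

    /* fence: don't flip until the copy over the bus has landed */
    dgpu_wait_for_blit_complete();

    /* The iGPU flips to the finished buffer at its next vblank. The
     * front buffer is never written while it's being scanned out, so
     * the iGPU's asynchronous fetch can't tear. */
    igpu_queue_page_flip(s->bufs[back]);
    s->front = back;
}

The flip-chain version would replace that CPU-side wait with the
iGPU's flip being triggered directly by the dGPU DMA engine's
completion.

-- Aaron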