On 08/31/2012 08:00 PM, Dave Airlie wrote:
>> object interface, so that Optimus-based laptops can use our driver to
>> drive the discrete GPU and display on the integrated GPU. The good
>> news is that I've got a proof of concept working.
>
> Don't suppose you'll be interested in adding the other method at some
> point as well? Since saving power is probably important to a lot of
> people.
That's milestone 2. I'm focusing on display offload to start because
it's easier to implement and lays the groundwork for the kernel pieces.
I have to emphasize that I'm just doing a feasibility study right now
and I can't promise that we're going to officially support this stuff.

>> During a review of the current code, we came up with a few concerns:
>>
>> 1. The output source is responsible for allocating the shared memory
>>
>> Right now, the X server calls CreatePixmap on the output source
>> screen and then expects the output sink screen to be able to display
>> from whatever memory the source allocates. The source has no
>> mechanism for asking the sink what its requirements are for the
>> surface. I'm using our own internal pitch alignment requirements and
>> that seems to be good enough for the Intel device to scan out, but
>> that could be pure luck.
>
> Well, in theory it might be nice, but it would have been premature,
> since so far the only interactions for PRIME are combinations of
> Intel, NVIDIA, and AMD, and I think everyone has fairly similar pitch
> alignment requirements. I'd be interested in adding such an
> interface, but I don't think it's something I personally would be
> working on.

Okay. Hopefully that won't be too painful to add if we ever need it in
the future.

>> other, or is it sufficient to just define a lowest common
>> denominator format, and if your hardware can't deal with that
>> format, you just don't get to share buffers?
>
> At the moment I'm happy to just go with something as a base standard
> (linear, minimum pitch alignment 64 or 256 for us), but yeah, I'm
> happy for it to work either way; I just don't have enough evidence
> it's worth it yet. I've not looked at ARM stuff, so patches welcome
> if people consider they need to use this stuff for SoC devices.

We can always hack it to whatever is necessary if we see that the sink
side driver is Tegra, but I was hoping for something more general.

>> 2. There's no fallback mechanism if sharing can't be negotiated
>>
>> If RandR fails to share a pixmap with the output sink screen, the
>> whole modeset fails. This means you'll end up not seeing anything on
>> the screen and you'll probably think your computer locked up. Should
>> there be some sort of software copy fallback to ensure that
>> something at least shows up on the display?
>
> Ugh, it would be fairly slow and unusable; I'd rather they saw
> nothing. But again, I'm open to suggestions on how to make this work,
> since it might fail for other reasons, and in that case there is
> still nothing a software copy can do. What happens if the slave Intel
> device just fails to allocate a pixmap? But yeah, I'm willing to
> think about it a bit more when we have some reference
> implementations.

Just rolling back the modeset operation to whatever was working before
would be a good start. It's worse than that on my current laptop,
though, since our driver sees a phantom CRT output and we happily
start driving pixels to it that end up going nowhere. I'll need to
think about what the right behavior is there, since I don't know if we
want to rely on an X client to make that configuration work.

>> 3. How should the memory be allocated?
>>
>> In the prototype I threw together, I'm allocating the shared memory
>> using shm_open and then exporting that as a dma-buf file descriptor
>> using an ioctl I added to the kernel, and then importing that memory
>> back into our driver through dma_buf_attach &
>> dma_buf_map_attachment. Does it make sense for user-space programs
>> to be able to export shmfs files like that? Should that interface go
>> in DRM / GEM / PRIME instead? Something else? I'm pretty unfamiliar
>> with this kernel code so any suggestions would be appreciated.
>
> Your kernel driver should in theory be doing it all: if you allocate
> shared pixmaps in GTT-accessible memory, then you need an ioctl to
> tell your kernel driver to export the dma-buf to an fd handle
> (assuming we get rid of the _GPL, which people have mentioned they
> are open to doing). We have handle->fd and fd->handle interfaces on
> DRM; you'd need something similar on the nvidia kernel driver
> interface.

Okay, I can do that. We already have a mechanism for importing buffers
allocated elsewhere, so reusing that for shmfs and/or dma-buf seemed
like a natural extension. I don't think adding a separate ioctl for
exporting our own allocations will add too much extra code.
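
For my own notes as much as anything, the round trip through the
existing DRM PRIME interfaces looks roughly like this from userspace,
as I understand it. This uses the libdrm wrappers; the two helper
functions are just mine for illustration, and error handling is
trimmed:

#include <stdint.h>
#include <xf86drm.h>

/* Export a GEM handle from the source device as a dma-buf fd.
 * 'drm_fd' is an open DRM device node; 'handle' is a GEM handle. */
int export_buffer(int drm_fd, uint32_t handle)
{
    int prime_fd = -1;

    /* handle -> fd: wraps DRM_IOCTL_PRIME_HANDLE_TO_FD */
    if (drmPrimeHandleToFD(drm_fd, handle, DRM_CLOEXEC, &prime_fd) < 0)
        return -1;

    return prime_fd;
}

/* Import that fd on the sink device, getting back a GEM handle local
 * to the sink. fd -> handle: wraps DRM_IOCTL_PRIME_FD_TO_HANDLE. */
int import_buffer(int drm_fd, int prime_fd, uint32_t *handle)
{
    return drmPrimeFdToHandle(drm_fd, prime_fd, handle);
}

An export ioctl on our driver would play the same role as the
handle->fd half of that.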
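
And on the kernel side, the import path in my prototype is just the
standard dma-buf importer sequence, something like the sketch below.
nv_import_dmabuf and the surrounding plumbing are hypothetical; the
dma_buf_* calls are the real kernel interfaces:

#include <linux/dma-buf.h>
#include <linux/err.h>

/* Hypothetical import entry point: 'dev' is our struct device, 'fd'
 * is a dma-buf file descriptor handed to us by the exporter. */
static int nv_import_dmabuf(struct device *dev, int fd)
{
    struct dma_buf *buf;
    struct dma_buf_attachment *attach;
    struct sg_table *sgt;

    buf = dma_buf_get(fd);             /* take a ref from the fd */
    if (IS_ERR(buf))
        return PTR_ERR(buf);

    attach = dma_buf_attach(buf, dev); /* register as an importer */
    if (IS_ERR(attach)) {
        dma_buf_put(buf);
        return PTR_ERR(attach);
    }

    /* Map into our device; this yields a scatter/gather table that
     * the GPU's MMU or display hardware can be pointed at. */
    sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
    if (IS_ERR(sgt)) {
        dma_buf_detach(buf, attach);
        dma_buf_put(buf);
        return PTR_ERR(sgt);
    }

    /* ... hand sgt off to the rest of the driver ... */
    return 0;
}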
> Yes, for 4, some sort of fencing is being worked on by Maarten for
> other stuff, but it would be a prerequisite for doing this. Also,
> some devices don't want full-screen updates, like USB, so doing
> flipped updates would have to be optional or negotiated. It makes
> sense for us as well, since things like gnome-shell can do
> full-screen pageflips and we have to do full-screen dirty updates.

Right now my implementation has two sources of tearing:

1. The dGPU reads the vidmem primary surface asynchronously from its
   own rendering to it.

2. The iGPU fetches the shared surface for display asynchronously from
   the dGPU writing into it.

#1 I can fix within our driver. For #2, I don't want to rely on the
dGPU being able to push complete frames over the bus during vblank in
response to an iGPU fence trigger, so I was thinking we would want
double-buffering all the time.

Also, I was hoping to set up a proper flip chain between the dGPU, the
dGPU's DMA engine, and the Intel display engine so that for
full-screen applications, glXSwapBuffers is stalled properly without
relying on the CPU to schedule things. Maybe that's overly ambitious
for now?
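
To make the double-buffering idea concrete, the update loop I'm
picturing looks something like this. Every name here is hypothetical;
the point is just where the fence and the flip sit relative to the
copy over the bus:

struct buffer;

/* Hypothetical driver entry points standing in for the real work. */
extern void dgpu_blit_primary_to(struct buffer *dst);
extern void dgpu_wait_for_blit_complete(void);
extern void igpu_queue_page_flip(struct buffer *buf); /* at vblank */

struct shared_surface {
    struct buffer *bufs[2]; /* two copies of the shared surface */
    int front;              /* index the iGPU is scanning out */
};

static void present_frame(struct shared_surface *s)
{
    int back = s->front ^ 1;

    /* dGPU copies its vidmem primary into the shared back buffer */
    dgpu_blit_primary_to(s->bufs[back]);

    /* fence: don't flip until the copy over the bus has landed */
    dgpu_wait_for_blit_complete();

    /* The iGPU flips to the finished buffer at its next vblank. The
     * front buffer is never written while it's being scanned out, so
     * the iGPU's asynchronous fetch can't tear. */
    igpu_queue_page_flip(s->bufs[back]);
    s->front = back;
}

The flip-chain version would replace that CPU-side wait with the
iGPU's flip being triggered directly by the dGPU DMA engine's
completion.

-- Aaron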