Hi, On Fri, Jul 15, 2016 at 10:42:01AM -0700, Eric Anholt wrote: >Ville Syrjälä <ville.syrjala at linux.intel.com> writes: > >> On Fri, Jul 15, 2016 at 10:09:19AM +0100, Brian Starkey wrote: >>> Hi Daniel, >>> >>> Thanks for taking a look. >>> >>> (+Cc Laurent) >>> >>> On Fri, Jul 15, 2016 at 09:33:34AM +0200, Daniel Vetter wrote: >>> >On Thu, Jul 14, 2016 at 06:03:40PM +0100, Brian Starkey wrote: >>> >> Hi, >>> >> >>> >> The Mali-DP display processors have a memory-writeback engine which >>> >> can write the result of the composition (CRTC output) to a memory >>> >> buffer in a variety of formats. >>> >> >>> >> We're looking for feedback/suggestions on how to expose this in the >>> >> mali-dp DRM kernel driver - possibly via V4L2. >>> >> >>> >> We've got a few use cases where writeback is useful: >>> >> - testing, to check the displayed image >>> > >>> >This might or might not need a separate interface. There are efforts to >>> >make the intel kms validation tests in i-g-t generic (well under way >>> >already), and part of that is creating a generic infrastructure to capture >>> >display CRCs for functional tests (still in progress). >>> > >>> >But it might be better if userspace abstracts between full readback and >>> >display CRC, assuming we can make full writeback cross-vendor enough for >>> >that use-case. >>> > >>> >>> I'd lean towards the userspace abstraction. >>> Encumbering a simple CRC interface with all the complexity of >>> full-writeback (size, scaling, pixel format, multi-planar etc.) sounds >>> a bit unnecessary. >>> >>> Of course, if v4l2 isn't going to be the cross-vendor full-writeback >>> implementation, then we need to be aiming to use whatever _is_ in >>> the mali-dp driver. >>> >>> >> - screen recording >>> >> - wireless display (e.g. Miracast) >>> >> - dual-display clone mode >>> >> - memory-to-memory composition >>> >> Note that the HW is capable of writing one of the input planes instead >>> >> of the CRTC output, but we've no good use-case for wanting to expose >>> >> that. >>> >> >>> >> In our Android ADF driver[1] we exposed the memory write engine as >>> >> part of the ADF device using ADF's "MEMORY" interface type. DRM/KMS >>> >> doesn't have any similar support for memory output from CRTCs, but we >>> >> want to expose the functionality in the mainline Mali-DP DRM driver. >>> >> >>> >> A previous discussion on the topic went towards exposing the >>> >> memory-write engine via V4L2[2]. >>> >> >>> >> I'm thinking to userspace this would look like two distinct devices: >>> >> - A DRM KMS display controller. >>> >> - A V4L2 video source. >>> >> They'd both exist in the same kernel driver. >>> >> A V4L2 client can queue up (CAPTURE) buffers in the normal way, and >>> >> the DRM driver would see if there's a buffer to dequeue every time a >>> >> new modeset is received via the DRM API - if so, it can configure the >>> >> HW to dump into it (one-shot operation). >>> >> >>> >> An implication of this is that if the screen is actively displaying a >>> >> static scene and the V4L2 client queues up a buffer, it won't get >>> >> filled until the DRM scene changes. This seems best, otherwise the >>> >> V4L2 driver has to change the HW configuration out-of-band from the >>> >> DRM device which sounds horribly racy. >>> >> >>> >> One further complication is scaling. Our HW has a scaler which can >>> >> tasked with either scaling an input plane or the buffer being written >>> >> to memory, but not both at the same time. This means we need to >>> >> arbitrate the scaler between the DRM device (scaling input planes) and >>> >> the V4L2 device (scaling output buffers). >>> >> >>> >> I think the simplest approach here is to allow V4L2 to "claim" the >>> >> scaler by setting the image size (VIDIOC_S_FMT) to something other >>> >> than the CRTC's current resolution. After that, any attempt to use the >>> >> scaler on an input plane via DRM should fail atomic_check(). >>> > >>> >That's perfectly fine atomic_check behaviour. Only trouble is that the v4l >>> >locking must integrate into the drm locking, but that should be doable. >>> >Worst case you must shadow all v4l locks with a wait/wound >>> >drm_modeset_lock to avoid deadlocks (since you could try to grab locks >>> >from either end). >>> > >>> >>> Yes, I haven't looked at the details of the locking but I'm hoping >>> it's manageable. >>> >>> >> If the V4L2 client goes away or sets the image size to the CRTC's >>> >> native resolution, then the DRM device is allowed to use the scaler. >>> >> I don't know if/how the DRM device should communicate to userspace >>> >> that the scaler is or isn't available for use. >>> >> >>> >> Any thoughts on this approach? >>> >> Is it acceptable to both V4L2 and DRM folks? >>> > >>> >For streaming a V4L2 capture device seems like the right interface. But if >>> >you want to use writeback in your compositor you must know which atomic >>> >kms update results in which frame, since if you don't you can't use that >>> >composited buffer for the next frame reliable. >>> > >>> >For that case I think a drm-only solution would be better, to make sure >>> >you can do an atomic update and writeback in one step. v4l seems to grow >>> >an atomic api of its own, but I don't think we'll have one spanning >>> >subsystems anytime soon. >>> > >>> >>> I've been thinking about this from the point of view of a HWComposer >>> implementation and I think the hybrid DRM-V4L2 device would work OK. >>> However it depends on the behaviour I mentioned above: >>> >>> >> if the screen is actively displaying a >>> >> static scene and the V4L2 client queues up a buffer, it won't get >>> >> filled until the DRM scene changes. >>> >>> V4L2 buffer queues are FIFO, so as long as the compositor queues only >>> one V4L2 buffer per atomic update, there's no ambiguity. >>> In the most simplistic case the compositor would alternate between: >>> - Queue V4L2 buffer >>> - DRM atomic update >>> ... and dequeue either in the same thread or a different one. As long >>> as the compositor keeps track of how many buffers it has queued and >>> how many atomic updates it's made, it doesn't really matter. >>> >>> We'd probably be looking to add in V4L2 asynchronous dequeue using >>> fences for synchronisation, but that's a separate issue. >>> >>> >For the kms-only interface the idea was to add a property on the crtc >>> >where you can attach a writeback drm_framebuffer. Extending that idea to >>> >the drm->v4l case we could create special drm_framebuffer objects >>> >representing a v4l sink, and attach them to the same property. That would >>> >also solve the problem of getting some agreement on buffer metadata >>> >between v4l and drm side. >>> > >>> >>> I think a drm_framebuffer on its own wouldn't be enough to handle our >>> scaling case - at that point it starts to look more like a plane. >>> However, if "special" CRTC sinks became a thing it could allow us to >>> chain our writeback output to another CRTC's input (via memory) >>> without a trip through userspace, which would be nice. >> >> My thinking has been that we could represent the writeback sink >> as a connector. Combine that with my other idea of exposing a >> fixed mode property for each connector (to make output scaling >> user configurable), and we should end up with a way to control >> the scaling of the writeback path. >> >> Also it would mean there's always a connector attached to the >> crtc, even for the pure writeback only case, which I think should >> result in less problems in general as I'm sure there are a lot of >> assumptions in the code that there must be at least one connector >> for an active crtc. > >This is what I've been thinking for vc4's writeback as well. I >definitely want my writeback to be using the same drivers/gpu/drm/vc4 >code for setting up the hardware. And, since I expect writeback to be >mostly used by compositors for the case where a frame exceeds bandwidth >constraints, I want userspace to be able to use the same DRM APIs for >setting up the frame to be written back as it was trying to use for >setting up the frame before.
OK, so let's talk about using connectors instead then :-) We can attach a generic fixed_mode property which can represent panel fitters or Mali-DP's writeback scaling, that sounds good. The DRM driver can create a new "virtual" encoder/connector pair, I guess with a new connector type specifically for memory-writeback? On this connector we allow the fb_id property to attach a framebuffer for writing into. We'd probably want an additional property to enable writeback, perhaps an enum: "disabled", "streaming", "oneshot". I think it makes sense to put the output buffer format, size etc. validation in the virtual encoder's atomic_check. It has access to both CRTC and connector state, and the encoder is supposed to represent the format/stream conversion element anyway. What I'm not so clear on is how to manage the API with userspace. Changing connectors is a "full modeset required" operation. I expect this means an application needs to know up-front if it will ever make use of memory-writeback, and if so do its initial modeset on both the writeback and the normal connector, even if it doesn't attach a writeback framebuffer until much later. Otherwise, enabling writeback will force a full disable-modeset-enable cycle. The writeback connector would have to be hidden from legacy clients which would expect it to work like a normal connector - I suppose that's somewhat similar to universal planes. For the case that Ville mentioned where only the writeback connector is connected, would userspace be expected to "make up" a drm_mode_modeinfo to set? I can't think of any meaningful list of available modes for the writeback connector to report. Thanks for all the input so far! -Brian