Hi Everyone,

On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki....@samsung.com> wrote:
>
>
> On 2017-04-26 07:21, Sakari Ailus wrote:
>> Hi Marek,
>>
>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>>> Hi Laurent,
>>>
>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>>> Hi Marek,
>>>>
>>>> (CC'ing Sakari Ailus)
>>>>
>>>> Thank you for the patches.
>>>>
>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>>> Dear all,
>>>>>
>>>>> This is an updated proposal for extending the EXYNOS DRM API with generic
>>>>> support for hardware modules which can be used for processing image data
>>>>> from one memory buffer to another. Typical memory-to-memory operations
>>>>> are: rotation, scaling, colour space conversion, or a mix of them. This is
>>>>> a follow-up to my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>>> processors", which was rejected as "not really needed in the DRM core":
>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>
>>>>> In this proposal I moved all the code to the Exynos DRM driver, so now
>>>>> this will be specific only to Exynos DRM. I've also changed the name from
>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid
>>>>> confusion with the fbdev API.
>>>>>
>>>>> Here is a bit more information on what picture processors are:
>>>>>
>>>>> Embedded SoCs are known to have a number of hardware blocks which perform
>>>>> such operations. They can be used in parallel to the main GPU module to
>>>>> offload the CPU from processing graphics or video data. One example use of
>>>>> such modules is implementing video overlay, which usually requires colour
>>>>> space conversion from NV12 (or similar) to RGB32 and scaling to the
>>>>> target window size.
>>>>>
>>>>> The proposed API is heavily inspired by the atomic KMS approach - it is
>>>>> also based on DRM objects and their properties.
>>>>> A new DRM object is introduced: the picture processor (called pp for
>>>>> convenience). Such objects have a set of standard DRM properties which
>>>>> describe the operation to be performed by the respective hardware module.
>>>>> In the typical case those properties are a source fb id and rectangle
>>>>> (x, y, width, height) and a destination fb id and rectangle. Optionally a
>>>>> rotation property can also be specified if supported by the given
>>>>> hardware. To perform an operation on image data, userspace provides a set
>>>>> of properties and their values for a given pp object, in a similar way to
>>>>> how objects and properties are provided when performing an atomic page
>>>>> flip / mode setting.
>>>>>
>>>>> The proposed API consists of 3 new ioctls:
>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>>   processors,
>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query the capabilities of a given picture
>>>>>   processor,
>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform the operation described by a
>>>>>   given property set.
>>>>>
>>>>> The proposed API is extensible. Drivers can attach their own custom
>>>>> properties to add support for more advanced picture processing (for
>>>>> example blending).
>>>>>
>>>>> This proposal aims to replace the Exynos DRM IPP (Image Post Processing)
>>>>> subsystem. The IPP API is over-engineered in general, yet not really
>>>>> extensible on the other hand. It is also buggy, with significant design
>>>>> flaws - the biggest issue is the fact that the API covers memory-to-memory
>>>>> picture operations together with CRTC writeback, and duplicates features
>>>>> which belong to the video plane. Compared with the IPP subsystem, the PP
>>>>> framework is smaller (1807 vs 778 lines) and allows driver simplification
>>>>> (the Exynos rotator driver is smaller by over 200 lines).
>>>> This seems to be the kind of hardware that is typically supported by V4L2.
>>>> Stupid question: why DRM?
>>>
>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
>>>
>>> 1. We want to replace the existing Exynos IPP subsystem:
>>>    - it is used only in some internal/vendor trees, not in open source
>>>    - we want it to have a sane and potentially extensible userspace API
>>>    - but we don't want to lose its functionality
>>>
>>> 2. We want to have a simple API for performing a single image processing
>>>    operation:
>>>    - typically it will be used by a compositing window manager, which means
>>>      that some parameters of the processing might change on each vblank
>>>      (like the destination rectangle, for example). This API allows such a
>>>      change on each operation without any additional cost. V4L2 requires
>>>      reinitialising the queues with the new configuration on such a change,
>>>      which means that a bunch of ioctls has to be called.
>>
>> What do you mean by re-initialising the queue? Format, buffers or something
>> else?
>>
>> If you need a larger buffer than what you have already allocated, you'll
>> need to re-allocate, V4L2 or not.
>>
>> We also lack a way to destroy individual buffers in V4L2. It'd be a matter
>> of implementing that, plus some work in videobuf2.
>>
>> Another thing is that V4L2 is very stream oriented. For most devices that's
>> fine, as a lot of the parameters are not changeable during streaming,
>> especially if the pipeline is handled by multiple drivers. That said, for
>> devices that process data from memory to memory, performing changes in the
>> media bus formats and pipeline configuration is not very efficient
>> currently, largely for the same reason.
>>
>> The request API that people have been working on for somewhat different use
>> cases isn't in mainline yet. It would allow more efficient per-request
>> configuration than what is currently possible, but it has turned out to be
>> far from trivial to implement.
>>
>>> - validating processing parameters in the V4L2 API is really complicated,
>>>   because the parameters (format, src & dest rectangles, rotation) are set
>>>   incrementally, so we have to either allow some impossible, transitional
>>>   configurations or complicate the configuration steps even more (like
>>>   calling some ioctls multiple times for both input and output). In the
>>>   end all parameters have to be validated again just before performing the
>>>   operation.
>>
>> You have to validate the parameters in any case. In an MC pipeline this
>> takes place when the stream is started.
>>
>>>
>>> 3. The generic approach (adding it to the DRM core) has been rejected:
>>>    http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>
>> For GPUs I generally understand the reasoning: there's a very limited
>> number of users of this API --- primarily because it's not an application
>> interface.
>>
>> If you have a device that however falls under the scope of V4L2 (at least
>> API-wise), does this continue to be the case? Will there be only one or two
>> (or so) users for this API? Is that the case here?
>>
>> Using a device-specific interface definitely has some benefits: there's no
>> need to think about how you would generalise the interface for other
>> similar devices. There's no need to consider backwards compatibility, as
>> it's not a requirement. The drawback is that applications that need to
>> support similar devices will bear the burden of having to support
>> different APIs.
>>
>> I don't mean to say that you should ram whatever under V4L2 / MC
>> independently of how unworkable that might be, but there are also clear
>> advantages in using a standardised interface such as V4L2.
>>
>> V4L2 has a long history behind it, and if it were designed today, I bet it
>> would look quite different from what it is now.
>
> That's true. There is definitely a benefit with V4L2, because V4L2 provides
> a standard Linux ABI - while DRM, as of now, does not.
>
> However, I think that is the only benefit we could get through V4L2. Using
> V4L2 makes the software stack of the platform complicated - we have to open
> both the video device node and the card device node to display an image on
> the screen while scaling it or converting its colour space, and we also
> need to export the DMA buffer from one side and import it to the other side
> using DMABUF.
>
> It may not be related to this, but V4L2 also has a performance problem -
> every QBUF/DQBUF request performs mapping/unmapping of the DMA buffer, as
> you already know. :)
>
> In addition, display subsystems on recent ARM SoCs tend to include pre/post
> processing hardware in the display controller - OMAP, Exynos 8895 and MSM,
> as far as I know.
>
I agree with many of the arguments given by Inki above and earlier by Marek.
However, they apply to the already existing V4L2 implementation, not to V4L2
as an idea in general, and I believe a comparison against a completely new
API that doesn't even exist in the kernel tree and userspace yet (only in
terms of patches on the list) is not fair.

I strongly (if that's of any value) stand on Sakari's side and also agree
with the DRM maintainers. V4L2 is already there, provides a general
interface for userspace and already supports the kind of devices Marek
mentions. Sure, it might have several issues, but please give me an example
of a subsystem/interface/code that doesn't have any. Instead of taking the
easy (in the short term) path, with a bit more effort we can get something
that in the long run should end up much better.

Best regards,
Tomasz

>
> Thanks,
> Inki Dae
>
>>
>>>
>>> 4. This API can be considered an extended 'blit' operation; other DRM
>>>    drivers (MGA, R128, VIA) already have ioctls for such an operation, so
>>>    there is also a place in DRM for it.
>>
>> Added LMML to cc.
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel