On 12.09.2014 at 17:48, Jerome Glisse wrote:
> On Fri, Sep 12, 2014 at 05:42:57PM +0200, Christian König wrote:
>> On 12.09.2014 at 17:33, Jerome Glisse wrote:
>>> On Fri, Sep 12, 2014 at 11:25:12AM -0400, Alex Deucher wrote:
>>>> On Fri, Sep 12, 2014 at 10:50 AM, Jerome Glisse <j.glisse at gmail.com> wrote:
>>>>> On Fri, Sep 12, 2014 at 04:43:44PM +0200, Daniel Vetter wrote:
>>>>>> On Fri, Sep 12, 2014 at 4:09 PM, Daniel Vetter <daniel at ffwll.ch> wrote:
>>>>>>> On Fri, Sep 12, 2014 at 03:23:22PM +0200, Christian König wrote:
>>>>>>>> Hello everyone,
>>>>>>>>
>>>>>>>> to allow concurrent buffer access by different engines beyond the
>>>>>>>> multiple readers/single writer model that we currently use in radeon
>>>>>>>> and other drivers, we need some kind of synchronization object
>>>>>>>> exposed to userspace.
>>>>>>>>
>>>>>>>> My initial patch set for this used (or rather abused) zero-sized GEM
>>>>>>>> buffers as fence handles. This obviously isn't the best way of doing
>>>>>>>> it (too much overhead, rather ugly etc...), and Jerome commented on
>>>>>>>> it accordingly.
>>>>>>>>
>>>>>>>> So what should a driver expose instead? Android sync points?
>>>>>>>> Something else?
>>>>>>> I think actually exposing the struct fence objects as an fd, using
>>>>>>> android syncpts (or at least something compatible to them), is the way
>>>>>>> to go. The problem is that it's super-hard to get the android guys out
>>>>>>> of hiding for this :(
>>>>>>>
>>>>>>> Adding a bunch of people in the hope that something sticks.
>>>>>> More people.
>>>>> Just to reiterate: exposing such a thing while still using a command
>>>>> stream ioctl that does implicit synchronization is a waste, and you can
>>>>> only get the lowest common denominator, which is implicit
>>>>> synchronization. So I do not see the point of such an API if you are not
>>>>> also adding a new CS ioctl with an explicit contract that it does not do
>>>>> any kind of synchronization (it could be almost the exact same code,
>>>>> modulo not waiting for the previous cmd to complete).
>>>> Our thinking was to allow explicit sync from a single process, but
>>>> implicitly sync between processes.
>>> This is a BIG NAK if you are using the same ioctl, as it would mean you
>>> are changing the userspace API, or at least userspace expectations. Adding
>>> a new CS flag might do the trick, but it should not be about
>>> inter-process or anything special; it's just implicit sync or no
>>> synchronization. Converting userspace is not that much of a big deal
>>> either, it can be broken into several steps. Like Mesa using explicit
>>> synchronization all the time while the DDX uses implicit.
>> The thinking here is that we need to be backward compatible for DRI2/3 and
>> support all kinds of different use cases like old DDX and new Mesa, or old
>> Mesa and new DDX etc...
>>
>> So for my prototype, if the kernel sees any access of a BO from two
>> different clients it falls back to the old behavior of implicit
>> synchronization of access to the same buffer object. That might not be the
>> fastest approach, but as far as I can see it is conservative and so should
>> work under all conditions.
>>
>> Apart from that, the planning so far was that we just hide this feature
>> behind a couple of command submission flags and new chunks.
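To make that last point a bit more concrete, what I have in mind is roughly
the following sketch of uapi additions. The chunk IDs, the flag and the struct
below are just placeholders for illustration, not an actual interface
proposal:

    /* Illustration only -- none of these chunk IDs, flags or structs exist,
     * the names and values are placeholders. */
    #include <stdint.h>

    /* hypothetical new CS chunks */
    #define RADEON_CHUNK_ID_FENCE_WAIT   0x05  /* fences to wait for before the job runs */
    #define RADEON_CHUNK_ID_FENCE_SIGNAL 0x06  /* returns a fence handle for this job */

    /* hypothetical CS flag: skip implicit sync on the referenced BOs */
    #define RADEON_CS_NO_IMPLICIT_SYNC   (1 << 4)

    /* hypothetical payload of the wait chunk */
    struct drm_radeon_cs_fence_wait {
        uint32_t num_handles;  /* number of fence handles in the array */
        uint32_t pad;
        uint64_t handles;      /* userspace pointer to the handle array */
    };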
> Just to reproduce the IRC discussion: I think it's a lot simpler and not
> that complex. For an explicit CS ioctl you do not wait for any previous
> fence of any of the buffers referenced in the CS ioctl, but you still
> associate a new fence with all the buffer objects referenced in the CS
> ioctl. So if the next ioctl is an implicit-sync ioctl it will wait and
> synchronize properly with the previous explicit CS ioctl. Hence you can
> easily have a mix in userspace; the thing is you only get the benefit once
> enough of your userspace is using explicit.

Yes, that's exactly what my patches currently implement. The only difference
is that by current planning I implemented it as a per-BO flag for the command
submission, but that was just for testing. Having a single flag to switch
between implicit and explicit synchronization for the whole CS IOCTL would do
equally well.
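Expressed as code, the per-BO handling in the CS path then boils down to
something like the sketch below; the type and helper names are made up for
illustration and are not the actual radeon functions:

    /* Sketch of the per-BO decision described above, with made-up type and
     * helper names -- not actual radeon code.  Explicit submissions skip
     * waiting on fences already attached to the BO, but every submission
     * still attaches its own fence, so a later implicit-sync submission
     * (e.g. from an old DDX) synchronizes correctly against it. */
    #include <stdbool.h>

    struct sketch_bo;
    struct sketch_fence;

    /* stand-ins for whatever the driver uses to wait on / attach fences */
    void sketch_bo_wait_fences(struct sketch_bo *bo);
    void sketch_bo_attach_fence(struct sketch_bo *bo, struct sketch_fence *fence);

    static void cs_sync_bo(struct sketch_bo *bo, struct sketch_fence *job_fence,
                           bool explicit_sync)
    {
        /* implicit sync: wait for everything previously queued on this BO */
        if (!explicit_sync)
            sketch_bo_wait_fences(bo);

        /* always attach the new fence, whether we waited or not */
        sketch_bo_attach_fence(bo, job_fence);
    }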
> Note that you still need a way to have an explicit CS ioctl wait on a
> previous "explicit" fence, so you need some API to expose a fence per CS
> submission.

Exactly, that's what this mail thread is all about. As Daniel correctly
noted, you need something like the functionality to get a fence as the result
of a command submission, as well as to pass in a list of fences to wait for
before beginning a command submission.

At least it looks like we are all on the same general line here, it's just
that nobody has a good idea yet what the details should look like (a purely
hypothetical userspace sketch follows below the quoted mail).

Regards,
Christian.

>
> Cheers,
> Jérôme
>
>> Regards,
>> Christian.
>>
>>> Cheers,
>>> Jérôme
>>>
>>>> Alex
>>>>
>>>>> Also one thing that the Android sync point does not have, AFAICT, is a
>>>>> way to schedule synchronization as part of a CS ioctl so the CPU never
>>>>> has to be involved for command streams that deal with only one GPU
>>>>> (assuming the driver and hw can do such a trick).
>>>>>
>>>>> Cheers,
>>>>> Jérôme
>>>>>
>>>>>> -Daniel
>>>>>> --
>>>>>> Daniel Vetter
>>>>>> Software Engineer, Intel Corporation
>>>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
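PS: Just to have something concrete to poke holes into, here is a purely
hypothetical userspace view of that fence-in/fence-out idea. submit_cs(), the
flags and the types below don't exist anywhere; they are only placeholders:

    /* Hypothetical userspace sketch -- nothing here is a real API. */
    struct ctx;
    struct ib;

    #define SUBMIT_NO_IMPLICIT_SYNC (1 << 0)  /* don't wait on fences attached to the BOs */
    #define SUBMIT_FENCE_OUT        (1 << 1)  /* hand back a fence fd for this job */

    /* placeholder: returns a fence fd when SUBMIT_FENCE_OUT is set, else 0/-errno */
    int submit_cs(struct ctx *ctx, struct ib *ib, unsigned int flags,
                  const int *wait_fds, unsigned int num_wait_fds);

    static void example(struct ctx *ctx_a, struct ib *ib_a,
                        struct ctx *ctx_b, struct ib *ib_b)
    {
        /* job A: no implicit sync, get a fence for it */
        int fence_fd = submit_cs(ctx_a, ib_a,
                                 SUBMIT_NO_IMPLICIT_SYNC | SUBMIT_FENCE_OUT, 0, 0);

        /* job B: instead of implicit sync on the shared BOs, wait for A explicitly */
        submit_cs(ctx_b, ib_b, SUBMIT_NO_IMPLICIT_SYNC, &fence_fd, 1);
    }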