On Wed, Jan 23, 2019 at 3:35 PM Oded Gabbay <oded.gab...@gmail.com> wrote:
>
> On Thu, Jan 24, 2019 at 1:20 AM Jerome Glisse <jgli...@redhat.com> wrote:
> >
> > On Wed, Jan 23, 2019 at 03:04:33PM -0800, Olof Johansson wrote:
> > > On Wed, Jan 23, 2019 at 2:45 PM Dave Airlie <airl...@gmail.com> wrote:
> > > >
> > > > On Thu, 24 Jan 2019 at 08:32, Oded Gabbay <oded.gab...@gmail.com> wrote:
> > > > >
> > > > > On Thu, Jan 24, 2019 at 12:02 AM Dave Airlie <airl...@gmail.com> wrote:
> > > > > >
> > > > > > Adding Daniel as well.
> > > > > >
> > > > > > Dave.
> > > > > >
> > > > > > On Thu, 24 Jan 2019 at 07:57, Dave Airlie <airl...@gmail.com> wrote:
> > > > > > >
> > > > > > > On Wed, 23 Jan 2019 at 10:01, Oded Gabbay <oded.gab...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > For those who don't know me, my name is Oded Gabbay (Kernel
> > > > > > > > Maintainer for AMD's amdkfd driver, worked at RedHat's Desktop
> > > > > > > > group) and I have worked at Habana Labs since its inception two
> > > > > > > > and a half years ago.
> > > > > > >
> > > > > > > Hey Oded,
> > > > > > >
> > > > > > > So this creates a driver with a userspace facing API via ioctls.
> > > > > > > Although this isn't a "GPU" driver we have a rule in the graphics
> > > > > > > drivers area for accelerators that we don't merge userspace API
> > > > > > > without an appropriate userspace user.
> > > > > > >
> > > > > > > https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
> > > > > > >
> > > > > > > I see nothing in these accelerator drivers that makes me think we
> > > > > > > should be treating them differently.
> > > > > > >
> > > > > > > Having large closed userspaces that we have no insight into means
> > > > > > > we get suboptimal, locked-forever uAPIs. If someone in the future
> > > > > > > creates an open source userspace, we will end up in a place where
> > > > > > > they get suboptimal behaviour because they are locked into a uAPI
> > > > > > > that we can't change.
> > > > > > >
> > > > > > > Dave.
> > > > >
> > > > > Hi Dave,
> > > > > While I always appreciate your opinion and am happy to hear it, I
> > > > > totally disagree with you on this point.
> > > > >
> > > > > First of all, as you said, this device is NOT a GPU. Hence, I wasn't
> > > > > aware that this rule might apply to this driver or to any other driver
> > > > > outside of drm. Has this rule been applied to all the current drivers
> > > > > in the kernel tree with userspace facing API via IOCTLs, which are not
> > > > > in the drm subsystem? I see the logic for GPUs as they drive the
> > > > > display of the entire machine, but this is an accelerator for a
> > > > > specific purpose, not something generic as GPU. I just don't see how
> > > > > one can treat them in the same way.
> > > >
> > > > The logic isn't there for GPUs for the reason that we have an
> > > > established library or that GPUs are in laptops. They are just where
> > > > we learned the lessons of merging things whose primary reason for
> > > > being in the kernel is to execute stuff from misc userspace stacks,
> > > > where the uAPI has to remain stable indefinitely.
> > > >
> > > > a) security - without knowledge of what the accelerator can do how can
> > > > we know if the API you expose isn't just a giant root hole?
> > > >
> > > > b) uAPI stability. Without a userspace for this, there is no way for
> > > > anyone, even if in possession of the hardware, to validate that the
> > > > uAPI you provide and are asking the kernel to commit to supporting
> > > > indefinitely is optimal or secure. If an open source userspace
> > > > appears, is it to be limited to the API the closed userspace has
> > > > created? It limits the future unnecessarily.
> > > >
> > > > > There is no way that "someone" will create a userspace
> > > > > for our H/W without the intimate knowledge of the H/W or without the
> > > > > ISA of our programmable cores. Maybe for large companies this request
> > > > > is valid, but for startups complying with this request is not
> > > > > realistic.
> > > >
> > > > So what benefit does the Linux kernel get from having support for this
> > > > feature upstream?
> > > >
> > > > If users can't access the necessary code to use it, why does this
> > > > need to be maintained in the kernel?
> > > >
> > > > > To conclude, I think this approach discourages other companies from
> > > > > open sourcing their drivers and is counter-productive. I'm not sure
> > > > > you are aware of how difficult it is to convince startup management to
> > > > > open-source the code...
> > > >
> > > > Oh I am, but I'm also more aware how quickly startups go away and
> > > > leave the kernel holding a lot of code we don't know how to validate
> > > > or use.
> > > >
> > > > I'm open to being convinced but I think defining new userspace
> > > > facing APIs is a task that we should take a lot more seriously going
> > > > forward to avoid mistakes of the past.
> > >
> > > I think the most important thing here is to know that things are
> > > likely to change quite a bit over the next couple of years, and that
> > > we don't know yet what we actually need. If we hold off picking up
> > > support for hardware while all of this is ironed out, we'll miss out
> > > on being exposed to it, and will have a very tall hill to climb once
> > > we try to convince vendors to come into the fold. It's also not been a
> > > requirement for the other two drivers we have merged, as far as I can
> > > tell (CAPI and OpenCAPI) so the cat's already out of the bag.
> > >
> > > I'd rather not get stuck in a stand-off needing the longterm solution
> > > to pick up the short term contribution. That way we can move over to a
> > > _new_ API once there's been a better chance of finding common ground
> > > and once things settle down a bit, instead of trying to bring some
> > > larger legacy codebase for devices that people might no longer care
> > > much about over to the newer APIs.
> > >
> > > It's better to be exposed to the HW and drivers now, than having
> > > people build large elaborate out-of-tree software stacks for this.
> > > It's also better to get them to come and collaborate now, instead of
> > > pushing them away until things are perfect.
> > >
> > > Having a way to validate and exercise the userspace API is important,
> > > including ability to change it if needed. Would it be possible to open
> > > up the lowest userspace pieces (driver interactions), even if some
> > > other layers might not yet be, to exercise the device/kernel/userspace
> > > interfaces without "live" workload, etc?
> >
> > Yes and to exercise the userspace API you need at the very least to
> > know the ISA so that you can write programs for the accelerator.
> > You also need to know the set of commands the hardware has. The
> > ioctl and how to create a userspace that interacts with the kernel
> > is the easy part, the hard part is the compiler.
>
> So actually in my case in order to exercise the IOCTL API, you can
> give "work" to the device that will not trigger the compute parts, but
> only the different queues and the DMA engines.
> I think that is enough to validate that the IOCTLs won't break.
> All the "commands" that you can give to the queue logic (QMAN) are
> exposed in one of the files in the driver (goya_packets.h).
>
> I want to stress this - To validate the IOCTLs, it is enough to do DMA
> work. You will use ALL the 5 IOCTLs to do just that - give work to the
> DMA engines.

I personally think this is a reasonable trade-off, given that you have
a communication layer in between. For hardware that doesn't have that,
and where device behavior and data movement depend on execution on the
compute parts, more would need to be open.

-Olof