> -----Original Message-----
> From: Jerin Jacob <jerinjac...@gmail.com>
> Sent: Friday, August 27, 2021 20:19
> To: Thomas Monjalon <tho...@monjalon.net>
> Cc: Jerin Jacob <jer...@marvell.com>; dpdk-dev <dev@dpdk.org>; Stephen Hemminger <step...@networkplumber.org>; David Marchand <david.march...@redhat.com>; Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>; Wang, Haiyue <haiyue.w...@intel.com>; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Yigit, Ferruh <ferruh.yi...@intel.com>; techbo...@dpdk.org; Elena Agostini <eagost...@nvidia.com>
> Subject: Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
>
> On Fri, Aug 27, 2021 at 3:14 PM Thomas Monjalon <tho...@monjalon.net> wrote:
> >
> > 31/07/2021 15:42, Jerin Jacob:
> > > On Sat, Jul 31, 2021 at 1:51 PM Thomas Monjalon <tho...@monjalon.net> wrote:
> > > > 31/07/2021 09:06, Jerin Jacob:
> > > > > On Fri, Jul 30, 2021 at 7:25 PM Thomas Monjalon <tho...@monjalon.net> wrote:
> > > > > > From: Elena Agostini <eagost...@nvidia.com>
> > > > > >
> > > > > > In a heterogeneous computing system, processing is not only in the CPU.
> > > > > > Some tasks can be delegated to devices working in parallel.
> > > > > >
> > > > > > The goal of this new library is to enhance the collaboration between
> > > > > > DPDK, which is primarily a CPU framework, and other types of devices
> > > > > > like GPUs.
> > > > > >
> > > > > > When mixing network activity with task processing on a non-CPU device,
> > > > > > there may be a need for communication between the CPU and the device
> > > > > > in order to manage memory, synchronize operations, exchange info, etc.
> > > > > >
> > > > > > This library provides a number of new features:
> > > > > > - Interoperability with device-specific libraries through generic handlers
> > > > > > - Possibility to allocate and free memory on the device
> > > > > > - Possibility to allocate and free memory on the CPU but visible from the device
> > > > > > - Communication functions to enhance the dialog between the CPU and the device
> > > > > >
> > > > > > The infrastructure is prepared to welcome drivers in drivers/hc/,
> > > > > > such as the upcoming NVIDIA one, implementing the hcdev API.
> > > > > >
> > > > > > Some parts are not complete:
> > > > > > - locks
> > > > > > - memory allocation table
> > > > > > - memory freeing
> > > > > > - guide documentation
> > > > > > - integration in devtools/check-doc-vs-code.sh
> > > > > > - unit tests
> > > > > > - integration in testpmd to enable Rx/Tx to/from GPU memory.
> > > > >
> > > > > Since the above line is the crux of the following text, I will start
> > > > > from this point.
> > > > >
> > > > > + Techboard
> > > > >
> > > > > I can give my honest feedback on this.
> > > > >
> > > > > I can map similar stuff in Marvell HW, where we do machine learning
> > > > > as a compute offload on a different class of CPU.
> > > > >
> > > > > In terms of RFC patch features:
> > > > >
> > > > > 1) memory API - use cases are aligned
> > > > > 2) communication flag and communication list -
> > > > > our structure is completely different; we use a HW-ring kind of
> > > > > interface to post the job to the compute interface, and
> > > > > the job completion result arrives through the event device.
> > > > > Kind of similar to the DMA API that has been discussed on the mailing list.
> > > >
> > > > Interesting.
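To make the contrast above concrete, here is an illustrative sketch of the ring + eventdev model Jerin describes, using existing DPDK rawdev/eventdev calls. The struct ml_job layout and the device/port IDs are made-up placeholders, not a real driver contract; only the rawdev/eventdev calls themselves are existing APIs.

/*
 * Illustrative sketch only: post device-specific jobs through a rawdev
 * ring and collect completions via the event device.
 */
#include <rte_rawdev.h>
#include <rte_eventdev.h>

struct ml_job {                 /* job metadata, specific to the device */
	void *input;
	void *output;
	uint32_t model_id;
};

static int
submit_and_wait(uint16_t raw_dev, uint8_t ev_dev, uint8_t ev_port,
		struct rte_rawdev_buf **jobs, int nb_jobs)
{
	struct rte_event ev;

	/* Post the jobs (each buffer wraps a struct ml_job) to the
	 * compute device through its HW ring. */
	if (rte_rawdev_enqueue_buffers(raw_dev, jobs, nb_jobs, NULL) < nb_jobs)
		return -1;

	/* The completion result arrives asynchronously on the event device. */
	while (rte_event_dequeue_burst(ev_dev, ev_port, &ev, 1, 0) == 0)
		; /* poll; a real application would do other work here */

	return 0;
}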
> > >
> > > It is hard to generalize the communication mechanism.
> > > Do other GPU vendors have a similar communication mechanism? AMD, Intel?
> >
> > I don't know who to ask in AMD & Intel. Any ideas?
>
> Good question.
>
> At least in Marvell HW, our structure for the communication flag and
> communication list is completely different: we use a HW-ring kind of
> interface to post the job to the compute interface, and the job
> completion result arrives through the event device.
> Kind of similar to the DMA API that has been discussed on the mailing list.
>
> > > > > Now the bigger question is why we need to Tx and then Rx something
> > > > > to the compute device.
> > > > > Isn't it offloading something? If so, why not add those offloads in
> > > > > the respective subsystems to improve the subsystem (ethdev,
> > > > > cryptodev, etc.) feature set to adopt new features, or introduce a
> > > > > new subsystem (like ML, inline baseband processing) so that it will
> > > > > be an opportunity to implement the same in HW or a compute device.
> > > > > For example, if we take this path, ML offloading will be
> > > > > application code like testpmd, which deals with "specific" device
> > > > > commands (aka a glorified rawdev) to deal with specific computing
> > > > > device offload "COMMANDS"
> > > > > (the commands will be specific to the offload device; the same code
> > > > > won't run on another compute device).
> > > >
> > > > Having specific feature APIs is convenient for compatibility
> > > > between devices, yes, for the set of defined features.
> > > > Our approach is to start with a flexible API that the application
> > > > can use to implement any processing, because with GPU programming
> > > > there is no restriction on what can be achieved.
> > > > This approach does not contradict what you propose;
> > > > it does not prevent extending existing classes.
> > >
> > > It does prevent extending the existing classes, as no one is going to
> > > extend them if there is a path that avoids doing so.
> >
> > I disagree. A specific API is more convenient for some tasks,
> > so there is an incentive to define or extend specific device class APIs.
> > But it should not forbid doing custom processing.
>
> This is the same as the raw device in DPDK, where the device
> personality is not defined.
>
> Even if we define another API, if the personality is not defined
> it ends up similar to the raw device, i.e. rawdev enqueue and dequeue.
>
> To summarize,
>
> 1) My _personal_ preference is to have specific subsystems
> to improve DPDK instead of the raw device kind of path.
Something like rte_memdev to focus on device (GPU) memory management (see the sketch at the end of this mail)? The new DPDK auxiliary bus may make life easier for solving the complex heterogeneous computing library problem. ;-)

> 2) If the device personality is not defined, use rawdev.
> 3) Not all computing devices use a "communication flag" and
> "communication list" kind of structure. If we are targeting a generic
> computing device, then that is not a portable scheme.
> If, for GPU abstraction, "communication flag" and "communication list"
> are the right kind of mechanism, then we can have a separate library
> specific to GPU <-> DPDK communication needs, explicitly for GPUs.
>
> I think generic DPDK applications like testpmd should not
> be polluted with device-specific functions, i.e. calling device-specific
> messages from the application, which makes the application run on only
> one device. I don't have a strong opinion (except on standardizing
> "communication flag" and "communication list" as a generic computing
> device communication mechanism) if others think it is OK to do it that
> way in DPDK.
>
> > >
> > > If an application can run only on a specific device, it is similar to
> > > a raw device, where the device semantics are not defined (i.e. job
> > > metadata is not defined and is specific to the device).
> > >
> > > > > Just my _personal_ preference is to have specific subsystems to
> > > > > improve DPDK instead of the raw device kind of path. If we decide
> > > > > another path as a community, it is _fine_ too (from a _project
> > > > > manager_ point of view it will be an easy path to dump SDK stuff
> > > > > into DPDK without taking the pain of defining a subsystem or
> > > > > improving DPDK).
> > > >
> > > > Adding a new class API is also improving DPDK.
> > >
> > > But this class is similar to the rawdev class. The reason I say so:
> > > job submission and response can be abstracted as enqueue/dequeue APIs,
> > > while task/job metadata is specific to compute devices (and cannot be
> > > generalized).
> > > If we can generalize it, it makes sense to have a new class that does
> > > a "specific function".
> >
> > Computing device programming is already generalized with languages
> > like OpenCL. We should not try to reinvent the same.
> > We are just trying to properly integrate the concept in DPDK
> > and allow building on top of it.
>
> See above.
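As promised above, a minimal sketch of what such an rte_memdev-style API might look like, covering the two allocation cases this thread agrees on. None of the rte_memdev_* functions below exist in DPDK today; the names and signatures are invented purely for illustration.

/*
 * Hypothetical sketch only: an "rte_memdev"-style memory API.
 * The prototypes are invented for illustration, not a real DPDK API.
 */
#include <stddef.h>
#include <stdint.h>

void *rte_memdev_alloc(uint16_t dev_id, size_t len);         /* hypothetical */
void *rte_memdev_alloc_visible(uint16_t dev_id, size_t len); /* hypothetical */
void  rte_memdev_free(uint16_t dev_id, void *ptr);           /* hypothetical */

static int
setup_device_buffers(uint16_t dev_id, size_t len)
{
	/* Case 1: memory resident on the device, for device kernels only. */
	void *dmem = rte_memdev_alloc(dev_id, len);
	if (dmem == NULL)
		return -1;

	/* Case 2: CPU memory pinned/mapped so the device can also see it,
	 * e.g. mbuf payloads to be read by device code. */
	void *cmem = rte_memdev_alloc_visible(dev_id, len);
	if (cmem == NULL) {
		rte_memdev_free(dev_id, dmem);
		return -1;
	}

	/* ... launch device work referencing dmem/cmem ... */

	rte_memdev_free(dev_id, cmem);
	rte_memdev_free(dev_id, dmem);
	return 0;
}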