On Tue, Oct 19, 2021 at 21:36 Jerin Jacob <jerinjac...@gmail.com> wrote:
> On Wed, Oct 20, 2021 at 12:38 AM Thomas Monjalon <tho...@monjalon.net> wrote:
> >
> > 19/10/2021 20:14, jer...@marvell.com:
> > > Definition of Dataplane Workload Accelerator
> > > --------------------------------------------
> > > A Dataplane Workload Accelerator (DWA) typically contains a set of
> > > CPUs, network controllers, and programmable data acceleration
> > > engines for packet processing, cryptography, regex engines,
> > > baseband processing, etc. This allows a DWA to offload
> > > compute/packet-processing/baseband/cryptography-related workloads
> > > from the host CPU to save cost and power, and to scale the
> > > workload by adding DWAs to the host CPU as needed.
> > >
> > > Unlike other devices in DPDK, the DWA device is not fixed-function,
> > > because it has CPUs and programmable HW accelerators. This enables
> > > the DWA personality/workload to be completely programmable.
> > > Typical examples of DWA offloads are flow/session management,
> > > virtual switch, TLS offload, IPsec offload, l3fwd offload, etc.
> >
> > If I understand well, the idea is to abstract the offload
> > of some stack layers in the hardware.
>
> Yes. It may not be just HW; expressing complicated workloads
> may need CPUs and/or other HW accelerators.
>
> > I am not sure we should give an API for such stack layers in DPDK.
>
> Why not?
>
> > It looks to be the role of the dataplane application to finely manage
> > how to use the hardware for a specific dataplane.
>
> It is possible with this scheme.
>
> > I believe the API for such a layer would be either too big, or too
> > limited, or not optimized for specific needs.
>
> It will be optimized for specific needs, because applications ask for
> what to do, not how to do it.
>
> > If we really want to automate or abstract the HW/SW co-design,
> > I think we should better look at compiler work like P4 or PANDA.
>
> The compiler approach is very static in nature. It can address packet
> transformation workloads, but not ones like IPsec or baseband offload.
> Another way to look at it: the GPU RFC started just because you are
> not able to express all the workloads in P4.

That’s not the purpose of the GPU RFC. The gpudev library’s goal is to
enhance the dialogue between GPU, CPU and NIC, offering the possibility
to:
- Make DPDK aware of non-CPU memory such as device memory (similarly to
  what happened with MPI)
- Hide GPU-library-specific memory management implementation details
- Reduce the gap between network activity and device activity (e.g.
  receive/send packets directly in device memory)
- Reduce the gap between CPU activity and application-defined GPU
  workloads
- Offer the capability to interact with the GPU device without
  managing it

The gpudev library can be embedded in any GPU-specific application with
relatively small effort. The application can allocate, manage and
exchange memory with the device transparently through DPDK.
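To make that concrete, here is a rough, untested sketch of the intended
usage. It assumes the rte_gpu_* calls from the gpudev patch set, whose
names and signatures may still change as the RFC evolves:

	#include <stdio.h>
	#include <rte_eal.h>
	#include <rte_gpudev.h>

	int
	main(int argc, char **argv)
	{
		struct rte_gpu_info info;
		int16_t dev_id;
		void *gpu_buf;

		if (rte_eal_init(argc, argv) < 0)
			rte_exit(EXIT_FAILURE, "EAL init failed\n");

		/* Walk the GPUs probed by the gpudev drivers. */
		RTE_GPU_FOREACH(dev_id) {
			rte_gpu_info_get(dev_id, &info);
			printf("GPU %d: %s, %zu bytes of device memory\n",
			       dev_id, info.name, info.total_memory);
		}

		/* Allocate memory on the first device through DPDK,
		 * without any direct call into the vendor GPU library
		 * from the application. */
		gpu_buf = rte_gpu_mem_alloc(0, 1 << 20);
		if (gpu_buf == NULL)
			rte_exit(EXIT_FAILURE, "rte_gpu_mem_alloc failed\n");

		/* ... CPU/GPU communication, packet processing, ... */

		rte_gpu_mem_free(0, gpu_buf);
		rte_eal_cleanup();
		return 0;
	}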
What you are proposing here is different and out of the scope of the
gpudev library: controlling and managing workload submission for
possibly any accelerator device, hiding a lot of implementation details
within DPDK. That is a wrapper for accelerator-specific libraries, and
I think it is too far-reaching to be realistic. As a GPU user, I don’t
want to delegate my tasks to a DWA, because it can’t be fully
optimized, kept up to date with the latest GPU-specific features, etc.

Additionally, a generic DWA won’t work for a GPU:
- Memory copies from DWA to CPU / CPU to DWA are expensive in latency;
  packets can instead be received directly in device memory (see the
  sketch at the end of this message)
- When launching multiple processing blocks, efficiency may be
  compromised

I don’t actually see a real overlap between gpudev and DWA. If in the
future we expose some GPU workloads through the gpudev library, it will
be for network-specific, well-defined problems.
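For reference, a minimal, untested sketch of the "receive directly in
device memory" point above: a mempool whose data buffers live in GPU
memory, built with the existing extmem/extbuf helpers. The
rte_gpu_mem_alloc() call is again assumed from the gpudev patch set,
and the page size and error handling are simplified for brevity:

	#include <rte_ethdev.h>
	#include <rte_gpudev.h>
	#include <rte_mbuf.h>

	#define GPU_PAGE_SZ (1UL << 16)

	static struct rte_mempool *
	gpu_pktmbuf_pool(int16_t gpu_id, uint16_t port_id,
			 unsigned int nb_mbufs)
	{
		struct rte_pktmbuf_extmem ext_mem;
		struct rte_eth_dev_info dev_info;

		if (rte_eth_dev_info_get(port_id, &dev_info) < 0)
			return NULL;

		/* Data buffers live in GPU memory; mbuf headers stay in
		 * CPU memory. */
		ext_mem.elt_size = RTE_MBUF_DEFAULT_BUF_SIZE;
		ext_mem.buf_len = RTE_ALIGN_CEIL((size_t)nb_mbufs *
						 ext_mem.elt_size,
						 GPU_PAGE_SZ);
		ext_mem.buf_iova = RTE_BAD_IOVA;
		ext_mem.buf_ptr = rte_gpu_mem_alloc(gpu_id,
						    ext_mem.buf_len);
		if (ext_mem.buf_ptr == NULL)
			return NULL;

		/* Make the device memory known to EAL and DMA-map it for
		 * the NIC, so Rx descriptors can point into GPU memory. */
		if (rte_extmem_register(ext_mem.buf_ptr, ext_mem.buf_len,
					NULL,
					ext_mem.buf_len / GPU_PAGE_SZ,
					GPU_PAGE_SZ) < 0)
			return NULL;
		if (rte_dev_dma_map(dev_info.device, ext_mem.buf_ptr,
				    ext_mem.buf_iova,
				    ext_mem.buf_len) < 0)
			return NULL;

		/* The NIC now writes packet payloads straight into GPU
		 * memory; no DWA-style staging copy is involved. */
		return rte_pktmbuf_pool_create_extbuf("gpu_mp", nb_mbufs,
						      256, 0,
						      ext_mem.elt_size,
						      rte_socket_id(),
						      &ext_mem, 1);
	}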