On Wed, Oct 20, 2021 at 2:12 AM Tom Herbert <t...@herbertland.com> wrote:
>
> On Tue, Oct 19, 2021 at 12:36 PM Jerin Jacob <jerinjac...@gmail.com> wrote:
> >
> > On Wed, Oct 20, 2021 at 12:38 AM Thomas Monjalon <tho...@monjalon.net> wrote:
> > >
> > > 19/10/2021 20:14, jer...@marvell.com:
> > > > Definition of Dataplane Workload Accelerator
> > > > --------------------------------------------
> > > > A Dataplane Workload Accelerator (DWA) typically contains a set of
> > > > CPUs, network controllers, and programmable data acceleration
> > > > engines for packet processing, cryptography, regex matching,
> > > > baseband processing, etc. This allows a DWA to offload
> > > > compute/packet-processing/baseband/cryptography-related workloads
> > > > from the host CPU to save cost and power, and to scale the
> > > > workload by adding DWAs to the host CPU as needed.
> > > >
> > > > Unlike other devices in DPDK, the DWA device is not fixed-function,
> > > > because it has CPUs and programmable HW accelerators. This enables
> > > > the DWA personality/workload to be completely programmable.
> > > > Typical examples of DWA offloads are flow/session management,
> > > > virtual switch, TLS offload, IPsec offload, l3fwd offload, etc.
> > >
> > > If I understand well, the idea is to abstract the offload
> > > of some stack layers in the hardware.
> >
> > Yes. And it may not be just HW: expressing complicated workloads
> > may need CPUs and/or other HW accelerators.
> >
> > > I am not sure we should give an API for such stack layers in DPDK.
> >
> > Why not?
> >
> > > It looks to be the role of the dataplane application to finely manage
> > > how to use the hardware for a specific dataplane.
> >
> > That is still possible with this scheme.
> >
> > > I believe the API for such a layer would be either too big, or too
> > > limited, or not optimized for specific needs.
> >
> > It will be optimized for specific needs, because the application asks
> > for *what* to do, not *how* to do it.
> >
> > > If we really want to automate or abstract the HW/SW co-design,
> > > I think we should better look at compiler work like P4 or PANDA.
> >
> > The compiler approach is very static in nature. It can address
> > packet-transformation workloads, but not ones like IPsec or baseband
> > offload. Another way to look at it: the GPU RFC started precisely
> > because not every workload can be expressed in P4.
>
> Hi,
>
> Indeed, you may want to look at PANDA
> (https://github.com/panda-net/panda) for this purpose, especially with
> regard to HW/SW co-design. Fundamentally, it is C/C++, so it is "Turing
> complete" as far as being able to express arbitrary workloads. The
> program structure abstracts out any underlying details of the runtime
> environment (hence it is "write once, run anywhere"). It is the job of
> a compiler to convert the user's expression of intent into optimized
> code for the backend; the backends can be software, such as DPDK (which
> PANDA will soon support), or even hardware. In any case, the emitted
> code will be optimized for the environment, taking advantage of
> hardware acceleration for instance, which leads to the PANDA mantra
> "write once, run anywhere, run well". This does require APIs to control
> hardware acceleration, but our goal is to hide that complexity from the
> user without losing the benefits of emerging hardware features. Also,
> with emerging compiler techniques like LLVM's MLIR and dynamically
> defined instructions, the historically "static" nature of compilers can
> be undone.
Thanks for the insight. We are using these technologies inside the
accelerator device, not in the host-CPU interface. The host side is all
about memory management, communication, and expressing the workload.
And a workload like a vDPU application offloading the ORAN 7.2 split
for a 5G RU device goes well beyond packet processing.

>
> Tom