On Tue, Oct 19, 2021 at 12:36 PM Jerin Jacob <jerinjac...@gmail.com> wrote:
>
> On Wed, Oct 20, 2021 at 12:38 AM Thomas Monjalon <tho...@monjalon.net> wrote:
> >
> > 19/10/2021 20:14, jer...@marvell.com:
> > > Definition of Dataplane Workload Accelerator
> > > --------------------------------------------
> > > A Dataplane Workload Accelerator (DWA) typically contains a set of CPUs,
> > > network controllers, and programmable data acceleration engines for
> > > packet processing, cryptography, regex matching, baseband processing, etc.
> > > This allows a DWA to offload compute-, packet-processing-, baseband-, and
> > > cryptography-related workloads from the host CPU to save cost and power,
> > > and to scale the workload by adding DWAs to the host CPU as needed.
> > >
> > > Unlike other devices in DPDK, the DWA device is not fixed-function,
> > > because it has CPUs and programmable HW accelerators.
> > > This allows the DWA personality/workload to be completely programmable.
> > > Typical examples of DWA offloads are flow/session management,
> > > virtual switch, TLS offload, IPsec offload, l3fwd offload, etc.
> >
> > If I understand well, the idea is to abstract the offload
> > of some stack layers in the hardware.
>
> Yes. And it may not be just HW: expressing complicated workloads
> may need CPUs and/or other HW accelerators.
>
> > I am not sure we should give an API for such stack layers in DPDK.
>
> Why not?
>
> > It looks to be the role of the dataplane application to finely manage
> > how to use the hardware for a specific dataplane.
>
> That is still possible with this scheme.
>
> > I believe the API for such a layer would be either too big, or too limited,
> > or not optimized for specific needs.
>
> It will be optimized for specific needs, because applications ask for
> *what* to do, not *how* to do it.
>
> > If we really want to automate or abstract the HW/SW co-design,
> > I think we should better look at compiler work like P4 or PANDA.
>
> The compiler stuff is very static in nature. It can address
> packet-transformation workloads, but not ones like IPsec or baseband
> offload.
> Another way to look at it: the GPU RFC started just because you are not
> able to express all the workloads in P4.
Hi,

Indeed, you may want to look at PANDA (https://github.com/panda-net/panda)
for this purpose, especially with regard to HW/SW co-design. Fundamentally,
it is C/C++, so it is "Turing complete" as far as being able to express
arbitrary workloads. The program structure abstracts out any underlying
details of the runtime environment (hence it is "write once, run anywhere").
It is the job of the compiler to convert the user's expression of intent
into optimized code for the backend; the backends can be software, such as
DPDK (which PANDA will soon support), or even hardware. In any case, the
code emitted will be optimized for the environment, taking advantage of
hardware acceleration for instance, which leads to the PANDA mantra "write
once, run anywhere, run well". This does require APIs to control hardware
acceleration, but our goal is to hide that complexity from the user without
losing the benefits of emerging hardware features. Also, with emerging
compiler techniques like LLVM's MLIR and dynamically defined instructions,
the historically "static" nature of compilers can be undone.

Tom