On Tue, Oct 19, 2021 at 12:36 PM Jerin Jacob <jerinjac...@gmail.com> wrote:
>
> On Wed, Oct 20, 2021 at 12:38 AM Thomas Monjalon <tho...@monjalon.net> wrote:
> >
> > 19/10/2021 20:14, jer...@marvell.com:
> > > Definition of Dataplane Workload Accelerator
> > > --------------------------------------------
> > > A Dataplane Workload Accelerator (DWA) typically contains a set of CPUs,
> > > network controllers, and programmable data acceleration engines for
> > > packet processing, cryptography, regex matching, baseband processing, etc.
> > > This allows a DWA to offload compute/packet-processing/baseband/
> > > cryptography-related workloads from the host CPU to save cost and
> > > power, and to scale the workload by adding DWAs to the host CPU as
> > > needed.
> > >
> > > Unlike other devices in DPDK, the DWA device is not fixed-function,
> > > because it has CPUs and programmable HW accelerators. This makes the
> > > DWA personality/workload completely programmable. Typical examples of
> > > DWA offloads are flow/session management, virtual switch, TLS offload,
> > > IPsec offload, l3fwd offload, etc.
> >
> > If I understand well, the idea is to abstract the offload
> > of some stack layers in the hardware.
>
> Yes. It may not be just HW; expressing complicated workloads
> may need CPUs and/or other HW accelerators.
>
> > I am not sure we should give an API for such stack layers in DPDK.
>
> Why not?
>
> > It looks to be the role of the dataplane application to finely manage
> > how to use the hardware for a specific dataplane.
>
> It is possible with this scheme.
>
> > I believe the API for such layer would be either too big, or too limited,
> > or not optimized for specific needs.
>
> It will be optimized for specific needs, because applications ask for
> *what* to do, not *how* to do it.
>
> > If we really want to automate or abstract the HW/SW co-design,
> > I think we should better look at compiler work like P4 or PANDA.
>
> The compiler approach is very static in nature. It can address packet
> transformation workloads, but not ones like IPsec or baseband offload.
> Another way to look at it: the GPU RFC started precisely because you
> cannot express all the workloads in P4.

Hi,

Indeed, you may want to look at PANDA
(https://github.com/panda-net/panda) for this purpose, especially with
regard to HW/SW co-design. Fundamentally, it is C/C++, so it is "Turing
complete" as far as being able to express arbitrary workloads. The
program structure abstracts out any underlying details of the runtime
environment (hence it is "write once, run anywhere"). It is the job of
the compiler to convert the user's expression of intent into optimized
code for the backend; the backends can be software, such as DPDK (which
PANDA will soon support), or even hardware. In any case, the emitted
code will be optimized for the environment, taking advantage of
hardware acceleration for instance, which leads to the PANDA mantra
"write once, run anywhere, run well". This does require APIs to control
hardware acceleration, but our goal is to hide that complexity from the
user without losing the benefits of emerging hardware features. Also,
with emerging compiler techniques like LLVM's MLIR and dynamically
defined instructions, the historically "static" nature of compilers can
be undone.

Tom

