On Wed, Oct 20, 2021 at 2:12 AM Tom Herbert <t...@herbertland.com> wrote:
>
> On Tue, Oct 19, 2021 at 12:36 PM Jerin Jacob <jerinjac...@gmail.com> wrote:
> >
> > On Wed, Oct 20, 2021 at 12:38 AM Thomas Monjalon <tho...@monjalon.net> 
> > wrote:
> > >
> > > 19/10/2021 20:14, jer...@marvell.com:
> > > > Definition of Dataplane Workload Accelerator
> > > > --------------------------------------------
> > > > A Dataplane Workload Accelerator (DWA) typically contains a set of CPUs,
> > > > network controllers, and programmable data acceleration engines for
> > > > packet processing, cryptography, regex matching, baseband processing,
> > > > etc.
> > > > This allows a DWA to offload compute/packet-processing/baseband/
> > > > cryptography-related workloads from the host CPU to save cost and
> > > > power, and enables scaling the workload by adding DWAs to the host
> > > > CPU as needed.
> > > >
> > > > Unlike other devices in DPDK, the DWA device is not fixed-function
> > > > due to the fact that it has CPUs and programmable HW accelerators.
> > > > This enables DWA personality/workload to be completely programmable.
> > > > Typical examples of DWA offloads are Flow/Session management,
> > > > Virtual switch, TLS offload, IPsec offload, l3fwd offload, etc.
> > >
> > > If I understand well, the idea is to abstract the offload
> > > of some stack layers in the hardware.
> >
> > Yes. It may not be just HW; expressing complicated workloads
> > may need CPUs and/or other HW accelerators.
> >
> > > I am not sure we should give an API for such stack layers in DPDK.
> >
> > Why not?
> >
> > > It looks to be the role of the dataplane application to finely manage
> > > how to use the hardware for a specific dataplane.
> >
> > It is possible with this scheme.
> >
> > > I believe the API for such layer would be either too big, or too limited,
> > > or not optimized for specific needs.
> >
> > It will be optimized for specific needs, because applications express
> > *what* to do, not *how* to do it.
> >
> > > If we really want to automate or abstract the HW/SW co-design,
> > > I think we should better look at compiler work like P4 or PANDA.
> >
> > The compiler approach is very static in nature. It can address
> > packet-transformation workloads, but not ones like IPsec or baseband
> > offload.
> > Another way to look at it: the GPU RFC started precisely because not
> > every workload can be expressed in P4.
>
> Hi,
>
> Indeed, you may want to look at PANDA
> (https://github.com/panda-net/panda) for this purpose especially with
> regard to HW/SW co-design. Fundamentally, it is C/C++, so it is "Turing
> complete" as far as being able to express arbitrary workloads. The
> program structure abstracts out any underlying details of the runtime
> environment (hence "write once, run anywhere"). It is the role
> of the compiler to convert the user's expression of intent into
> optimized code for the backend; the backends can be software, such as
> DPDK (which PANDA will soon support), or even hardware. In any case,
> the code emitted will be optimized per the environment, taking
> advantage of hardware acceleration for instance, which leads to the
> PANDA mantra "write once, run anywhere, run well". This does require
> APIs to control hardware acceleration, but our goal is to hide that
> complexity from the user without the loss of benefits of emerging
> hardware features. Also with emerging compiler techniques like LLVM's
> MLIR and dynamically defined instructions, the historically "static"
> nature of compilers can be undone.

Thanks for the insight.
We are using these technologies on the accelerator device itself, not on
the host-CPU interface. The host side is all about memory management,
communication, and expressing the workload. And a workload like a vDPU
application offloading the ORAN 7.2 split for a 5G RU device goes well
beyond packet processing.

>
> Tom
