On Mon, 2019-02-04 at 19:32 +0100, Damjan Marion wrote:

On 4 Feb 2019, at 14:19, Jerin Jacob Kollanukkaran 
<jer...@marvell.com<mailto:jer...@marvell.com>> wrote:

On Sun, 2019-02-03 at 21:13 +0100, Damjan Marion wrote:

On 3 Feb 2019, at 20:13, Saxena, Nitin 
<nitin.sax...@cavium.com<mailto:nitin.sax...@cavium.com>> wrote:

Hi Damjan,

See the function octeontx_fpa_bufpool_alloc(), called by octeontx_fpa_dequeue(). It is
a single read instruction to get the pointer to the data.

Yeah, I saw that, and today the VPP buffer manager can grab up to 16 buffer indices
with one instruction, so no big deal here....
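
To illustrate why this is cheap on the CPU side (an illustration only, not the actual
VPP code, which goes through vlib_buffer_alloc() and its per-thread cache): 16 32-bit
buffer indices are exactly one 512-bit register, so on an AVX-512 machine a refill from
a local cache is one vector load plus one store:

/* Illustration only -- compile with -mavx512f. */
#include <stdint.h>
#include <immintrin.h>

static inline void
copy_16_buffer_indices (uint32_t *dst, const uint32_t *src)
{
  __m512i v = _mm512_loadu_si512 ((const void *) src); /* 16 x u32 in one load */
  _mm512_storeu_si512 ((void *) dst, v);               /* ...and one store */
}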

Similarly, octeontx_fpa_bufpool_free() is also a single write instruction.
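
For readers without the OCTEON TX background, a hypothetical sketch of what a
single-load alloc / single-store free looks like (the register names and semantics
below are made up for illustration; the real code lives in DPDK's
drivers/mempool/octeontx):

#include <stdint.h>

/* Hypothetical MMIO registers of a HW buffer pool. */
static inline void *
hw_pool_alloc (volatile uint64_t *pool_alloc_reg)
{
  /* One MMIO read: the device pops a buffer and returns its address,
   * or 0 if the pool is empty. */
  return (void *) (uintptr_t) *pool_alloc_reg;
}

static inline void
hw_pool_free (volatile uint64_t *pool_free_reg, void *buf)
{
  /* One MMIO write: the device pushes the buffer back into the pool. */
  *pool_free_reg = (uint64_t) (uintptr_t) buf;
}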

So, if you are able to prove with numbers that the current software solution performs
poorly, and you are confident that you can do significantly better, I will be happy to
work with you on implementing support for the hardware buffer manager.
First of all, I welcome your patch, as we were also trying to remove the latency seen
in memcpy_x4() of the buffer template. As I said earlier, the hardware buffer
coprocessor is used by other packet engines, hence support for it has to be added in
VPP. I am looking for suggestions on how to resolve this.

You can hardly get any suggestions from my side if you are ignoring my questions,
which I asked in my previous email to get a better understanding of what your
hardware does.

"It is hardware so it is fast" is not real argument, we need real datapoints 
before investing time into this area....


Adding more details on the HW mempool manager's attributes:

1) Semantically, the HW mempool manager is the same as the SW mempool manager.
2) HW mempool managers have "alloc/dequeue" and "free/enqueue" operations, just like
the SW mempool manager.
3) HW mempool managers can also work with a SW per-core local cache scheme (see the
sketch after this list).
4) User metadata initialization is not done in HW; SW needs to do it before free()
or after alloc().
5) Typically there is an operation to "not free" the packet after Tx, which can be
used as a back end for cloning the packet (aka reference-count schemes).
6) How the HW pool manager improves performance:
- MP/MC can work without locks (HW takes care of synchronization internally).
- HW frees the buffer on Tx, unlike the SW mempool case where the core does it. This
saves CPU cycles on packet Tx and the cost of bringing the packet back into the
L1 cache.
- On the Rx side, HW allocs/dequeues the packet from the mempool; no SW intervention
is required.
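
To make points 3 and 6 concrete, a minimal, hypothetical sketch (the percore_cache
structure and the hw_pool_alloc_one()/hw_pool_free_one() primitives are made-up names,
not taken from any driver) of a SW per-core cache layered on top of single-instruction
HW alloc/free: the fast path stays in the local cache, and only refill/flush bursts
touch the HW pool.

#include <stddef.h>

#define CACHE_SIZE 256
#define BURST      32

struct percore_cache
{
  void *objs[CACHE_SIZE];
  size_t len;
};

/* Stand-ins for the single load/store HW primitives discussed above. */
void *hw_pool_alloc_one (void *hw_pool);
void hw_pool_free_one (void *hw_pool, void *obj);

static void *
cached_alloc (struct percore_cache *c, void *hw_pool)
{
  if (c->len == 0)
    {
      /* Refill a burst from the HW pool only when the local cache is empty. */
      for (size_t i = 0; i < BURST; i++)
        {
          void *obj = hw_pool_alloc_one (hw_pool);
          if (obj == NULL)
            break;
          c->objs[c->len++] = obj;
        }
      if (c->len == 0)
        return NULL;
    }
  return c->objs[--c->len];
}

static void
cached_free (struct percore_cache *c, void *hw_pool, void *obj)
{
  if (c->len == CACHE_SIZE)
    {
      /* Flush a burst back to the HW pool when the local cache is full. */
      for (size_t i = 0; i < BURST; i++)
        hw_pool_free_one (hw_pool, c->objs[--c->len]);
    }
  c->objs[c->len++] = obj;
}

This mirrors what the DPDK mempool per-lcore cache already does around whichever ops
backend is registered, HW or SW.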

In terms of abstraction, the DPDK mempool manager abstracts SW and HW mempools
through a static struct rte_mempool_ops.
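
For reference, a skeleton of how a HW-backed pool plugs into that abstraction (not
the real OCTEON TX driver; the example_hw_pool / hw_mp_* names and the placeholder
bodies are mine, written against roughly the DPDK 18.11 API):

#include <rte_mempool.h>

static int
hw_mp_alloc (struct rte_mempool *mp)
{
  /* Claim/map a HW pool and keep its handle in mp->pool_data. */
  mp->pool_data = NULL; /* placeholder */
  return 0;
}

static void
hw_mp_free (struct rte_mempool *mp)
{
  /* Release the HW pool. */
  (void) mp;
}

static int
hw_mp_enqueue (struct rte_mempool *mp, void *const *obj_table, unsigned int n)
{
  /* One MMIO write per object returns it to the HW pool. */
  (void) mp; (void) obj_table; (void) n;
  return 0;
}

static int
hw_mp_dequeue (struct rte_mempool *mp, void **obj_table, unsigned int n)
{
  /* One MMIO read per object pops it from the HW pool. */
  (void) mp; (void) obj_table; (void) n;
  return 0;
}

static unsigned int
hw_mp_get_count (const struct rte_mempool *mp)
{
  (void) mp;
  return 0;
}

static const struct rte_mempool_ops hw_mempool_ops = {
  .name = "example_hw_pool",
  .alloc = hw_mp_alloc,
  .free = hw_mp_free,
  .enqueue = hw_mp_enqueue,
  .dequeue = hw_mp_dequeue,
  .get_count = hw_mp_get_count,
};

MEMPOOL_REGISTER_OPS (hw_mempool_ops);

An application (or a plugin) then selects this backend for a given pool with
rte_mempool_set_ops_byname(mp, "example_hw_pool", NULL).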

Limitations:
1) Some NPU packet-processing HW can work only with the HW mempool manager (i.e. it
cannot work with a SW mempool manager, because on Rx the HW itself asks the mempool
manager to allocate a buffer and then forms the packet).

Using the DPDK abstractions will enable writing agnostic software that works across
both NPU and CPU models.

VPP is not a DPDK application, so that doesn't work for us. DPDK is just one
optional device-driver access method, and I hear more and more people asking for
VPP without DPDK.

We can implement hardware buffer manager support in VPP, but honestly I'm not
convinced it will bring any huge value or justify the time investment. I would like
somebody to prove me wrong, but with real data, not with statements like "it is
hardware so it is faster".

I believe I have listed the HW buffer manager's attributes, how it works, and what
gains it gives (see point 6). It needs to be done if VPP is to support NPUs.
In terms of data points, what data points would you like to have?





--
Damjan

