On 09/12/2017 08:51 AM, Jerin Jacob wrote:
Tuesday, September 12, 2017 7:01 AM, Jerin Jacob:
Yes, only when ETH_TXQ_FLAGS_NOMULTMEMP and
ETH_TXQ_FLAGS_NOREFCOUNT are selected at Tx queue configuration.
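
For reference, a minimal sketch (my own illustration, not from the patches) of
how an application asks for this with the txq_flags API; the port and queue
ids, descriptor count and function name are placeholders:

#include <rte_ethdev.h>

static int
setup_txq_with_hints(uint8_t port_id, uint16_t queue_id, uint16_t nb_desc)
{
        struct rte_eth_dev_info dev_info;
        struct rte_eth_txconf txconf;

        rte_eth_dev_info_get(port_id, &dev_info);
        txconf = dev_info.default_txconf;

        /* All Tx mbufs come from a single mempool and have refcnt == 1,
         * so the PMD may take the simpler buffer recycling path. */
        txconf.txq_flags |= ETH_TXQ_FLAGS_NOMULTMEMP |
                            ETH_TXQ_FLAGS_NOREFCOUNT;

        return rte_eth_tx_queue_setup(port_id, queue_id, nb_desc,
                                      rte_eth_dev_socket_id(port_id),
                                      &txconf);
}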

So literally, yes, it is not a Tx HW offload, though I understand your
intention to have such a possibility - it might help to save some cycles.
It is not a few cycles. We could see a ~24% per-core drop (with 64B packets)
with testpmd and l3fwd on some SoCs. It is not very specific to nicvf HW; the
problem is the limited cache hierarchy in very low-end arm64 machines. For the
Tx buffer recycling case, it needs to touch the mbuf again to find out the
associated mempool to free it to. That is fine if the application demands it,
but not every application demands it.
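
To illustrate the cache-touch point (a sketch only, not the actual nicvf code;
the function names are made up and the single mempool pointer is assumed to be
cached in the Tx queue structure):

#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Generic completion path: every completed mbuf must be read to find its
 * mempool and to check its reference count - extra cache misses per packet
 * on a CPU with a small cache. */
static inline void
tx_free_generic(struct rte_mbuf **pkts, unsigned int n)
{
        unsigned int i;

        for (i = 0; i < n; i++)
                rte_pktmbuf_free_seg(pkts[i]); /* loads pkts[i]->pool, refcnt */
}

/* NOMULTMEMP + NOREFCOUNT path: the mempool is known up front and refcnt is
 * guaranteed to be 1, so completed mbufs can be returned in bulk without
 * touching their headers at all. */
static inline void
tx_free_fast(struct rte_mempool *pool, struct rte_mbuf **pkts, unsigned int n)
{
        rte_mempool_put_bulk(pool, (void **)pkts, n);
}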

We have two categories of arm64 machines: the high-end machines, where the
cache hierarchy is similar to an x86 server machine, and the low-end ones with
very limited cache resources. Unfortunately, we need to run the same binary on
both machines.


I wonder, would some new driver-specific function help in that case?
nicvf_txq_pool_setup(portid, queueid, struct rte_mempool *txpool,
uint32_t flags); or so?
It is possible, but how do we make such a change in testpmd, l3fwd or
ipsec-gw, the in-tree applications which need only NOMULTMEMP &
NOREFCOUNT?

If there is a concern about making it Tx queue level, that is fine. We can
move from queue level to port level or global level.
IMO, the application should express in some form that it wants only
NOMULTMEMP & NOREFCOUNT, and that is the case for l3fwd and
ipsec-gw.

I understand the use case, and the fact that those flags improve performance on
low-end ARM CPUs.
IMO those flags cannot be at queue/port level. They must be global.
Where should we have it as global (in terms of API)?
And why can it not be at port level?

I think port level is the right place for these flags. These flags define which
transmit and transmit cleanup callbacks could be used, and these functions are
specified at port level now. However, I see no good reason to change that: it
would complicate the possibility of making the transmit and transmit cleanup
callbacks per queue (not per port as now).
All three (no-multi-seg, no-multi-mempool, no-reference-counter) are from
one group and should go together.

Even though the use case is generic, the nicvf PMD is the only one which does
such an optimization.
So I am suggesting again - why not expose it as a PMD-specific parameter?
Why make it PMD-specific, if the application can express it through
normative DPDK APIs?

- The application can express that it wants such an optimization.
- It is global

Currently it does not seem there is high demand for such flags from other
PMDs. If such demand arises, we can discuss again how to expose it
properly.
It is not PMD-specific. It is all about where it runs: it will be
applicable to any PMD that runs on low-end hardware which needs SW-based
Tx buffer recycling (the NPU is a different story, as it has a HW-assisted
mempool manager).
What are we losing by running DPDK effectively on low-end hardware
with such "on demand" runtime configuration through the normative DPDK API?

+1, and it improves performance on amd64 as well; definitely less than 24%,
but noticeable. If the application architecture meets these conditions, why not
allow it to use the advantage and run faster?
