On 11/13/2017 10:33 PM, Ilya Matveychikov wrote:
On Nov 13, 2017, at 9:15 PM, Adrien Mazarguil <adrien.mazarg...@6wind.com> 
wrote:

On Mon, Nov 13, 2017 at 02:56:23PM +0400, Ilya Matveychikov wrote:
On Nov 13, 2017, at 2:39 PM, Adrien Mazarguil <adrien.mazarg...@6wind.com> 
wrote:

On Sat, Nov 11, 2017 at 09:18:45PM +0400, Ilya Matveychikov wrote:
Folks,

Are you serious about this:

typedef uint16_t (*eth_rx_burst_t)(void *rxq,
                                   struct rte_mbuf **rx_pkts,
                                   uint16_t nb_pkts);
typedef uint16_t (*eth_tx_burst_t)(void *txq,
                                   struct rte_mbuf **tx_pkts,
                                   uint16_t nb_pkts);

I’m not surprised that every PMD stores port_id in each and every queue, as
having just the queue as an argument doesn’t allow retrieving the device. So
the question is: why not use something like:

typedef uint16_t (*eth_rx_burst_t)(void *dev, uint16_t queue_id,
                                   struct rte_mbuf **rx_pkts,
                                   uint16_t nb_pkts);
typedef uint16_t (*eth_tx_burst_t)(void *dev, uint16_t queue_id,
                                   struct rte_mbuf **tx_pkts,
                                   uint16_t nb_pkts);
I assume it's because the rte_eth_[rt]x_burst() wrappers already pay the price
for that indirection, so doing it twice would be redundant.
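For reference, the current wrapper roughly looks like this (simplified; the
debug checks and RX callback handling are omitted):

static inline uint16_t
rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
                 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
{
        struct rte_eth_dev *dev = &rte_eth_devices[port_id];

        /* The wrapper already resolves port_id to the device and the
         * queue index to the opaque queue pointer before calling into
         * the PMD. */
        return (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
                                    rx_pkts, nb_pkts);
}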
Agreed, no need to do it twice. We can pass the dev pointer as well as the
queue, not just the queue’s index.

Basically the cost of storing a back-pointer to dev or a queue index in each
Rx/Tx queue structure is minor compared to saving a couple of CPU cycles
wherever we can.
Not sure about that. More data to store means more cache space occupied. Note
that every queue carries at least 4 bytes more than it actually needs, and
RTE_MAX_QUEUES_PER_PORT defaults to 1024. So we may have 4 KB extra for each
port....
Note that queues are only allocated if requested by the application; there's
really not much overhead involved.
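For instance, an application that wants a single RX and a single TX queue per
port only ever sets those up, so only those queue structures get allocated
(sketch; port_id, port_conf and the mempool mp are assumed to be prepared
elsewhere, error handling omitted):

rte_eth_dev_configure(port_id, 1 /* nb_rxq */, 1 /* nb_txq */, &port_conf);
rte_eth_rx_queue_setup(port_id, 0, 512, rte_eth_dev_socket_id(port_id),
                       NULL, mp);
rte_eth_tx_queue_setup(port_id, 0, 512, rte_eth_dev_socket_id(port_id),
                       NULL);
rte_eth_dev_start(port_id);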
Yeah, mostly you are right here.

Also to echo Konstantin's reply and clarify mine, PMDs normally do not
access this structure from their data plane. This pointer, if needed, is
normally stored away from hot regions accessed during TX/RX, usually at the
end of TX/RX structures and only for the convenience of management
operations. It therefore has no measurable impact on the CPU cache.
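To illustrate, here is a hypothetical RX queue layout (not taken from any
particular PMD) showing where such a back-pointer typically ends up:

#include <rte_mbuf.h>
#include <rte_ethdev.h>

struct pmd_rx_queue {
        /* hot: read/written on every RX burst */
        void *rx_ring;                /* HW descriptor ring */
        struct rte_mbuf **sw_ring;    /* per-descriptor mbuf pointers */
        uint16_t rx_tail;
        uint16_t nb_rx_desc;
        struct rte_mempool *mb_pool;

        /* cold: only touched by control-path/management operations */
        struct rte_eth_dev *dev;      /* back-pointer, not on fast path */
        uint16_t queue_id;
        uint16_t port_id;
};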

I did some research on how drivers implement rx/tx queues and now I want to
share the information and some thoughts about it (see the details at the end):

1) All drivers have tx/rx queues defined as structures.
2) The current design implies that it’s enough to pass the opaque rx/tx queue
   to the driver and, frankly speaking, it is. But..
3) Most drivers want not only the queue pointer but at least the queue_id and
   port_id, and most of them also want a pointer to the internal device data.

The way each driver solves issue (3) is data duplication. In other words,
every queue ends up carrying this information (queue_id, port_id and a
dev_priv pointer) inside.

My question was, and still is, about this design. I’m not sure it’s the best
way to do it, keeping in mind that queue_id can be calculated as a pointer
difference and port_id can be stored just once per device. But that would
require changing the internal interface, sure.
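Roughly what I have in mind (hypothetical structures, just to illustrate the
idea; PMD_MAX_RXQ is made up and struct pmd_rx_queue is the hypothetical queue
structure from the sketch above):

#include <stdint.h>

#define PMD_MAX_RXQ 16   /* arbitrary, for the example only */

/* Per-device private data: port_id lives here, once per device, and the
 * queues sit in a contiguous array. */
struct pmd_private {
        uint16_t port_id;
        struct pmd_rx_queue rxq[PMD_MAX_RXQ];
};

/* With the device available, the queue index is just pointer arithmetic
 * and port_id comes from the device, so neither has to be duplicated in
 * every queue structure. */
static inline uint16_t
pmd_rxq_index(const struct pmd_private *priv, const struct pmd_rx_queue *rxq)
{
        return (uint16_t)(rxq - priv->rxq);
}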

And as I promised, here is the result of the research on rx/tx queues:

<...>

drivers/net/sfc:
   struct sfc_ef10_rxq { dp { queue_id, port_id } }
   struct sfc_ef10_txq { dp { queue_id, port_id } }

These are not used on the data/fast path (as Adrien stated above), so they do not affect cache usage efficiency, etc.
