CC freebsd-net@ for wider discussion.

Hi Adrian,

Many thanks for the explanation. I checked if_igb.c and found that the flowid
field is set on the RX side in igb_rxeof():

igb_rxeof()
{
 ...
#ifdef  RSS
                        /* XXX set flowtype once this works right */
                        rxr->fmp->m_pkthdr.flowid =
                            le32toh(cur->wb.lower.hi_dword.rss);
                        rxr->fmp->m_flags |= M_FLOWID;
 ...
}

I have two questions regarding this. 

1. Is the RSS hash value stored in cur->wb.lower.hi_dword.rss set by the NIC 
hardware?
2. So the hash value and m_flags are stored in the mbuf for the received
packet on the rx side (igb_rxeof()). But we check the hash value and m_flags in
the mbuf for the packet being sent on the tx side (in igb_mq_start()). Does the
kernel re-use the same mbuf for tx? If so, how does it know that, for the same
network stream, it should send with the mbuf it got from rx?
If not, how does the kernel preserve the same hash value across the rx mbuf and
tx mbuf for the same network stream? This seems quite magical to me.

For the Hyper-V case, the host controls which vCPU it wants to interrupt, and
the rule can change dynamically based on load. For a non-busy VM, the host will
send most packets to the same vCPU to save power. For a busy VM, the host
will distribute the packets evenly across all vCPUs. This means the host can
change the RSS bucket mapping dynamically. Hyper-V does this by sending a
mapping table to the VM whenever the mapping needs updating. This also means we
cannot use FreeBSD's own bucket mapping, which I believe is fixed. Hyper-V also
uses its own hash key. So do you think it is possible to still use the existing
RSS infrastructure built into FreeBSD for this purpose?
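
A rough sketch of the kind of host-updatable mapping table described above may
help frame the question. Everything here is hypothetical: the sk_ names, the
16-entry table size, and the update call are illustrations only, not the netvsc
or Hyper-V interface.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_TBL_SIZE 16u            /* assumed size; must be a power of two */

/* Hypothetical indirection table: hash slot -> queue (vCPU) index. */
struct sk_indir {
    uint32_t entry[SK_TBL_SIZE];
};

/*
 * Install a new table sent by the host. Returns 0 on success, or -1
 * (leaving the old table in place) if any entry names a missing queue.
 */
static int
sk_indir_update(struct sk_indir *t, const uint32_t *new_tbl,
    uint32_t num_queues)
{
    assert(num_queues > 0);
    for (uint32_t i = 0; i < SK_TBL_SIZE; i++)
        if (new_tbl[i] >= num_queues)
            return (-1);
    memcpy(t->entry, new_tbl, sizeof(t->entry));
    return (0);
}

/* Map a packet hash to a queue through the current table. */
static uint32_t
sk_indir_lookup(const struct sk_indir *t, uint32_t hash)
{
    return (t->entry[hash & (SK_TBL_SIZE - 1)]);
}
```

With an all-zero table every hash lands on queue 0 (the non-busy case); a table
filled with i % num_queues spreads hashes across all queues (the busy case).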

Thanks so much,
Wei



-----Original Message-----
From: adrian.ch...@gmail.com [mailto:adrian.ch...@gmail.com] On Behalf Of 
Adrian Chadd
Sent: Saturday, August 9, 2014 3:39 AM
To: Wei Hu
Cc: d...@delphij.net
Subject: Re: vRSS support on FreeBSD

Hi!

On 8 August 2014 04:43, Wei Hu <w...@microsoft.com> wrote:
> Hi Adrian,
>
> My name is Wei Hu. I work for Microsoft OSTC (Open Source Technology Center) 
> in Shanghai, China.
>
> Microsoft is investing in FreeBSD running on its Hyper-V virtualization 
> environment. As a result, I am trying to bring the performance of FreeBSD 
> on Hyper-V on par with other guest operating systems (such as Linux and 
> Windows). One of the key network features I am trying to add is vRSS (Virtual 
> Receive Side Scaling) in our existing netvsc driver on FreeBSD.

Cool!

> Currently we already have a NIC driver called netvsc (under 
> sys/dev/hyperv/netvsc) which drives a synthetic virtual NIC device provided 
> by Hyper-V. The driver only supports one H/W network queue as of FreeBSD 10. 
> I am responsible for adding both multiqueue and RSS support to this driver. 
> Xin Li told me that you have done work on RSS. I wonder if you can help me 
> with some questions I currently have about these two features.

I can try!

> 1. Tx multiqueue support. I am looking at an existing driver, if_igb.c, for 
> some clues. It looks to me like I just need to set a proper multiqueue-aware 
> function in ifp->if_transmit. Inside this function I can select which tx 
> queue to send the packet on. Is this all I need to do for the tx queue? How 
> do I make sure this procedure also happens on the CPU that this tx queue is 
> bound to? I want to distribute the send workload across all available CPUs. 
> Does the kernel automatically select a proper CPU to call ifp->if_transmit? 
> Or do I have to do the distribution in the driver itself, inside the 
> ifp->if_transmit routine?

So the network stack doesn't enforce affinity - it's just doing parallelism. 
The RSS stuff I'm working with is trying to enforce affinity.

The if_transmit() method for a given mbuf can be called on any CPU, and it can 
be called from multiple CPUs at the same time.

Each driver has a different way of mapping an mbuf to a given destination TX 
queue. The naive way is just to queue the mbuf into the TX queue of the current 
CPU. The "nicer" way is to hash the m_pkthdr.flowid value to choose a TX queue. 
The RSS way is to see if m_pkthdr.flowid is an RSS hash and if so, choose the 
destination TX queue based on the RSS bucket for the given hash.
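
The three strategies above can be sketched as a small user-space model. This is
not driver code: struct sk_mbuf, SK_M_FLOWID, sk_hash2bucket() and the
128-bucket layout are hypothetical stand-ins for the mbuf fields, M_FLOWID and
rss_hash2bucket(), chosen only so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

#define SK_M_FLOWID    0x1u        /* models M_FLOWID in m_flags */
#define SK_RSS_BUCKETS 128u        /* assumed bucket count, power of two */

/* Hypothetical stand-in for the mbuf fields discussed above. */
struct sk_mbuf {
    uint32_t flags;                /* models m->m_flags */
    uint32_t flowid;               /* models m->m_pkthdr.flowid */
    int      is_rss_hash;          /* nonzero if flowid is an RSS hash */
};

/* Models rss_hash2bucket(): returns 0 and fills *bucket on success. */
static int
sk_hash2bucket(const struct sk_mbuf *m, uint32_t *bucket)
{
    if (!m->is_rss_hash)
        return (-1);
    *bucket = m->flowid & (SK_RSS_BUCKETS - 1);
    return (0);
}

/* Pick a TX queue index, trying the most specific strategy first. */
static uint32_t
sk_select_txq(const struct sk_mbuf *m, uint32_t cur_cpu, uint32_t num_queues)
{
    uint32_t bucket;

    assert(num_queues > 0);
    if ((m->flags & SK_M_FLOWID) == 0)
        return (cur_cpu % num_queues);   /* naive: current CPU's queue */
    if (sk_hash2bucket(m, &bucket) == 0)
        return (bucket % num_queues);    /* RSS: the bucket decides */
    return (m->flowid % num_queues);     /* "nicer": hash the flowid */
}
```

The point of the ordering is that packets of one flow always take the same
branch with the same inputs, so they stay on one queue and remain in order.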

It'll be up to the higher layers (ie, network stack and userland) to distribute 
transmit work to all CPUs.

So yes, you have to do the distribution inside the driver yourself for now. You 
saw what I did for ixgbe/igb for RSS transmit.

> 2. Rx multiqueue support. The received packet could end up on different rx 
> queues. Each rx queue is bound to a CPU. So depending on which queue the 
> packet arrives on, it will be processed on a different CPU. My question is: 
> do I need to set anything in the received packet to inform the upper layer 
> which queue the packet was received on? If RSS is enabled, do I need to 
> inform the upper layer which fields were hashed (IP, TCP, etc.) to select 
> the queue, so the upper layer knows which queue the response should be sent 
> to?

This is where flowid comes into it.

Initially, flowid was just an opaque token provided by the NIC driver to 
represent which queue the given packet came in on. It was then propagated 
throughout the network stack so transmit would also occur using the same 
flowid. This was done purely to keep packets in a flow in-order on the same 
queue rather than having them go out to different CPUs based on which CPU the 
scheduler decided to run things on.

The RSS work mutates that a little so the flowid _can_ be one of the RSS 
hashtypes. If it is, then the driver should tag the mbuf with the RSS hash 
value in flowid and set the mbuf hash type to the relevant RSS hash type.

So what you need to do is:

* create one rx queue per RSS bucket;
* get the RSS bucket -> CPU mapping from in_rss.c;
* CPU pin things appropriately;
* make sure the traffic for a given RSS hash -> RSS bucket ends up in the 
right RX queue;
* make sure the mbuf has the flowid set, the hash type set, and the M_FLOWID 
flag set.
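
As a rough illustration of the checklist above, here is a user-space model of
the per-bucket queue setup and the RX tagging step. All sk_ names and the
hash-type codes are hypothetical, and the bucket -> CPU policy (i % ncpus) is
only a placeholder; a real driver would take that mapping from in_rss.c.

```c
#include <assert.h>
#include <stdint.h>

#define SK_M_FLOWID      0x1u      /* models M_FLOWID */
#define SK_HASHTYPE_NONE 0         /* hypothetical hash-type codes */
#define SK_HASHTYPE_TCP4 1

/* Hypothetical stand-in for the mbuf fields a driver fills in on RX. */
struct sk_mbuf {
    uint32_t flags;                /* models m->m_flags */
    uint32_t flowid;               /* models m->m_pkthdr.flowid */
    int      hashtype;             /* models the mbuf hash type */
};

struct sk_rxq {
    uint32_t bucket;               /* RSS bucket this queue serves */
    uint32_t cpu;                  /* CPU the queue's handler is pinned to */
};

/* One RX queue per bucket; placeholder policy pins bucket i to i % ncpus. */
static void
sk_rxq_setup(struct sk_rxq *qs, uint32_t nbuckets, uint32_t ncpus)
{
    assert(ncpus > 0);
    for (uint32_t i = 0; i < nbuckets; i++) {
        qs[i].bucket = i;
        qs[i].cpu = i % ncpus;
    }
}

/* Tag a received packet: flowid, hash type, and the flowid-valid flag. */
static void
sk_rx_tag(struct sk_mbuf *m, uint32_t hw_hash, int hashtype)
{
    m->flowid = hw_hash;
    m->hashtype = hashtype;
    m->flags |= SK_M_FLOWID;
}
```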

> 3. RSS hash. I found that the file sys/netinet/in_rss.c already contains RSS 
> support in FreeBSD head, but I don't know how to use it. I checked the same 
> driver, if_igb.c. It has the following code:

> static int
> igb_mq_start(struct ifnet *ifp, struct mbuf *m) {
>   ...
> #ifdef  RSS
>         uint32_t                bucket_id;
> #endif
>
>         /* Which queue to use */
>         /*
>          * When doing RSS, map it to the same outbound queue
>          * as the incoming flow would be mapped to.
>          *
>          * If everything is setup correctly, it should be the
>          * same bucket that the current CPU we're on is.
>          */
>         if ((m->m_flags & M_FLOWID) != 0) {
> #ifdef  RSS
>                 if (rss_hash2bucket(m->m_pkthdr.flowid,
>                     M_HASHTYPE_GET(m), &bucket_id) == 0) {
>                         /* XXX TODO: spit out something if bucket_id > num_queues? */
>                         i = bucket_id % adapter->num_queues;
>                 } else {
> #endif
>                         i = m->m_pkthdr.flowid % adapter->num_queues;
> ...
> }
>
> What exactly is the m_pkthdr.flowid field? Does it already contain the IP and 
> TCP hash of the packet being sent? When is it set, and when does 
> M_HASHTYPE_GET(m) return the proper hash type?

This is for transmit. This is done to ensure that when transmitting, traffic 
going to a given RSS bucket ends up in the right destination RSS bucket / TX 
queue. If it's all set up correctly and userland knows how to CPU pin things, 
it'll be scheduled to the same CPU the userland thread(s) are running on. If 
it's not (e.g. a non-RSS-aware program) then it'll schedule the transmit to go 
out on the correct, non-CPU-local destination TX queue so packets are kept 
in-order.

I hope that helps!

If you'd like to ask more questions please CC one of the mailing lists like 
freebsd-net@. I'd like others to see the answers and participate in the 
discussion. :)



-a
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
