[dpdk-dev] Redirection Table

2014-01-07 Thread Ivan Boule
On 01/06/2014 05:52 PM, Michael Quicquaro wrote:
> Thanks for the details.  Can the hash function be modified so that I 
> can provide my own RSS function?  i.e.  my ultimate goal is to provide 
> RSS that is not dependent on packet contents.
No, the RSS function is "hard-wired" and only works on IPv4/IPv6 
packets. All other packets are stored in the same queue (0 by default).
You can change the RSS key used by the RSS function to compute the hash 
value.
See the following testpmd command:

port config X rss-hash-key <80 hexa digits>

to set the 320-bit RSS key of port X.
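
As an illustration, here is a minimal C sketch of setting a custom 320-bit
(40-byte) key at port configuration time. The field and macro names
(rte_eth_rss_conf, ETH_MQ_RX_RSS, ETH_RSS_IP) follow later DPDK releases and
may differ slightly in the release discussed in this thread, so treat it as
an approximation rather than the exact API:

    #include <string.h>
    #include <rte_ethdev.h>

    /* 320-bit key, i.e. the 80 hexadecimal digits of the testpmd command;
     * only the first bytes are shown here, the rest default to zero. */
    static uint8_t my_rss_key[40] = { 0x6d, 0x5a, 0x6d, 0x5a };

    static int
    configure_port_rss(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq)
    {
        struct rte_eth_conf conf;

        memset(&conf, 0, sizeof(conf));
        conf.rxmode.mq_mode = ETH_MQ_RX_RSS;            /* enable RSS */
        conf.rx_adv_conf.rss_conf.rss_key = my_rss_key;
        conf.rx_adv_conf.rss_conf.rss_key_len = sizeof(my_rss_key);
        conf.rx_adv_conf.rss_conf.rss_hf = ETH_RSS_IP;  /* IPv4/IPv6 hash */

        return rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &conf);
    }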

Best regards,
Ivan

> You may have seen my thread "generic load balancing".  At this point, 
> I'm realizing that the only way to accomplish this is to let the 
> packets land where they may (the queue where the NIC places the 
> packet) and distribute them (to other queues) by having some of the 
> CPU processing devoted to this task.  Can you verify this?
>
> Regards,
> - Michael.
>
>
> On Mon, Jan 6, 2014 at 10:21 AM, Ivan Boule <ivan.boule at 6wind.com> wrote:
>
> On 12/31/2013 08:45 PM, Michael Quicquaro wrote:
>
> Has anyone used the "port config all reta (hash,queue)"
> command of testpmd
> with any success?
>
> I haven't found much documentation on it.
>
> Can someone provide an example on why and how it was used.
>
> Regards and Happy New Year,
> Michael Quicquaro
>
> Hi Michael,
>
> "RETA" stands for Redirection Table.
> It is a per-port configurable table of 128 entries that is used by the
> RSS filtering feature of Intel 1GbE and 10GbE controllers to
> select the
> RX queue into which to store a received IP packet.
> When receiving an IPv4/IPv6 packet, the controller computes a 32-bit
> hash on:
>
>   * the source address and the destination address of the IP header of
> the packet,
>   * the source port and the destination port of the UDP/TCP
> header, if any.
>
> Then, the controller uses the 7 lower bits of the RSS hash as an index
> into the RETA to get the number of the RX queue in which to store the
> packet.
>
> The API of the DPDK includes a function that is exported by Poll Mode
> Drivers to configure RETA entries of a given port.
>
> For test purposes, the testpmd application includes the following
> command
>
> "port config X rss reta (hash,queue)[,(hash,queue)]"
>
> to configure the RETA entries of a port X, where each pair (hash,queue)
> contains the index of a RETA entry (between 0 and 127 inclusive) and the
> RX queue number (between 0 and 15) to store into that RETA entry.
>
> Best regards
> Ivan
>
> -- 
> Ivan Boule
> 6WIND Development Engineer
>
>
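
To make the RETA mechanism above more concrete, here is a rough C sketch
that spreads the 128 RETA entries of a port over 4 RX queues through the PMD
API. The function and structure names (rte_eth_dev_rss_reta_update(),
struct rte_eth_rss_reta_entry64, RTE_RETA_GROUP_SIZE) are those of later
DPDK releases; the function exported by the drivers at the time of this
thread had a slightly different prototype, so take this as an approximation:

    #include <string.h>
    #include <rte_ethdev.h>

    #define MY_RETA_SIZE 128    /* RETA size of the 1GbE/10GbE controllers */
    #define MY_NB_RXQ      4

    static int
    spread_reta(uint16_t port_id)
    {
        /* 128 entries are described as 2 groups of 64 (mask + values) */
        struct rte_eth_rss_reta_entry64 conf[MY_RETA_SIZE / RTE_RETA_GROUP_SIZE];
        unsigned int i;

        memset(conf, 0, sizeof(conf));
        for (i = 0; i < MY_RETA_SIZE; i++) {
            conf[i / RTE_RETA_GROUP_SIZE].mask |=
                1ULL << (i % RTE_RETA_GROUP_SIZE);
            /* RETA entry i -> RX queue (i % MY_NB_RXQ) */
            conf[i / RTE_RETA_GROUP_SIZE].reta[i % RTE_RETA_GROUP_SIZE] =
                i % MY_NB_RXQ;
        }
        /* the NIC then stores each IP packet in queue reta[rss_hash & 0x7F] */
        return rte_eth_dev_rss_reta_update(port_id, conf, MY_RETA_SIZE);
    }

In testpmd terms, this is roughly what "port config X rss reta
(0,0),(1,1),(2,2),(3,3),..." would do entry by entry.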


-- 
Ivan Boule
6WIND Development Engineer



[dpdk-dev] Redirection Table

2014-01-07 Thread Stefan Baranoff
All,

Does this mean that an application looking at traffic in something like an
IP/IP or GRE tunnel, with only two endpoints on the tunnel but many clients
behind them, must do software load balancing, since the packets would be
IP-only (not TCP/UDP) with the same two addresses?

How much of a penalty is there for crossing processor boundaries in that
case, and might a single-CPU server, while less core-dense, actually give
better performance per watt?

Thanks,
Stefan

Sent from my smart phone; people don't make typos, Swype does!


[dpdk-dev] rte_mempools / rte_rings thread safe?

2014-01-07 Thread Olivier MATZ
Hi Jyotiswarup,

On 12/28/2013 03:55 PM, Jyotiswarup Raiturkar wrote:
 > The rte_mempool and rte_ring libs have
 > multi-producer/multi-consumer versions. But it's also mentioned
 > in the header files that the implementation is not
 > preemptable: "Note: the mempool implementation is not
 > preemptable. A lcore must not be interrupted by another task
 > that uses the same mempool (because it uses a ring which is not
 > preemptable)"

Correct.

 > - Does having mutually exclusive core masks for a set of
 > threads which use the ring/mempool suffice for thread safety
 > (threads will have different core ids but they will not be
 > pinned to cores)?

As you noticed, and as explained in [1], one problem with the mempool
is related to the cache structure, which is a table indexed by the
lcore_id. This cache structure is accessed without any lock because each
pthread is assumed to have its own lcore_id. So having different
lcore_ids would solve this particular problem even if the pthreads are
not pinned to different cores.
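
Conceptually, the lockless fast path looks something like the sketch
below; the structure and field names here are purely illustrative, not
the actual rte_mempool layout:

    #include <rte_lcore.h>

    /* illustrative per-lcore cache: one slot per possible lcore_id */
    struct toy_cache { void *objs[512]; unsigned int len; };
    struct toy_pool  { struct toy_cache local_cache[RTE_MAX_LCORE]; };

    static inline struct toy_cache *
    toy_pool_cache(struct toy_pool *p)
    {
        /* no lock needed: each pthread with a distinct lcore_id
         * touches a distinct slot of the table */
        return &p->local_cache[rte_lcore_id()];
    }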

But this is not the only problem. One design paradigm of the DPDK
is that the execution units are never interrupted. Let's imagine the
following case, similar to what you describe (a minimal sketch in code
follows the list below):

- 2 pthreads are running on the same CPU (e.g. taskset with only one
   bit set in the mask). Their respective lcore_ids (internal variable)
   are 0 and 1.
- pthread 0 takes a spinlock
- the kernel preempts pthread 0 and schedules pthread 1
- pthread 1 wants to take the same spinlock, but it cannot be
   released by pthread 0, which is not running
- pthread 1 will loop, using 100% of the CPU for nothing
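
Here is a minimal, self-contained sketch of that scenario with plain
pthreads (no DPDK involved), just to show the effect of two threads
sharing one CPU and a spinlock:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static pthread_spinlock_t lock;

    static void pin_to_cpu0(void)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(0, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    static void *worker(void *arg)
    {
        pin_to_cpu0();                  /* both threads share CPU 0 */
        for (int i = 0; i < 100000; i++) {
            pthread_spin_lock(&lock);
            /* critical section: if the kernel preempts us here, the
             * other thread spins on the same CPU for its whole
             * timeslice without making any progress */
            pthread_spin_unlock(&lock);
        }
        printf("thread %ld done\n", (long)arg);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[2];

        pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
        for (long i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);
        return 0;
    }

With both threads pinned to CPU 0, the time spent spinning is pure waste;
with real-time scheduling priorities it can even turn into a permanent
livelock.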

The same can occur in rte_ring. If a pthread is interrupted
during an enqueue, between lines 392 and 414 (see [2]), it would
prevent any other pthread from enqueuing another object (a simplified
sketch of that window is shown below).
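
For reference, here is a heavily simplified toy version of that enqueue
path (C11 atomics, not the real rte_ring code) that shows where the
window is:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define RING_SIZE 1024                 /* power of two */

    struct toy_ring {
        _Atomic uint32_t prod_head;
        _Atomic uint32_t prod_tail;
        _Atomic uint32_t cons_tail;
        void *objs[RING_SIZE];
    };

    static bool toy_mp_enqueue(struct toy_ring *r, void *obj)
    {
        uint32_t head, next;

        /* 1. reserve a slot by advancing prod_head with a CAS */
        do {
            head = atomic_load(&r->prod_head);
            if (head - atomic_load(&r->cons_tail) >= RING_SIZE)
                return false;              /* ring full */
            next = head + 1;
        } while (!atomic_compare_exchange_weak(&r->prod_head, &head, next));

        /* 2. write the object into the reserved slot */
        r->objs[head & (RING_SIZE - 1)] = obj;

        /* 3. publish in reservation order: wait until earlier producers
         * have updated prod_tail.  A producer preempted between step 1
         * and the store below makes every later producer spin here --
         * exactly the window mentioned above. */
        while (atomic_load(&r->prod_tail) != head)
            ;                              /* rte_pause() in the real code */
        atomic_store(&r->prod_tail, next);
        return true;
    }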

So, as you can see, doing this could lead to significant performance
issues.

[1] http://www.dpdk.org/ml/archives/dev/2013-December/001002.html

[2] 
http://dpdk.org/browse/dpdk/tree/lib/librte_ring/rte_ring.h?id=142dfe1eedb215bd2a0762afcc65ef5a7fba10aa

 > - If I want to use these data structures in a pthread (created
 > outside of the DPDK environment), is it OK to do so if I
 > do "RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);" with a core_id
 > exclusive of all other core masks for DPDK processes?

The short answer is no.

However, it is possible to use an rte_ring between an external process
and a DPDK application under the following conditions (a sketch follows
the conditions below):

(on the external app, the pthreads must be pinned to a CPU
   OR
  on the external app the ring must be single consumer/producer)
   AND
(the external app must do enqueues only and the DPDK app dequeues only
   OR
  the external app must do dequeues only and the DPDK app enqueues only)
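
For instance, with a single-producer/single-consumer ring created by the
DPDK application and used by an external process running as an EAL
secondary process, the split could look like the sketch below (the ring
name "ext2dpdk" and the function names are only illustrative):

    #include <rte_lcore.h>
    #include <rte_ring.h>

    /* DPDK (primary) application: create the ring once, then only dequeue */
    static struct rte_ring *
    dpdk_side_create(void)
    {
        return rte_ring_create("ext2dpdk", 1024, rte_socket_id(),
                               RING_F_SP_ENQ | RING_F_SC_DEQ);
    }

    static void *
    dpdk_side_poll(struct rte_ring *r)
    {
        void *obj = NULL;

        if (rte_ring_sc_dequeue(r, &obj) < 0)
            return NULL;                   /* ring empty */
        return obj;
    }

    /* External process, after rte_eal_init() in secondary mode:
     * look the ring up by name and only enqueue into it */
    static int
    external_side_send(void *obj)
    {
        struct rte_ring *r = rte_ring_lookup("ext2dpdk");

        if (r == NULL)
            return -1;                     /* ring not created yet */
        return rte_ring_sp_enqueue(r, obj); /* 0 if the object was enqueued */
    }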

Regards,
Olivier