[dpdk-dev] Unable to get RSS to work in testpmd and load balancing question

2014-01-10 Thread Thomas Monjalon
Hello,

09/01/2014 10:49, Daniel Kan :
> The problem appears to be that rxmode.mq_mode was never set to ETH_MQ_RX_RSS
> in testpmd.c; it's initialized to 0.

You're right. It has been broken since the commit "ETH_MQ_RX_NONE should disable
RSS":
http://dpdk.org/browse/dpdk/commit/?id=243db2ddee3094a2cb39fdd4b17e26df4e7735e1

> There should probably be a configuration for that, or should be set when
> rxq > 1.

RSS can be configured or disabled with testpmd options or commands.
So it must be fixed in 2 places:
- in app/test-pmd/parameters.c for options
- in app/test-pmd/cmdline.c for commands
When setting rss_hf, mq_mode must be set accordingly.

Note that the DCB feature can also use mq_mode.
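
The rule above (mq_mode follows rss_hf, unless DCB is using mq_mode) can be sketched in standalone C. This is only an illustration, not the testpmd patch: the enum, the struct and the helper set_rss_hf() are hypothetical stand-ins for the DPDK definitions, and leaving a DCB setting untouched is an assumption about what the fix should do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local stand-ins for the DPDK multi-queue RX modes. */
enum mq_rx_mode { MQ_RX_NONE = 0, MQ_RX_RSS, MQ_RX_DCB };

/* Hypothetical model of the two fields discussed in this thread. */
struct port_rx_conf {
    enum mq_rx_mode mq_mode; /* models rxmode.mq_mode */
    uint16_t rss_hf;         /* models the RSS hash function bitmask */
};

/* Store the new RSS hash functions and keep mq_mode consistent with them. */
static void set_rss_hf(struct port_rx_conf *conf, uint16_t rss_hf)
{
    assert(conf != NULL);
    conf->rss_hf = rss_hf;
    if (conf->mq_mode == MQ_RX_DCB)
        return; /* assumption: DCB owns mq_mode, so leave it untouched */
    conf->mq_mode = (rss_hf != 0) ? MQ_RX_RSS : MQ_RX_NONE;
}
```

Both the option parser and the command handler would call the same helper, so the two places cannot drift apart.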

Thanks for the report. Patch is welcome :)
-- 
Thomas


[dpdk-dev] Unable to get RSS to work in testpmd and load balancing question

2014-01-10 Thread Choi, Sy Jong

Hi Dan,

I have tested with 6 flows that have identical IP addresses but different UDP port
numbers. I can see traffic on both queues.
I used the following command:
sudo ./app/testpmd -c 0x1f -n 4 -- -i --rss-udp --portmask=0x03 --nb-cores=4 
--rxq=2 --txq=2


I started with RSS IPv4, which is enabled by default.
The critical part is the traffic. Since I have only 2 queues, I am sending 6 flows
with different IP addresses in order to see the flows distributed evenly.
Otherwise, with only 2 flows, both might hash to a single queue and you would
see traffic on only 1 queue.

My Command:-
sudo ./app/testpmd -c 0x1f -n 4 -- -i --portmask=0x03 --nb-cores=4 --rxq=2 
--txq=2
-   Using 4 cores
-   Rxq = 2 for each port, so 4 queues to 4 cores.



testpmd> show port stats all

  NIC statistics for port 0
  RX-packets: 6306519648    RX-errors: 757945685    RX-bytes: 309383840254
  TX-packets: 132592678     TX-errors: 0            TX-bytes: 8485925376

  Stats reg  0 RX-packets: 2556150208    RX-errors: 0    RX-bytes: 116477417471
  Stats reg  1 RX-packets: 3750369440    RX-errors: 0    RX-bytes: 192906422783
  Stats reg  2 RX-packets: 0             RX-errors: 0    RX-bytes: 0
  .
  .
  .
  Stats reg 15 RX-packets: 0             RX-errors: 0    RX-bytes: 0

  NIC statistics for port 1
  RX-packets: 132594048     RX-errors: 13825889     RX-bytes: 8486020288
  TX-packets: 6306522739    TX-errors: 0            TX-bytes: 231983528894

  Stats reg  0 RX-packets: 83615783      RX-errors: 0    RX-bytes: 5351410624
  Stats reg  1 RX-packets: 48978265      RX-errors: 0    RX-bytes: 3134609664
  Stats reg  2 RX-packets: 0             RX-errors: 0    RX-bytes: 0
  .
  .
  .
  Stats reg 15 RX-packets: 0             RX-errors: 0    RX-bytes: 0

testpmd>




My Command:-
sudo ./app/testpmd -c 0x1f -n 4 -- -i --portmask=0x03 --nb-cores=4 --rxq=2 
--txq=2
- Using 4 cores
- Rxq = 2 for each port, so 4 queues to 4 cores.

I use these commands to map the queue statistics.
testpmd> set stat_qmap rx 0 0 0
testpmd> set stat_qmap rx 0 1 1
testpmd> set stat_qmap rx 1 0 0
testpmd> set stat_qmap rx 1 1 1
testpmd> start
  io packet forwarding - CRC stripping disabled - packets/burst=16
  nb forwarding cores=2 - nb forwarding ports=2
  RX queues=2 - RX desc=128 - RX free threshold=0
  RX threshold registers: pthresh=8 hthresh=8 wthresh=4
  TX queues=2 - TX desc=512 - TX free threshold=0
  TX threshold registers: pthresh=36 hthresh=0 wthresh=0
  TX RS bit threshold=0 - TXQ flags=0x0

testpmd> show port stats all



Regards,
Choi, Sy Jong
Platform Application Engineer

From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Dan Kan
Sent: Wednesday, January 08, 2014 3:25 PM
To: dev at dpdk.org
Subject: [dpdk-dev] Unable to get RSS to work in testpmd and load balancing 
question

I'm evaluating DPDK using dpdk-1.5.1r1. I have been playing around with the 
test-pmd sample app. I'm having a hard time to get RSS to work. I have a 2-port 
82599 Intel X540-DA2 NIC. I'm running the following command to start the app.

sudo ./testpmd -c 0x1f -n 2 -- -i --portmask=0x3 --nb-cores=4 --rxq=4
--txq=4

I have a packet generator that sends UDP packets with various source IPs.
According to testpmd, I'm only receiving packets in port 0's queue 0. Packets are 
not going into any other queues. I have attached the output from testpmd.


  --- Forward Stats for RX Port= 0/Queue= 0 -> TX Port= 1/Queue= 0 ---
  RX-packets: 100    TX-packets: 100    TX-dropped: 0

  -- Forward statistics for port 0 --
  RX-packets: 100    RX-dropped: 0    RX-total: 100
  TX-packets: 0      TX-dropped: 0    TX-total: 0

  -- Forward statistics for port 1 --
  RX-packets: 0      RX-dropped: 0    RX-total: 0
  TX-packets: 100    TX-dropped: 0    TX-total: 100

  +++ Accumulated forward statistics for all ports +++
  RX-packets: 100    RX-dropped: 0    RX-total: 100
  TX-packets: 100    TX-dropped: 0    TX-total: 100



On a separate note, I also find that the CPU utilization using 1 forwarding 
core for 2 ports seems to be better (in the aggregate sense) than using 2 
forwarding cores for 2 ports. Running at 10gbps li

[dpdk-dev] Useful: Simple C++ program & Makefile

2014-01-10 Thread Hamid Ramazani
> I don't exactly know what is needed for C++. Please keep us informed.

Hey Thomas,

I've attached a simple program (main.cpp main.h and Makefile) that has
a C++ class and just prints some messages in the output.
Although it's working fine, I'm sure the Makefile could
be written much better; maybe I will make it more complete in the future.

All the Best,
--Hamid

On 1/3/14, Thomas Monjalon  wrote:
> Hello,
>
> 03/01/2014 11:48, Hamid Ramazani :
>> eal_timer.c:(.text+0x42c): undefined reference to `clock_gettime'
>
> From "man clock_gettime":
> Link with -lrt (only for glibc versions before 2.17).
>
>>  g++ -m64 -pthread  -march=native -DRTE_MACHINE_CPUFLAG_SSE
>> -DRTE_MACHINE_CPUFLAG_SSE2 -DRTE_MACHINE_CPUFLAG_SSSE3
>> -DRTE_COMPILE_TIME_CPUFLAGS=RTE_CPUFLAG_SSE,RTE_CPUFLAG_SSE2,RTE_CPUFLAG_SS
>> SE3 -I/home/hamid/dpdk/dpdk-1.5.1r1/examples/sample/build/include
>> -I/home/hamid/dpdk/dpdk-1.5.1r1/x86_64-default-linuxapp-gcc/include
>> -include
>> /home/hamid/dpdk/dpdk-1.5.1r1/x86_64-default-linuxapp-gcc/include/rte_conf
>> ig.h -O3 -W -Wall -Werror -Wmissing-declarations -Wpointer-arith
>> -Wcast-align -Wcast-qual -Wformat-nonliteral -Wformat-security -Wundef
>> -Wwrite-strings -Wl,-melf_x86_64 -Wl,-export-dynamic sample.cpp -o
>> sample -Wl,-L/home/hamid/dpdk/dpdk-1.5.1r1/examples/sample/build/lib
>> -Wl,-L/home/hamid/dpdk/dpdk-1.5.1r1/x86_64-default-linuxapp-gcc/lib
>> -Wl,-L/home/hamid/dpdk/dpdk-1.5.1r1/x86_64-default-linuxapp-gcc/lib
>> -Wl,-lrte_kni -Wl,-lrte_pmd_e1000 -Wl,-lrte_pmd_ixgbe -Wl,-lrte_mbuf
>> -Wl,-lrte_cmdline -Wl,-lrte_timer -Wl,-lrte_hash -Wl,-lrte_lpm
>> -Wl,--start-group -Wl,-lethdev -Wl,-lrte_malloc -Wl,-lrte_mempool
>> -Wl,-lrte_ring -Wl,-lrte_eal -Wl,-ldl -Wl,--end-group
>
> Try CONFIG_RTE_BUILD_COMBINE_LIBS=y and -lintel_dpdk instead of all these
> libraries. You can also remove the warning options if you want.
>
> You can also try to build your Makefile by including files like
> mk/rte.extapp.mk and defining CC=g++.
> I don't exactly know what is needed for C++. Please keep us informed.
>
> --
> Thomas
>


[dpdk-dev] Redirection Table

2014-01-10 Thread Ivan Boule
On 01/07/2014 02:03 PM, Stefan Baranoff wrote:
>
> All,
>
> Does this mean that an application looking at traffic in something 
> like an IP/IP or GRE tunnel with only two endpoints on the tunnels but 
> many clients behind them must do software load balancing as the 
> packets would IP only (not TCP/UDP) with the same two addresses?
>
Yes, your understanding is correct when Intel's NICs are used. But some 
other NICs could allow deeper packet analysis.
When NICs only support IPv4/IPv6 RSS, it is up to the network stack 
on top of the DPDK to provide the best solution in software.

> How much of a penalty is there for crossing processor boundaries in 
> that case and might a 1 CPU server, while less core dense, actually 
> give better performance/watt?
>
This is hard to say. It is likely to depend in part on the 
CPU/memory/bus characteristics.
Note that if you consider dedicating a single core to the polling of 
packets received on all ports, then you only need a single RX queue per 
port.
By the way, in such a case, you can still configure RSS on the ports. The 
hardware then computes the 32-bit RSS hash on each IP packet, and the Poll 
Mode Driver stores it into the rte_mbuf structure. The RSS hash can then be 
used in software to speed up the assignment of IP packets to [a subset of] 
processing cores, if needed.
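
A minimal sketch of that software assignment, in standalone C. The helper pick_worker() is hypothetical, and the modulo mapping is just one simple choice, not something DPDK prescribes. The hash is passed in as a plain integer rather than read from an rte_mbuf, so nothing here depends on the mbuf layout.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick a worker core index from the 32-bit RSS hash that the NIC already
 * computed. Packets of one flow carry the same hash, so they always reach
 * the same worker and per-flow ordering is preserved.
 */
static unsigned pick_worker(uint32_t rss_hash, unsigned nb_workers)
{
    assert(nb_workers > 0);
    return rss_hash % nb_workers;
}
```

The polling core would call pick_worker() once per received packet and enqueue the packet on that worker's ring, without having to parse the headers again.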

Regards
Ivan

> Thanks,
> Stefan
>
> Sent from my smart phone; people don't make typos, Swype does!
>
> On Jan 7, 2014 3:36 AM, "Ivan Boule" wrote:
>
> On 01/06/2014 05:52 PM, Michael Quicquaro wrote:
>
> Thanks for the details.  Can the hash function be modified so
> that I can provide my own RSS function?  i.e.  my ultimate
> goal is to provide RSS that is not dependent on packet contents.
>
> No, the RSS function is "hard-wired" and only works on IPv4/IPv6
> packets. All other packets are stored in the same queue (0 by
> default).
> You can change the RSS key used by the RSS function to compute the
> hash value.
> See the following testpmd command:
>
>port config X rss-hash-key <80 hexa digits>
>
> to set the 320-bit RSS key of port X.
>
> Best regards,
> Ivan
>
> You may have seen my thread "generic load balancing".  At this
> point, I'm realizing that the only way to accomplish this is
> to let the packets land where they may (the queue where the
> NIC places the packet) and distribute them (to other queues)
> by having some of the CPU processing devoted to this task.
>  Can you verify this?
>
> Regards,
> - Michael.
>
>
> On Mon, Jan 6, 2014 at 10:21 AM, Ivan Boule <ivan.boule at 6wind.com> wrote:
>
> On 12/31/2013 08:45 PM, Michael Quicquaro wrote:
>
> Has anyone used the "port config all reta (hash,queue)"
> command of testpmd
> with any success?
>
> I haven't found much documentation on it.
>
> Can someone provide an example on why and how it was used.
>
> Regards and Happy New Year,
> Michael Quicquaro
>
> Hi Michael,
>
> "RETA" stands for Redirection Table.
> It is a per-port configurable table of 128 entries that is used by the
> RSS filtering feature of Intel 1GbE and 10GbE controllers to select the
> RX queue into which to store a received IP packet.
> When receiving an IPv4/IPv6 packet, the controller computes a 32-bit
> hash on:
>
>   * the source address and the destination address of the IP header of
>     the packet,
>   * the source port and the destination port of the UDP/TCP header, if any.
>
> Then, the controller takes the 7 lower bits of the RSS hash as an index
> into the RETA table to get the number of the RX queue in which to store
> the packet.
>
> The API of the DPDK includes a function that is exported by Poll Mode
> Drivers to configure RETA entries of a given port.
>
> For test purposes, the testpmd application includes the following command
>
> "port config X rss reta (hash,queue)[,(hash,queue)]"
>
> to configure RETA entries of a port X, where each (hash,queue) pair
> contains the index of a RETA entry (between 0 and 127 inclusive) and the
> RX queue number (between 0 and 15) to be stored into that RETA entry.
>
> Best regards
> Ivan
>
> -- Ivan Boule
> 6WIND Development Engineer
>
>
>
>
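
The RETA lookup described in the quoted message above can be sketched in plain C. This models the table in software only; it does not touch any NIC register or DPDK API. The struct and function names are hypothetical, and the round-robin fill is an assumed starting state, not necessarily what a given driver programs by default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RETA_SIZE  128 /* entries per port, as stated in the message */
#define MAX_QUEUES 16  /* RX queue numbers range from 0 to 15 */

/* Hypothetical software model of one port's redirection table. */
struct reta {
    uint8_t queue[RETA_SIZE];
};

/* Assumed initial state: spread the entries round-robin over nb_queues. */
static void reta_fill_round_robin(struct reta *t, unsigned nb_queues)
{
    assert(t != NULL && nb_queues > 0 && nb_queues <= MAX_QUEUES);
    for (unsigned i = 0; i < RETA_SIZE; i++)
        t->queue[i] = (uint8_t)(i % nb_queues);
}

/* Models one (hash,queue) pair of the testpmd command. */
static void reta_set(struct reta *t, unsigned index, uint8_t queue)
{
    assert(t != NULL && index < RETA_SIZE && queue < MAX_QUEUES);
    t->queue[index] = queue;
}

/* The 7 low-order bits of the RSS hash select the RETA entry. */
static uint8_t reta_lookup(const struct reta *t, uint32_t rss_hash)
{
    assert(t != NULL);
    return t->queue[rss_hash & (RETA_SIZE - 1)];
}
```

With two queues, a hash of 0x83 indexes entry 3 and lands in queue 1; after reta_set(&t, 3, 5) the same hash lands in queue 5.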

[dpdk-dev] How to Destroy any rte_ring or recreate the ring with same name.

2014-01-10 Thread ankit kumar
Hi all,

   I am trying to use the DPDK ring library in my application, and it works fine.
But according to the API documentation, there is no function to destroy a
created ring. I have to create a
ring multiple times with the same name.
   So is there any way to destroy a created ring and then recreate it
with the same name?

Thank you.


[dpdk-dev] Comments regarding Flow Director support in PMD IXGBE

2014-01-10 Thread Maxime Leroy
Hi Robert,

On Fri, Jan 3, 2014 at 8:52 PM, Robert Sanford  wrote:
> Issue #1:
> Our reading of the 82599 data sheet leads us to believe that
> Flow Director can simultaneously handle *both* IPv4 and IPv6 filters,
> with separate filter rules, of course.
>
> Thus, at the bottom of ixgbe_fdir.c:fdir_set_input_mask_82599( ),
> we could remove the "if (!input_mask->set_ipv6_mask)" / "else"
> around the setting of FDIRSIP4M, FDIRDIP4M, and FDIRIP6M.
> (This would also eliminate the need for the set_ipv6_masks flag itself.)
>
> We performed limited testing on this change. We have successfully
> added both IPv4 and IPv6 signature filters, but so far have only
> exercised them with IPv4 traffic.
>
> One would think that the designers of this chip feature envisioned
> users filtering mixed traffic (both IPv4 and IPv6).

After reading the 82599 datasheet, I come to the same analysis as you:
the flow director masks seem to be independent for IPv4 and IPv6.

But it would be nice to have a small test with IPv6 traffic to be sure
about this point.

Would you like to provide a patch to remove this useless "if", please?

(Note: the set_ipv6_mask field of the input_mask structure needs to be
removed too)

> Issue #2:
> Apparently, API rte_eth_dev_fdir_set_masks( ) expects IPv4 address
> and port masks in host-byte-order (little-endian), while
> rte_eth_dev_fdir_add_signature_filter( ) expects IPv4 addresses and
> ports in network-byte-order (big-endian).
>
> (Contrast the writing into IXGBE_FDIRSIP4M in ixgbe_fdir.c:
> fdir_set_input_mask_82599( ), versus ixgbe/ixgbe_82599.c:
> ixgbe_fdir_set_input_mask_82599( ). The former includes an extra
> IXGBE_NTOHL( ) on the mask's complement.)
>
> Not knowing this made it a bit tricky to get signature filters working
> properly. Perhaps it is too late to change the byte-ordering in the
> (set masks) API? Whether we change it or not, we probably should
> at least document these details, to avoid confusion.

First, as you probably know, a good way to test flow director in dpdk is
to use the testpmd application.

It is also a good example for understanding how to use the rte_eth_dev_fdir_* api.

So by reading the app/test-pmd/cmdline.c file, I can see
that the mask is parsed in little-endian for rte_eth_dev_fdir_set_masks.
And the src/dst ip addresses are parsed in big-endian for
rte_eth_dev_fdir_add_signature_filter.

Thus I agree with your analysis: the fdir api is not consistent.
I think all the parameters of the fdir api should be in network order.
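
To make the byte-order point concrete, here is a small standalone C sketch. It uses only the standard htonl() and does not call the fdir api; the helpers ipv4_mask_be() and to_bytes() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Copy a 32-bit value into bytes exactly as it is laid out in memory. */
static void to_bytes(uint32_t v, uint8_t out[4])
{
    memcpy(out, &v, sizeof(v));
}

/* Build the IPv4 mask for a given prefix length, in network byte order. */
static uint32_t ipv4_mask_be(unsigned prefix_len)
{
    assert(prefix_len <= 32);
    /* A shift by 32 is undefined in C, so handle a zero-length prefix apart. */
    uint32_t host = (prefix_len == 0) ? 0 : (0xFFFFFFFFu << (32 - prefix_len));
    return htonl(host);
}
```

For a /24 mask the bytes in memory are ff ff ff 00 on any host, which is the layout an address in network order is compared against. On a little-endian host, the same mask kept in host order is stored as 00 ff ff ff, which is why mixing the two conventions makes signature filters hard to get right.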

+ About a patch to fix the api:

As you said, IXGBE_NTOHL needs to be removed and IXGBE_WRITE_REG needs
to be used instead of IXGBE_WRITE_REG_BE32 (in
lib/librte_pmd_ixgbe/ixgbe_fdir.c):

  /* Store source and destination IPv4 masks (big-endian) */
 -  IXGBE_WRITE_REG_BE32(hw, IXGBE_FDIRSIP4M,
 -IXGBE_NTOHL(~input_mask->src_ipv4_mask));
 +  IXGBE_WRITE_REG(hw, IXGBE_FDIRSIP4M,
 +~input_mask->src_ipv4_mask);

The testpmd application needs to be updated accordingly to provide the ip mask
in network order (in app/test-pmd/cmdline.c):

  - fdir_masks.dst_ipv4_mask = res->ip_dst_mask;
  - fdir_masks.src_ipv4_mask = res->ip_src_mask;
  + fdir_masks.dst_ipv4_mask = rte_cpu_to_be_32(res->ip_dst_mask);
  + fdir_masks.src_ipv4_mask = rte_cpu_to_be_32(res->ip_src_mask);

Would you like to provide and test a patch to fix this issue, please ?

Thanks. Best Regards,

---
Maxime Leroy
maxime.leroy at 6wind.com


[dpdk-dev] Comments regarding Flow Director support in PMD IXGBE

2014-01-10 Thread Robert Sanford
Hello Maxime,

Thank you for taking the time to research these flow director issues and to
propose a more complete solution.

On Fri, Jan 10, 2014 at 8:36 AM, Maxime Leroy 
 wrote:
> But it will be nice to have a small test with ipv6 traffic to be sure
> about this point.
>
> Would you like to provide a patch to remove this useless "if" please ?
>
> (Note: the set_ipv6_mask field of the input_mask structure need to be
> removed too)
...
> Thus I agree with your analyze, the fdir api is not coherent.
> I think all the parameters of the fdir api should be in network order.
...
> The testpmd application need to be updated in consequence to provide ip mask
> in network order (in lib/librte_cmdline/cmdline.c):
...
> Would you like to provide and test a patch to fix this issue, please ?


Yes, we will do as you suggest:
1. Test with IPv6 traffic.
2. Make appropriate changes to flow director code and testpmd.
3. Update comments in the structure definitions to indicate byte order.
4. Provide the patch.


--
Regards,
Robert


[dpdk-dev] Unable to get RSS to work in testpmd and load balancing question

2014-01-10 Thread Michael Quicquaro
Why are there so many RX-errors?


On Thu, Jan 9, 2014 at 9:35 PM, Daniel Kan  wrote:

> Thanks, Sy Jong. I couldn't reproduce your outcome on dpdk 1.5.1 with
> ixgbe. As I sent in the earlier email, rxmode.mq_mode is defaulted to 0
> (i.e. ETH_MQ_RX_NONE); it should be set to ETH_MQ_RX_RSS.
>
> Dan
>
> On Jan 9, 2014, at 6:07 PM, Choi, Sy Jong  wrote:
>
> >
> > Hi Dan,
> >
> > I have tested with 6 flows with identical ip address, but varies UDP
> port number. I can see both queues with traffic.
> > Using the following command:-
> > sudo ./app/testpmd -c 0x1f -n 4 -- -i -rss-udp --portmask=0x03
> --nb-cores=4 --rxq=2 --txq=2
> >
> >
> > I have started with RSS IPv4, which is enabled by default.
> > The critical part is the traffic, since I only 2 queues, I am sending 6
> flows with different IP addresses in order to see the flow got distributed
> evenly. Or else you might see only 1 queues if you have 2 flows they might
> load to a single queue only.
> >
> > My Command:-
> > sudo ./app/testpmd -c 0x1f -n 4 -- -i --portmask=0x03 --nb-cores=4
> --rxq=2 --txq=2
> > - Using 4 cores
> > - Rxq = 2 for each port, so 4 queues to 4 cores.
> >
> >
> >
> > testpmd> show port stats all
> >
> >    NIC statistics for port 0
>  
> >   RX-packets:  6306519648RX-errors:  757945685
>  RX-bytes: 309383840254
> >   TX-packets:   132592678TX-errors:  0
>  TX-bytes: 8485925376
> >
> >   Stats reg  0 RX-packets: 2556150208RX-errors:  0
>  RX-bytes: 116477417471
> >   Stats reg  1 RX-packets: 3750369440RX-errors:  0
>  RX-bytes: 192906422783
> >   Stats reg  2 RX-packets:  0RX-errors:  0
>  RX-bytes:  0
> > .
> > .
> > .
> >   Stats reg 15 RX-packets:  0RX-errors:  0
>  RX-bytes:  0
> >
> 
> >
> >    NIC statistics for port 1
>  
> >   RX-packets:   132594048RX-errors:   13825889
>  RX-bytes: 8486020288
> >   TX-packets:  6306522739TX-errors:  0
>  TX-bytes: 231983528894
> >
> >   Stats reg  0 RX-packets:   83615783RX-errors:  0
>  RX-bytes: 5351410624
> >   Stats reg  1 RX-packets:   48978265RX-errors:  0
>  RX-bytes: 3134609664
> >   Stats reg  2 RX-packets:  0RX-errors:  0
>  RX-bytes:  0
> > .
> > .
> > .
> >   Stats reg 15 RX-packets:  0RX-errors:  0
>  RX-bytes:  0
> >
> 
> > testpmd>
> >
> >
> >
> >
> > My Command:-
> > sudo ./app/testpmd -c 0x1f -n 4 -- -i --portmask=0x03 --nb-cores=4
> --rxq=2 --txq=2
> > - Using 4 cores
> > - Rxq = 2 for each port, so 4 queues to 4 cores.
> >
> > I use this command to map the queue statistic.
> > testpmd> set stat_qmap rx 0 0 0
> > testpmd> set stat_qmap rx 0 1 1
> > testpmd> set stat_qmap rx 1 0 0
> > testpmd> set stat_qmap rx 1 1 1
> > testpmd> start
> >   io packet forwarding - CRC stripping disabled - packets/burst=16
> >   nb forwarding cores=2 - nb forwarding ports=2
> >   RX queues=2 - RX desc=128 - RX free threshold=0
> >   RX threshold registers: pthresh=8 hthresh=8 wthresh=4
> >   TX queues=2 - TX desc=512 - TX free threshold=0
> >   TX threshold registers: pthresh=36 hthresh=0 wthresh=0
> >   TX RS bit threshold=0 - TXQ flags=0x0
> >
> > testpmd> show port stats all
> >
> >
> >
> > Regards,
> > Choi, Sy Jong
> > Platform Application Engineer
> >
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Dan Kan
> > Sent: Wednesday, January 08, 2014 3:25 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] Unable to get RSS to work in testpmd and load
> balancing question
> >
> > I'm evaluating DPDK using dpdk-1.5.1r1. I have been playing around with
> the test-pmd sample app. I'm having a hard time to get RSS to work. I have
> a 2-port 82599 Intel X540-DA2 NIC. I'm running the following command to
> start the app.
> >
> > sudo ./testpmd -c 0x1f -n 2 -- -i --portmask=0x3 --nb-cores=4 --rxq=4
> > --txq=4
> >
> > I have a packet generator that sends udp packets with various src IP.
> > According testpmd, I'm only receiving packets in port 0's queue 0.
> Packets are not going into any other queues. I have attached the output
> from testpmd.
> >
> >
> >  --- Forward Stats for RX Port= 0/Queue= 0 -> TX Port= 1/Queue= 0
> > ---
> >  RX-packets: 100TX-packets: 100TX-dropped:
> > 0
> >  -- Forward statistics for port 0
> > --
> >  RX-packets: 100RX-dropped: 0 RX-total: 100
> >  TX-packets: 0  TX-dropped: 0 TX-total: 0
> >
> >
> 
> >
> >  -- Forward statistics for port 1
> > -

[dpdk-dev] [PATCH] spinlock: fix atomic and out of order execution

2014-01-10 Thread François-Frédéric Ozog
Hi Thomas,

I am afraid I introduced unnecessary complexity in the discussion as the
spinlock issues I mentioned are connected to a work in progress on my side
(implement a Chelsio cxgb5 PMD) but *not* to the general DPDK. 

I'll explain some aspects of the context and how critical sections have to be
handled in the case of the Chelsio cxgb5 PMD.

1) WC memory
Most memory is cached, some is not cached at all, and some is tagged (via
paging attributes) as "Write Combining". This means that writes to WC
memory locations do *NOT* go through the cache but rather to a memory "write
buffer" internal to the processor. The processor delays writes to memory as
it pleases, trying to minimize the actual number of writes to DRAM. This type
of memory is used by some devices as a fast interface; for instance, the Chelsio
cxgb5 write queue can be filled using this method
(http://lwn.net/Articles/542643/). Public DPDK cannot declare such a memory
type, as it requires kernel support.

2) fencing and out of order execution
Out of order execution means that the processor may actually change
the order of the instructions of a program from the order produced by the
compiler. In general, this only affects performance, but for critical
sections it is absolutely necessary to ensure that the processor does NOT
postpone critical section instructions until AFTER the unlock.
So there is a need for a "fence" that tells the processor that some instruction
blocks out of order execution: no previous instruction can be postponed
until after the "fence".

3) Public DPDK critical sections

The DPDK implements lock/unlock with xchg instruction which is atomic and
serializing for "normal" cached memory. It includes an implicit lock prefix,
so no need to add such prefix (your patch proposal). This (implicit) lock
prefix is actually acting as an out of order execution fence, thus ensuring
that critical section is working as expected. Nothing has to be added or
changed for public DPDK to have correct critical sections.

4) "Extended" DPDK critical sections

Because of the Chelsio cxgb5 PMD I am developing, I added kernel support
to DPDK to declare some memory regions as Write Combining, and hence to be able
to leverage the output queue fast path.
Because of this, I also had to add proper fencing for critical sections.
The two are related because writes to WC memory do NOT go through the
cache, while the lock prefix is implemented as a cache protocol
transaction. In other words, a lock prefix is NOT an out of order
execution fence for instructions that deal with WC memory. So, the
processor may decide to postpone the WC related instruction until AFTER the
unlock.
To create a proper out of order execution fence, the processor has a set of
explicit memory fences that do the job.
So before a rte_spinlock_unlock(), I have to add a rte_mb().
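
The pattern "fence, then unlock" can be sketched with C11 atomics. This is an illustration only, not the DPDK spinlock: the type and function names are hypothetical, and atomic_thread_fence() merely stands in for rte_mb(). Whether a C11 fence is strong enough for Write Combining memory on a given compiler and processor is not something this sketch establishes; a real PMD would use rte_mb() or an explicit fence instruction as described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical minimal spinlock, modelled on an atomic exchange. */
struct sketch_lock {
    atomic_int locked; /* 0 = free, 1 = held */
};

static void sketch_lock_init(struct sketch_lock *l)
{
    assert(l != NULL);
    atomic_init(&l->locked, 0);
}

/* Spin until the exchange returns 0, meaning we took a free lock. */
static void sketch_lock_acquire(struct sketch_lock *l)
{
    assert(l != NULL);
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) != 0)
        ; /* busy-wait */
}

/*
 * Release with a full fence first, so that no earlier store issued in the
 * critical section is postponed past the store that frees the lock.
 */
static void sketch_lock_release_fenced(struct sketch_lock *l)
{
    assert(l != NULL);
    atomic_thread_fence(memory_order_seq_cst); /* stand-in for rte_mb() */
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

A driver would write its WC-mapped queue entries between sketch_lock_acquire() and sketch_lock_release_fenced(); code that only touches ordinary cached memory does not need the extra fence, per point 3 above.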


So:
- for people willing to develop applications on top of DPDK, my comments can
be disregarded, they are not relevant, there is no need to understand them. 
- for people willing to develop PMD drivers, the fencing comments should be
clear to use the proper fencing.


I am sorry for any confusion I may have introduced in the community.


François-Frédéric


> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Saturday, December 21, 2013 00:38
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] spinlock: fix atomic and out of order execution
> 
> From: Damien Millescamps 
> 
> Add lock prefix before xchg instructions in order to be atomic and flush
> speculative values to ensure effective execution order (as an acquire
> barrier).
> 
> MPLOCKED is a "lock" in multicore case.
> 
> Signed-off-by: Damien Millescamps 
> Signed-off-by: Thomas Monjalon 
> ---
>  lib/librte_eal/common/include/rte_spinlock.h |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/rte_spinlock.h
> b/lib/librte_eal/common/include/rte_spinlock.h
> index f7a245a..8edb971 100644
> --- a/lib/librte_eal/common/include/rte_spinlock.h
> +++ b/lib/librte_eal/common/include/rte_spinlock.h
> @@ -51,6 +51,7 @@
>  extern "C" {
>  #endif
> 
> +#include 
>  #include 
>  #ifdef RTE_FORCE_INTRINSICS
>  #include 
> @@ -93,7 +94,7 @@ rte_spinlock_lock(rte_spinlock_t *sl)
>   int lock_val = 1;
>   asm volatile (
>   "1:\n"
> - "xchg %[locked], %[lv]\n"
> + MPLOCKED "xchg %[locked], %[lv]\n"
>   "test %[lv], %[lv]\n"
>   "jz 3f\n"
>   "2:\n"
> @@ -124,7 +125,7 @@ rte_spinlock_unlock (rte_spinlock_t *sl)  #ifndef
> RTE_FORCE_INTRINSICS
>   int unlock_val = 0;
>   asm volatile (
> - "xchg %[locked], %[ulv]\n"
> + MPLOCKED "xchg %[locked], %[ulv]\n"
>   : [locked] "=m" (sl->locked), [ulv] "=q" (unlock_val)
>   : "[ulv]" (unlock_val)
>   : "mem

[dpdk-dev] Useful: Simple C++ program & Makefile

2014-01-10 Thread Dan Kan
Hamid,
I'm in the same situation as you in which I would like to write most of the
application logic in C++. I was able to use CC=g++ by slightly modifying
the makefiles in mk. I replaced all occurrences of "%.c" with "%.cc" using
the following command.

find . -name "*.mk" -exec sed -i 's/%\.c\([^[:alpha:]]\)/%\.cc\1/g' {} \;

I also removed some of the warning flags that are treated as errors.

Here are the diffs:

diff -r mk/internal/rte.compile-pre.mk ../temp/dpdk-1.5.1r2/mk/internal/rte.compile-pre.mk
39c39
< src2obj = $(strip $(patsubst %.cc,%.o,\
---
> src2obj = $(strip $(patsubst %.c,%.o,\
48c48
< src2dep = $(strip $(call dotfile,$(patsubst %.cc,%.o.d, \
---
> src2dep = $(strip $(call dotfile,$(patsubst %.c,%.o.d, \
53c53
< src2cmd = $(strip $(call dotfile,$(patsubst %.cc,%.o.cmd, \
---
> src2cmd = $(strip $(call dotfile,$(patsubst %.c,%.o.cmd, \
125c125
< %.o: %.cc $$(wildcard $$(dep_$$@)) $$(DEP_$$(@)) FORCE
---
> %.o: %.c $$(wildcard $$(dep_$$@)) $$(DEP_$$(@)) FORCE
diff -r mk/rte.module.mk ../temp/dpdk-1.5.1r2/mk/rte.module.mk
36,37c36,37
< ifneq ($(MODULE),$(notdir $(SRCS-y:%.cc=%)))
< $(MODULE)-objs += $(notdir $(SRCS-y:%.cc=%.o))
---
> ifneq ($(MODULE),$(notdir $(SRCS-y:%.c=%)))
> $(MODULE)-objs += $(notdir $(SRCS-y:%.c=%.o))
diff -r mk/toolchain/gcc/rte.vars.mk ../temp/dpdk-1.5.1r2/mk/toolchain/gcc/rte.vars.mk
71,73c71,73
< WERROR_FLAGS := -W -Wall -Werror
< WERROR_FLAGS += -Wmissing-declarations -Wpointer-arith
< WERROR_FLAGS += -Wcast-align -Wcast-qual
---
> WERROR_FLAGS := -W -Wall -Werror -Wstrict-prototypes -Wmissing-prototypes
> WERROR_FLAGS += -Wmissing-declarations -Wold-style-definition -Wpointer-arith
> WERROR_FLAGS += -Wcast-align -Wnested-externs -Wcast-qual


Dan


On Thu, Jan 9, 2014 at 10:30 PM, Hamid Ramazani wrote:

> > I don't exactly know what is needed for C++. Please keep us informed.
>
> Hey Thomas,
>
> I've attached a simple program (main.cpp main.h and Makefile) that has
> a C++ class and just prints some messages in the output.
> Despite the fact that it's working fine, I'm sure the Makefile could
> be written much better; Maybe I made it completer in future.
>
> All the Best,
> --Hamid
>
> On 1/3/14, Thomas Monjalon  wrote:
> > Hello,
> >
> > 03/01/2014 11:48, Hamid Ramazani :
> >> eal_timer.c:(.text+0x42c): undefined reference to `clock_gettime'
> >
> > From "man clock_gettime":
> > Link with -lrt (only for glibc versions before 2.17).
> >
> >>  g++ -m64 -pthread  -march=native -DRTE_MACHINE_CPUFLAG_SSE
> >> -DRTE_MACHINE_CPUFLAG_SSE2 -DRTE_MACHINE_CPUFLAG_SSSE3
> >>
> -DRTE_COMPILE_TIME_CPUFLAGS=RTE_CPUFLAG_SSE,RTE_CPUFLAG_SSE2,RTE_CPUFLAG_SS
> >> SE3 -I/home/hamid/dpdk/dpdk-1.5.1r1/examples/sample/build/include
> >> -I/home/hamid/dpdk/dpdk-1.5.1r1/x86_64-default-linuxapp-gcc/include
> >> -include
> >>
> /home/hamid/dpdk/dpdk-1.5.1r1/x86_64-default-linuxapp-gcc/include/rte_conf
> >> ig.h -O3 -W -Wall -Werror -Wmissing-declarations -Wpointer-arith
> >> -Wcast-align -Wcast-qual -Wformat-nonliteral -Wformat-security -Wundef
> >> -Wwrite-strings -Wl,-melf_x86_64 -Wl,-export-dynamic sample.cpp -o
> >> sample -Wl,-L/home/hamid/dpdk/dpdk-1.5.1r1/examples/sample/build/lib
> >> -Wl,-L/home/hamid/dpdk/dpdk-1.5.1r1/x86_64-default-linuxapp-gcc/lib
> >> -Wl,-L/home/hamid/dpdk/dpdk-1.5.1r1/x86_64-default-linuxapp-gcc/lib
> >> -Wl,-lrte_kni -Wl,-lrte_pmd_e1000 -Wl,-lrte_pmd_ixgbe -Wl,-lrte_mbuf
> >> -Wl,-lrte_cmdline -Wl,-lrte_timer -Wl,-lrte_hash -Wl,-lrte_lpm
> >> -Wl,--start-group -Wl,-lethdev -Wl,-lrte_malloc -Wl,-lrte_mempool
> >> -Wl,-lrte_ring -Wl,-lrte_eal -Wl,-ldl -Wl,--end-group
> >
> > Try CONFIG_RTE_BUILD_COMBINE_LIBS=y and -lintel_dpdk instead of all these
> > libraries. You can also remove the warning options if you want.
> >
> > You can also try to build your Makefile by including files like
> > mk/rte.extapp.mk and defining CC=g++.
> > I don't exactly know what is needed for C++. Please keep us informed.
> >
> > --
> > Thomas
> >
>


[dpdk-dev] Using valgrind with DPDK?

2014-01-10 Thread Patrick Mahan

General query: has anyone attempted to use valgrind --tool=callgrind with the 
testpmd application?  I am seeing valgrind dump core when EAL attempts to 
set up hugetlb pages (2MB size).  There seems to be some collision on the mmap() call.

Thanks,

Patrick

Coming to you from deep inside Fortress Mahan


[dpdk-dev] Useful: Simple C++ program & Makefile

2014-01-10 Thread Dan Kan
I forgot to add that you will also need to add -D__STDC_LIMIT_MACROS,
because the C99 standard specifies that, for C++, limit macros such as INT8_MAX
should only be defined if explicitly requested. Also, with g++ you can no longer use
non-trivial designated initializers, which are used extensively throughout
the dpdk sample apps. For example,

static const struct rte_eth_rxconf rx_conf = {
        .rx_thresh = {
                .pthresh = RX_PTHRESH,
                .hthresh = RX_HTHRESH,
                .wthresh = RX_WTHRESH,
        },
        .rx_free_thresh = 32,
};


Basically, use gcc to build the dpdk libraries. Then, use g++ to build your
own code and link against the built libraries.

Dan


On Fri, Jan 10, 2014 at 12:19 PM, Dan Kan  wrote:

> Hamid,
> I'm in the same situation as you in which I would like to write most of
> the application logic in C++. I was able to use CC=g++ by slightly
> modifying the makefiles in mk. I replaced all occurrences of "%.c" with
> "%.cc" using the following command.
>
> find . -name "*.mk" -exec sed -i 's/%\.c\([^[:alpha:]]\)/%\.cc\1/g' {} \;
>
> I also remove some warning flags as errors.
>
> Here are the diffs:
>
> diff -r mk/internal/rte.compile-pre.mk ../temp/dpdk-1.5.1r2/mk/internal/
> rte.compile-pre.mk
> 39c39
> < src2obj = $(strip $(patsubst %.cc,%.o,\
> ---
> > src2obj = $(strip $(patsubst %.c,%.o,\
> 48c48
> < src2dep = $(strip $(call dotfile,$(patsubst %.cc,%.o.d, \
> ---
> > src2dep = $(strip $(call dotfile,$(patsubst %.c,%.o.d, \
> 53c53
> < src2cmd = $(strip $(call dotfile,$(patsubst %.cc,%.o.cmd, \
> ---
> > src2cmd = $(strip $(call dotfile,$(patsubst %.c,%.o.cmd, \
> 125c125
> < %.o: %.cc $$(wildcard $$(dep_$$@)) $$(DEP_$$(@)) FORCE
> ---
> > %.o: %.c $$(wildcard $$(dep_$$@)) $$(DEP_$$(@)) FORCE
> diff -r mk/rte.module.mk ../temp/dpdk-1.5.1r2/mk/rte.module.mk
> 36,37c36,37
> < ifneq ($(MODULE),$(notdir $(SRCS-y:%.cc=%)))
> < $(MODULE)-objs += $(notdir $(SRCS-y:%.cc=%.o))
> ---
> > ifneq ($(MODULE),$(notdir $(SRCS-y:%.c=%)))
> > $(MODULE)-objs += $(notdir $(SRCS-y:%.c=%.o))
> diff -r mk/toolchain/gcc/rte.vars.mk ../temp/dpdk-1.5.1r2/mk/toolchain/gcc/rte.vars.mk
> 71,73c71,73
> < WERROR_FLAGS := -W -Wall -Werror
> < WERROR_FLAGS += -Wmissing-declarations -Wpointer-arith
> < WERROR_FLAGS += -Wcast-align -Wcast-qual
> ---
> > WERROR_FLAGS := -W -Wall -Werror -Wstrict-prototypes -Wmissing-prototypes
> > WERROR_FLAGS += -Wmissing-declarations -Wold-style-definition -Wpointer-arith
> > WERROR_FLAGS += -Wcast-align -Wnested-externs -Wcast-qual
>
>
> Dan
>
>
> On Thu, Jan 9, 2014 at 10:30 PM, Hamid Ramazani wrote:
>
>> > I don't exactly know what is needed for C++. Please keep us informed.
>>
>> Hey Thomas,
>>
>> I've attached a simple program (main.cpp main.h and Makefile) that has
>> a C++ class and just prints some messages in the output.
>> Although it works fine, I'm sure the Makefile could be written much
>> better; maybe I will make it more complete in the future.
>>
>> All the Best,
>> --Hamid
>>
>> On 1/3/14, Thomas Monjalon  wrote:
>> > Hello,
>> >
>> > 03/01/2014 11:48, Hamid Ramazani :
>> >> eal_timer.c:(.text+0x42c): undefined reference to `clock_gettime'
>> >
>> > From "man clock_gettime":
>> > Link with -lrt (only for glibc versions before 2.17).
>> >
>> >>  g++ -m64 -pthread  -march=native -DRTE_MACHINE_CPUFLAG_SSE
>> >> -DRTE_MACHINE_CPUFLAG_SSE2 -DRTE_MACHINE_CPUFLAG_SSSE3
>> >> -DRTE_COMPILE_TIME_CPUFLAGS=RTE_CPUFLAG_SSE,RTE_CPUFLAG_SSE2,RTE_CPUFLAG_SSSE3
>> >> -I/home/hamid/dpdk/dpdk-1.5.1r1/examples/sample/build/include
>> >> -I/home/hamid/dpdk/dpdk-1.5.1r1/x86_64-default-linuxapp-gcc/include
>> >> -include /home/hamid/dpdk/dpdk-1.5.1r1/x86_64-default-linuxapp-gcc/include/rte_config.h
>> >> -O3 -W -Wall -Werror -Wmissing-declarations -Wpointer-arith
>> >> -Wcast-align -Wcast-qual -Wformat-nonliteral -Wformat-security -Wundef
>> >> -Wwrite-strings -Wl,-melf_x86_64 -Wl,-export-dynamic sample.cpp -o
>> >> sample -Wl,-L/home/hamid/dpdk/dpdk-1.5.1r1/examples/sample/build/lib
>> >> -Wl,-L/home/hamid/dpdk/dpdk-1.5.1r1/x86_64-default-linuxapp-gcc/lib
>> >> -Wl,-L/home/hamid/dpdk/dpdk-1.5.1r1/x86_64-default-linuxapp-gcc/lib
>> >> -Wl,-lrte_kni -Wl,-lrte_pmd_e1000 -Wl,-lrte_pmd_ixgbe -Wl,-lrte_mbuf
>> >> -Wl,-lrte_cmdline -Wl,-lrte_timer -Wl,-lrte_hash -Wl,-lrte_lpm
>> >> -Wl,--start-group -Wl,-lethdev -Wl,-lrte_malloc -Wl,-lrte_mempool
>> >> -Wl,-lrte_ring -Wl,-lrte_eal -Wl,-ldl -Wl,--end-group
>> >
>> > Try CONFIG_RTE_BUILD_COMBINE_LIBS=y and -lintel_dpdk instead of all
>> > these libraries. You can also remove the warning options if you want.
>> >
>> > You can also try to build your Makefile by including files like
>> > mk/rte.extapp.mk and defining CC=g++.
>> > I don't exactly know what is needed for C++. Please keep us informed.
>> >
>> > --
>> > Thomas
>> >
>>
>
>


[dpdk-dev] Issue in virtio pmd available in dpdk 1.5 - virtqueue does not exist

2014-01-10 Thread Selvaganapathy Chidambaram
Hi,

I am using dpdk 1.5 where virtio pmd is available
in dpdk-1.5.1r2/lib/librte_pmd_virtio/.

When I run l2fwd reference application, I am getting the following error:

EAL: PCI device :00:08.0 on NUMA socket -1
EAL:   probe driver: 1af4:1000 rte_virtio_pmd
EAL: PCI device :00:09.0 on NUMA socket -1
EAL:   probe driver: 1af4:1000 rte_virtio_pmd
Lcore 0: RX port 0
Lcore 1: RX port 1
Initializing port 0...
EAL: Error - exiting with code: 1
  Cause: rte_eth_rx_queue_setup:err=-22, port=0

On debugging further, I see that in the function virtio_dev_queue_setup,
vq_size is zero, so the function returns an error.

I was able to run the e1000 pmd and the virtio pmd *extension* successfully.
Please let me know if I need to set anything specifically to make the
virtio pmd work.

Thanks,
Selvaganapathy.C.


[dpdk-dev] [PATCH] app/testpmd: fix RSS rx by setting mq_mode

2014-01-10 Thread Daniel Kan
---
 app/test-pmd/testpmd.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index b11eb2e..355db0f 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1546,6 +1546,9 @@ init_port_config(void)
 		if (nb_rxq > 0) {
 			port->dev_conf.rx_adv_conf.rss_conf.rss_key = NULL;
 			port->dev_conf.rx_adv_conf.rss_conf.rss_hf = rss_hf;
+			if (nb_rxq > 1 && rss_hf != 0) {
+				port->dev_conf.rxmode.mq_mode = ETH_MQ_RX_RSS;
+			}
 		} else {
 			port->dev_conf.rx_adv_conf.rss_conf.rss_key = NULL;
 			port->dev_conf.rx_adv_conf.rss_conf.rss_hf = 0;
-- 
1.7.9.5
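
The condition added by this patch can be read as a small decision
function. The sketch below is only an illustration of that rule: the
demo_* enum is a hypothetical stand-in for the real ETH_MQ_RX_* constants,
the parameter types are assumptions, and returning "none" when the
condition fails is an assumption based on mq_mode being initialized to 0
in testpmd.

```cpp
#include <stdint.h>
#include <assert.h>

// Hypothetical stand-ins for the DPDK multi-queue RX modes. The real
// constants are defined by the ethdev library, not here.
enum demo_rx_mq_mode {
    DEMO_MQ_RX_NONE = 0,  // assumed default: RSS disabled
    DEMO_MQ_RX_RSS = 1    // distribute packets across RX queues with RSS
};

// Mirror the patch's rule: enable RSS only when there is more than one
// RX queue and at least one hash function bit is selected.
static demo_rx_mq_mode select_rx_mq_mode(unsigned nb_rxq, uint64_t rss_hf)
{
    if (nb_rxq > 1 && rss_hf != 0)
        return DEMO_MQ_RX_RSS;
    return DEMO_MQ_RX_NONE;
}

// Usage example covering the three interesting cases.
static void check_rx_mq_mode_selection()
{
    assert(select_rx_mq_mode(2, 0x1) == DEMO_MQ_RX_RSS);   // 2 queues, hash on
    assert(select_rx_mq_mode(1, 0x1) == DEMO_MQ_RX_NONE);  // single queue
    assert(select_rx_mq_mode(2, 0) == DEMO_MQ_RX_NONE);    // no hash function
}
```

With a single queue there is nothing to distribute, and with rss_hf equal
to 0 no packet field is hashed, so both cases fall back to the non-RSS
mode.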



[dpdk-dev] one directional traffic from SR-IOV port using l2fwd

2014-01-10 Thread James Yu
I am trying to get SR-IOV and DPDK l2fwd to work together. I can only pass
traffic in one direction, not in both.



The traffic is looped back in one direction by DPDK l2fwd, as illustrated
below:



Spirent port 1 --> KVM host PF --> VF (Virtual function) --> DPDK l2fwd
                                   (looping back to the other port) --+
                                                                      |
Spirent port 2 <------------------------------------------------------+



When I send traffic from port 2 to the KVM host, I do not receive it on
port 1.


I think the code should use rte_ixgbevf_pmd_init() as shown in l2fwd-vf in
the DPDK 1.2.3 release
(http://www.dpdk.org/browse/dpdk/tree/examples/l2fwd-vf/main.c?h=1.2.3).

Does anyone know how to send/receive traffic to/from SR-IOV ports?

Thanks

James

---

I have the following setup. Is anything misconfigured?




I did the following setup to turn on SR-IOV:



Test HW Config:

CPU = Intel Xeon Processor E5506 = 4-core, 2.13 GHz (VT-d, VT-x
capable), Hyper-Threading not supported

Mem = 16GB RAM (800 MHz) single slot (no NUMA)

NIC = Intel 82599EB 10-Gigabit Ethernet (SR-IOV capable)



*Steps to setup KVM host and guest:*

1) Hypervisor

- Enable VT-d and Virtualization support

2) Host Kernel: RHEL 6.1 (2.6.32-431.el6.x86_64) qemu 0.12.1.2, ixgbe
3.15.1-k

a) Grub Kernel configuration

- Intel = Add "intel_iommu=on"

- AMD = Add "iommu=on iommu=pt"

b) In Host Kernel, Enable virtual functions for ixgbe.

- modprobe -r ixgbe

- modprobe -v ixgbe max_vfs=2

Here is the list of 10G PCI devices. The Virtual Functions associated with
1a:10.0 and 1a:10.1 are 1a:10.2 and 1a:10.3. They will be used in the guest
VM hostdev XML configuration in 2(d).

[root@rh188 ~]# lspci | grep Eth

1a:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+
Network Connection (rev 01)

1a:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+
Network Connection (rev 01)

1a:10.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller
Virtual Function (rev 01)

1a:10.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller
Virtual Function (rev 01)

*1a:10.2 Ethernet controller: Intel Corporation 82599 Ethernet Controller
Virtual Function (rev 01)*

*1a:10.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller
Virtual Function (rev 01)*

c) Blacklist ixgbevf driver in the host. Add below two lines into
/etc/modprobe.d/blacklist.conf

# Intel SR-IOV virtual function driver (ixgbe)

blacklist ixgbevf

d) Add PCIe Virtual functions to the KVM guest either graphically or
through the "virsh edit" command. For example, two <hostdev> entries are
added in the guest XML configuration. The ones in bold, function 0x2 and
0x3, are the virtual functions listed in the host. They are associated with
slot 0x05 and slot 0x08 on the guest VM. You will use these two PCI
devices on the guest.



  

[Two <hostdev> XML entries; the markup did not survive in the archived message.]

  

  

  



3) Guest Kernel



 To run DPDK l2fwd loopback, run the following script under the dpdk source
code directory:

modprobe -r igb_uio
modprobe uio
insmod ./build/kmod/igb_uio.ko
modprobe -r ixgbevf
insmod /root/rpmbuild/BUILD/ixgbevf-2.12.1/src/ixgbevf.ko

# Reserve huge pages memory.
mkdir -p /mnt/huge
mount -t hugetlbfs nodev /mnt/huge
echo 196 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

./tools/pci_unbind.py --bind=igb_uio 00:05.0
./tools/pci_unbind.py --bind=igb_uio 00:08.0

./examples/l2fwd/build/l2fwd -c 3 -n 1 -b 000:00:03.0 -b 000:00:07.0 -b 000:00:0a.0 -- -q 1 -p 3



NOTE: ixgbevf.ko is built from ixgbevf-2.12.1, which has bug fixes that
improve performance.


[dpdk-dev] send/receive L2 packets from SR-IOV ports using l2fwd-vf

2014-01-10 Thread James Yu
I found that the DPDK 1.2.3 release used to include l2fwd-vf
(http://www.dpdk.org/browse/dpdk/tree/examples/l2fwd-vf/main.c?h=1.2.3),
but in the next release, 1.3.1, that directory is gone. Does that mean it
was merged into some other tool? Which tool can I use to send/receive L2
traffic on SR-IOV ports?

In DPDK 1.3.1r2, to use SR-IOV ports, should I use l2fwd-vf from the 1.2.3
release or simply use l2fwd?

Thanks

James