[dpdk-dev] Making rte_eal_pci_probe() in rte_eal_init() optional?

2015-11-15 Thread Roger B. Melton
Hi David,

We are on a very old kernel (2.6.xx) that lacks VFIO.  In the future,
however, after we migrate to a newer kernel, it will be an option.

I like the "-b all" and "-w none" idea, but I think it might be
complicated to implement it the way we would need it to work.  The
existing -b and -w options persist for the duration of the application,
and we would need "-b all"/"-w none" to persist only through
rte_eal_init() time.  Otherwise our attempt to attach a device at a
later time would be blocked by the option.

Wouldn't it be simpler to have an option to disable the probe at
rte_eal_init() time?  Would that address the issue with VFIO, preventing
automatic attachment to devices while still permitting on-demand attach?
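
To make the lifetime concern concrete, here is a minimal sketch in C of the
two approaches discussed above.  It is a toy model, not DPDK code: struct
eal_model, its fields, and both functions are hypothetical names invented for
illustration, and the "skip the probe at init" flag simply models the proposal
in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two startup policies; none of this is DPDK API. */
struct eal_model {
    bool blacklist_all;   /* models "-b all", which persists after init */
    bool skip_init_probe; /* models the proposed "no probe at init" option */
};

/* Would a device be attached automatically during initialization? */
bool attached_at_init(const struct eal_model *m)
{
    return !m->blacklist_all && !m->skip_init_probe;
}

/* Would an explicit attach request succeed after initialization? */
bool attach_on_demand_allowed(const struct eal_model *m)
{
    /* A persistent blacklist keeps blocking the device after init;
     * skipping the init-time probe leaves later attaches unaffected. */
    return !m->blacklist_all;
}
```

Under this model, both policies avoid automatic attachment at init, but only
the "skip the probe" policy still allows an attach request later on.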

Thanks again.

Regards,
-Roger



On 11/14/15 12:51 PM, David Marchand wrote:
> Hello Roger,
>
> On Sat, Nov 14, 2015 at 4:55 PM, Roger B. Melton wrote:
>
> Agreed.  For our application, the debug case would be to _enable_
> the PCI scan.
>
> Again, thanks David for pointing it out.  It did solve our problem.
>
>
> The only problem with --no-pci is that I think that vfio won't work 
> properly if used.
>
> Did you try to blacklist all your devices then attach them later ?
> I would say what you need here is to "blacklist all" or "whitelist 
> none" at startup, so maybe a special keyword for -b/-w options.
>
>
> -- 
> David Marchand


[dpdk-dev] [PATCH] doc: fix repeated typo in sample app docs

2015-11-15 Thread De Lara Guarch, Pablo
Hi John,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John McNamara
> Sent: Friday, November 13, 2015 5:55 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] doc: fix repeated typo in sample app docs
> 
> Fix repeated typo in the "Compiling the Application" section of
> almost all of the sample app docs.
> 
> This generally gets copied into new sample app guides.
> 
> Signed-off-by: John McNamara 

Could you also fix the issue below in the load balancer application,
which is related to this patch?

cd ${RTE_SDK}/examples/load_balancer make

Thanks!
Pablo


[dpdk-dev] URGENT please help. Issue on ixgbe_tx_free_bufs version 2.0.0

2015-11-15 Thread Ariel Rodriguez
Hi Bruce, I'm going to list the results of the tests.

I will start with the second hint you proposed:

2) I upgraded our custom DPDK application to the latest DPDK code (2.1.0)
and the issue is still there.

1) I tested the load balancer app with the latest DPDK code (2.1.0) on the
82599ES 10-Gigabit SFI/SFP+ NIC with tapped traffic, and the results are:

   a) It works fine after 6 hours of running. (Because of time constraints I
can't wait longer, but the issue always happened within 5 hours of running, so
I assume we are fine in this test.)

   b) I changed the load balancer code so that the worker code behaves like
our DPDK application. The change only gives the worker code enough load (load
in terms of core frequency) to make the RX core drop packets because the ring
between the workers and the RX core is full. (Our application drops packets
because the worker code is not fast enough.)

   In this last test, the segmentation fault arises at the same line that I
reported previously.

Debugging and reading the code in ixgbe_rxtx.c, I see some weird things.

  - The core dump of the issue is always around line 260 in the
ixgbe_rxtx.c code.
  - Looking at the function "ixgbe_tx_free_bufs" at line 132, I understand
that there is a check on the RS bit write-back mechanism.
IXGBE_ADVTXD_STAT_DD is set, and then the code takes a pointer to the
ixgbe_tx_entry in the sw_ring of the TX queue (variable name txep).

  - The txep->mbuf entry is totally corrupted because it holds an invalid
memory address. I compared that address with the mbuf mempool and it is not
even close to being valid. But the address of the ixgbe_tx_entry itself is
valid and in the range of the zmalloc'ed software ring structure constructed
at initialization.

 - The txep pointer points to the first entry in the sw_ring, because
txq->tx_next_dd is 31 and txq->tx_rs_thresh is 32:
txep = &(txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)]);

 - txq->tx_rs_thresh is 32. I use the default values by passing NULL to
the corresponding *_queue_setup functions.

 - The weirdest thing is that the next entry in the software ring (the next
ixgbe_tx_entry) is valid and has a valid mbuf memory address.
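
As a worked check of the index arithmetic in the list above, the following
sketch evaluates the same expression for the values seen in the core dump.
The function name first_entry_to_free is hypothetical; only the expression
tx_next_dd - (tx_rs_thresh - 1) comes from the quoted code.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the sw_ring entry that txep points to, following the
 * expression quoted above. The function name is illustrative only. */
uint16_t first_entry_to_free(uint16_t tx_next_dd, uint16_t tx_rs_thresh)
{
    /* Guard against an index that would wrap below zero. */
    assert(tx_rs_thresh >= 1 && tx_next_dd >= tx_rs_thresh - 1);
    return (uint16_t)(tx_next_dd - (tx_rs_thresh - 1));
}
```

With tx_next_dd = 31 and tx_rs_thresh = 32 this gives index 0, i.e. the first
entry of the ring. With tx_next_dd = 63 it would give index 32.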

I don't know how to continue, because I am trying to find out where I could
be corrupting the mbuf associated with the ixgbe_tx_entry. I debugged and
tested every part of the worker core code, looking for a bad mbuf or an mbuf
corruption before the enqueue on the TX ring. The TX core and the RX core are
the same as the ones in the load balancer app (this applies to our
application), so there is no issue there. If the mbuf were corrupted in the
worker code, the segmentation fault would have to occur before the enqueue on
the TX queue ring. (I test several fields of the mbuf before enqueuing it:
the ->port field, ->data_len, etc.)
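
The kind of address check described above (comparing a suspect mbuf pointer
against the mempool) can be sketched as follows. This is only an assumption
of how such a check might look: ptr_in_pool and its parameters are
hypothetical, and it assumes a single contiguous region of fixed-size
elements, which a real DPDK mempool (with per-object headers and possibly
several memory chunks) does not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if p points at the start of one of
 * n_elts elements of elt_size bytes laid out contiguously from base. */
bool ptr_in_pool(const void *p, const void *base,
                 size_t elt_size, size_t n_elts)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t start = (uintptr_t)base;

    if (elt_size == 0 || addr < start)
        return false;

    uintptr_t offset = addr - start;
    if (offset >= elt_size * n_elts)
        return false;              /* beyond the end of the region */

    return offset % elt_size == 0; /* must sit on an element boundary */
}
```

A check like this could be run on each pointer just before the enqueue on the
TX ring, alongside the field checks mentioned above, to narrow down where an
invalid address first appears.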

In the second test of the load balancer app, I could not see a relationship
between the packet drops in the RX core and the mbuf corruption in the
ixgbe_tx_entry.


Any advice would be appreciated.

Regards

Ariel Horacio Rodriguez.

On Tue, Nov 10, 2015 at 8:50 AM, Ariel Rodriguez wrote:

> Thank you very much for your rapid response.
>
> 1) The IO part is the same as in the load balancer. The worker part is
> different. The TX part also uses the QoS scheduler framework. I will try
> to run the example and see what happens.
>
> 2) Yes, I can. I will do that too.
>
> The NIC is an 82599ES 10-Gigabit SFI/SFP+ with tapped traffic (it is a
> hardware bypass device from the vendor Silicom).
>
> I developed a similar app without the TX part. It just receives a copy of
> the traffic (around 6 Gbps and 40 concurrent flows) and then frees the
> mbufs. It works like a charm.
>
> This issue is strange... If I disable the QoS scheduler code and the TX
> code, dropping all packets instead of calling rte_eth_tx_burst (which is
> like disabling the TX core), the issue happens in rte_eth_rx_burst, which
> returns a corrupted mbuf (RX core).
>
> Could the NIC be behaving abnormally?
>
> I will try the two things you suggested.
>
> Regards .
>
> Ariel Horacio Rodriguez
> On Tue, Nov 10, 2015 at 01:35:21AM -0300, Ariel Rodriguez wrote:
> > Dear dpdk experts.
> >
> > I'm having a recurrent segmentation fault in the
> > function ixgbe_tx_free_bufs (ixgbe_rxtx.c:150) (I enabled -g3 -O0).
> >
> > Going through the core dump, I found this:
> >
> > txep = &(txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)]);
> >
> > txq->tx_next_dd = 31
> > txq->tx_rs_thresh = 32
> >
> > Obviously txep points to the first element, but
> >
> > *(txep).mbuf == INVALID MBUF ADDRESS
> >
> > The same applies to
> >
> > *(txep+1).mbuf ; *(txep +2).mbuf;*(txep+3).mbuf
> >
> > The entries from *(txep+4).mbuf to *(txep+31).mbuf seem to be valid,
> > because I am able to dereference the mbufs.
> >
> >
> > Note:
> >
> > I disabled CONFIG_RTE_IXGBE_INC_VECTOR, thinking the problem would
> > disappear with that feature disabled, but I get similar behavior.
> >
> >
> > The program always runs well for 4 or 5 hours and then crashes, always
> > in the same line.