[dpdk-dev] [PATCH] mem: fix include in rte_malloc

2013-07-31 Thread didier.pallard
On 07/31/2013 11:18 AM, Thomas Monjalon wrote:
> The function rte_malloc_virt2phy has a dependency on rte_memory.h:
> phys_addr_t must be defined.
>
> The dependency handling for apps was wrong in the commit 8c86825cbf.
> Let's revert this part.
>
> Reported-by: Jia Sui 
> Signed-off-by: Thomas Monjalon 

Acked-by: Didier Pallard 


[dpdk-dev] [PATCH 1/2] mem: add write memory barrier before changing heap state

2014-04-16 Thread didier.pallard
On 04/15/2014 04:08 PM, Richardson, Bruce wrote:
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of David Marchand
>> Sent: Tuesday, April 15, 2014 2:51 PM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH 1/2] mem: add write memory barrier before
>> changing heap state
>>
>> From: Didier Pallard 
>>
>> a write memory barrier is needed before changing heap state value, else some
>> concurrent core may see state changing before all initialization values are
>> written to memory, causing unpredictable results in malloc function.
>>
>> Signed-off-by: Didier Pallard 
> No barrier should be necessary here. As in a number of other places, such as 
> rings, compiler barriers can be used in place of write memory barriers, due 
> to IA ordering rules. However, in this case, both variables referenced are 
> volatile variables and so the assignments to them cannot be reordered by the 
> compiler so no compiler barrier is necessary either.
>
> Regards,
> /Bruce

Hi bruce,

Indeed a compiler barrier is absolutely needed here. volatile variable 
use is absolutely not a serializing instruction from compiler point of 
view; only atomic variable use is serializing, due to asm volatile 
(memory) directive use.
Here is the assembler generated with and without rte_wmb:


With rte_wmb

  142:   f0 45 0f b1 07  lock cmpxchg %r8d,(%r15)
  147:   0f 94 c0sete   %al
  14a:   84 c0   test   %al,%al
  14c:   74 ea   je 138 
  14e:   49 c7 47 10 00 00 00movq   $0x0,0x10(%r15)
  155:   00
  156:   41 c7 47 18 00 00 00movl   $0x0,0x18(%r15)
  15d:   00
  15e:   41 c7 47 08 00 00 00movl   $0x0,0x8(%r15)
  165:   00
  166:   41 c7 47 1c 00 00 00movl   $0x0,0x1c(%r15)
  16d:   00
  16e:   49 c7 47 20 00 00 00movq   $0x0,0x20(%r15)
  175:   00
  176:   45 89 57 04 mov%r10d,0x4(%r15)
  17a:   0f ae f8sfence
* 17d:   41 c7 07 02 00 00 00movl   $0x2,(%r15)**
** 184:   41 8b 37mov(%r15),%esi**
* 187:   83 fe 02cmp$0x2,%esi
  18a:   75 b4   jne140 
  18c:   0f 1f 40 00 nopl   0x0(%rax)
  190:   48 83 c3 3f add$0x3f,%rbx


Without rte_wmb

  142:   f0 45 0f b1 07  lock cmpxchg %r8d,(%r15)
  147:   0f 94 c0sete   %al
  14a:   84 c0   test   %al,%al
  14c:   74 ea   je 138 
  14e:   49 c7 47 10 00 00 00movq   $0x0,0x10(%r15)
  155:   00
  156:   41 c7 47 08 00 00 00movl   $0x0,0x8(%r15)
  15d:   00
* 15e:   41 c7 07 02 00 00 00movl   $0x2,(%r15)**
** 165:   41 8b 37mov(%r15),%esi**
* 168:   41 c7 47 18 00 00 00movl $0x0,0x18(%r15)
  16f:   00
  170:   41 c7 47 1c 00 00 00movl   $0x0,0x1c(%r15)
  177:   00
  178:   49 c7 47 20 00 00 00movq   $0x0,0x20(%r15)
  17f:   00
  180:   45 89 57 04 mov%r10d,0x4(%r15)
  184:   83 fe 02cmp$0x2,%esi
  187:   75 b7   jne140 
  189:   0f 1f 80 00 00 00 00nopl   0x0(%rax)
  190:   48 83 c3 3f add$0x3f,%rbx

It's clear that the *heap->initialised = INITIALISED;* instruction has 
been reordered by the compiler.

About rte_wmb and rte_rmb use, i agree with you that on intel 
architecture those macro should do nothing more than compiler barrier, 
due to Intel architecture choices.
But for code completness, i think those memory barriers should remain in 
place in the code, and rte_*mb should map to compiler barrier on intel 
architecture.

didier




[dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields

2014-12-02 Thread didier.pallard
Hello,

On 12/02/2014 07:52 AM, Jijiang Liu wrote:
> Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and 
> outer_l3_len field, and rework csum forward engine and i40e PMD due to  these 
> changes.
>
> Signed-off-by: Jijiang Liu 
[...]
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -276,8 +276,8 @@ struct rte_mbuf {
>   uint64_t tso_segsz:16; /**< TCP TSO segment size */
>   
>   /* fields for TX offloading of tunnels */
> - uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. 
> */
> - uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr 
> Length. */
> + uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. 
> */
> + uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr 
> Length. */
>   
>   /* uint64_t unused:8; */
>   };

Sorry for entering lately this discussion, but i'm not convinced by the 
choice of outer_lx_len rather than inner_lx_len for new fields.
I agree with Olivier that new flags should only be related to the use of 
new fields, to maintain coherency with oldest implementations.
But from a stack point of view, i think it is better to have lx_len 
fields that target the outer layers, and to name new fields inner_lx_len.

Let's discuss the two possibilities.

1) outer_lx_len fields are introduced.
In this case, the stack should have knowledge that it is processing 
tunneled packets to use outer_lx_len rather than lx_len,
or stack should always use outer_lx_len packet and move those fields to 
lx_len packets if no tunneling occurs...
I think it will induce extra processing that does not seem to be really 
needed.

2) inner_lx_len fields are introduced.
In this case, the stack first uses lx_len fields. When packet should be 
tunneled, lx_len fields are moved to inner_lx_len fields.
Then the stack can process the outer layer and still use the lx_len fields.

For  example:
an eth/IP/TCP forged packet will look like this:

Ether/IP/UDP/xxx
   m->flags = IP_CKSUM
   m->l2_len = sizeof(ether)
   m->l3_len = sizeof(ip)
   m->l4_len = sizeof(udp)
   m->inner_l2_len = 0
   m->inner_l3_len = 0

When entering tunnel for example a VXLAN interface, lx_len will be moved 
to inner_lx_len

Ether/IP/UDP/xxx
   m->flags = INNER_IP_CKSUM
   m->l2_len = 0
   m->l3_len = 0
   m->l4_len = 0
   m->inner_l2_len = sizeof(ether)
   m->inner_l3_len = sizeof(ip)


once complete encapsulation is processed by the stack, the packet will 
look like

Ether/IP/UDP/VXLAN/Ether/IP/UDP/xxx
   m->flags = IP_CKSUM | INNER_IP_CKSUM
   m->l2_len = sizeof(ether)
   m->l3_len = sizeof(ip)
   m->l4_len = sizeof(udp)
   m->inner_l2_len = sizeof(ether) + sizeof (vxlan)
   m->inner_l3_len = sizeof(ip)


didier




[dpdk-dev] [RFC PATCH 2/2] ixgbe: release software locked semaphores on initialization

2014-02-24 Thread didier.pallard
v.c or something)?
> Thanks
> Konstantin
>
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, February 19, 2014 4:52 PM
> To: Ananyev, Konstantin
> Cc: Didier Pallard; dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] ixgbe: release software locked 
> semaphores on initialization
>
> 19/02/2014 13:41, Ananyev, Konstantin:
>> Can the patch be reworked to keep changes under librte_pmd_ixgbe/ixgbe
>> directory untouched? Those files are derived directly from the BSD
>> driver baseline, and any changes will make future merges of newer code
>> more challenging. The changes should be limited to files in the
>> librte_pmd_ixgbe directory (and ethdev). Thanks
> Please, could you send an url to the BSD driver baseline ?
> By the way, git is very good at rebasing such patches.
> And if the fix is good, it should be applied on the baseline.
> Refusing a fix without alternative is not an option.
>
> --
> Thomas
> --
> Intel Shannon Limited
> Registered in Ireland
> Registered Office: Collinstown Industrial Park, Leixlip, County Kildare 
> Registered Number: 308263 Business address: Dromore House, East Park, 
> Shannon, Co. Clare
>
> This e-mail and any attachments may contain confidential material for the 
> sole use of the intended recipient(s). Any review or distribution by others 
> is strictly prohibited. If you are not the intended recipient, please contact 
> the sender and delete all copies.
>
>
> --
> Intel Shannon Limited
> Registered in Ireland
> Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
> Registered Number: 308263
> Business address: Dromore House, East Park, Shannon, Co. Clare
>
> This e-mail and any attachments may contain confidential material for the 
> sole use of the intended recipient(s). Any review or distribution by others 
> is strictly prohibited. If you are not the intended recipient, please contact 
> the sender and delete all copies.
>
>


-- 
Didier PALLARD
6WIND
Software Engineer

Tel: +33 1 39 30 92 46
Mob: +33 6 49 11 40 14
Fax: +33 1 39 30 92 11
didier.pallard at 6wind.com
www.6wind.com

This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and contains information that is confidential and 
proprietary to 6WIND. All unauthorized review, use, disclosure or distribution 
is prohibited. If you are not the intended recipient, please contact the sender 
by reply e-mail and destroy all copies of the original message.

Ce courriel ainsi que toutes les pi?ces jointes, est uniquement destin? ? son 
ou ses destinataires. Il contient des informations confidentielles qui sont la 
propri?t? de 6WIND. Toute r?v?lation, distribution ou copie des informations 
qu'il contient est strictement interdite. Si vous avez re?u ce message par 
erreur, veuillez imm?diatement le signaler ? l'?metteur et d?truire toutes les 
donn?es re?ues



[dpdk-dev] [RFC PATCH 2/2] ixgbe: release software locked semaphores on initialization

2014-02-26 Thread didier.pallard
Hi,

Well, about ixgbe, you're right, i didn't notice that the 
ixgbe_release_swfw_sync_X540 function is releasing the SMBI lock, even 
if it is not the owner.
I'm not sure it is really safe to do that, but it may allow to release 
the locks from outside the base driver in all cases using only 
ixgbe_acquire/release_swfw_semaphore API functions.

for igb, in my tests, i found that i had to put lock release between 
e1000_init_mac_params and e1000_init_nvm_params because all necessary 
setup was not done
in hw structure before e1000_init_mac_params call (so 
e1000_setup_init_funcs(hw, FALSE) was not enough to have all settings 
done to be able to reset the locks).
But perhaps it may be overcome by a call to another exported function of 
the API. I think that what i was needing is a call to 
mac->ops.set_lan_id(hw) that is also done
in e1000_get_bus_info_pcie_generic function. So something like that 
should work:

e1000_setup_init_funcs(hw, FALSE);
e1000_get_bus_info(hw);
e1000_reset_swfw_lock(hw);
e1000_setup_init_funcs(hw, TRUE);

I will try this patch.

thanks


On 02/25/2014 01:57 AM, Ananyev, Konstantin wrote:
> Hi,
> About e1000 - I suppose you refer to the 
> eth_igb_dev_init()->e1000_setup_init_funcs(hw, TRUE)-> 
> e1000_init_nvm_params(hw)?
>   If so, I suppose we can do something like that to overcome it:
> e1000_setup_init_funcs(hw, FALSE);
> e1000_reset_swfw_lock(hw);
> e1000_setup_init_funcs(hw, TRUE);
>   ?
> First setup_init_funcs() would just setup hw func pointers and wouldn't call 
> e1000_init_nvm_params/ e1000_init_phy_params.
> Then we reset the lock, then call setup_funcs once again - that time it would 
> read/write HW regs.
> Konstantin
>
> -Original Message-
> From: didier.pallard [mailto:didier.pallard at 6wind.com]
> Sent: Monday, February 24, 2014 2:19 PM
> To: Ananyev, Konstantin
> Cc: Thomas Monjalon; dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] ixgbe: release software locked 
> semaphores on initialization
>
> Hi,
>
> The patch (or some derivative that do the same result) should probably rather 
> be integrated in base driver.
>
> For IGB implementation, it is not possible to extract patch from base driver, 
> since lock release should be done before calls to e1000_init_nvm_params or 
> e1000_init_phy_params that use the potentially stuck locks and after enough 
> function pointers fields are filled by e1000_setup_init_funcs to have 
> functions to access the hardware.
> For ixgbe, it may be possible on 82598/82599 using ixgbe_xxx_swfw_semaphore 
> to do the job from outside the base driver, assuming that no lock will never 
> be taken by the base driver before the return of ixgbe_init_shared_code 
> function. But it is not be possible on X540, since this implementation does 
> not use the ixgbe_get_eeprom_semaphore generic function that automatically 
> release the SMBI lock on timeout; So the release of this lock should be done 
> using ixgbe_release_swfw_sync_semaphore that is not accessible through the 
> API.
>
> didier
>
>
> On 02/21/2014 05:30 PM, Ananyev, Konstantin wrote:
>> To be more specific, I am talking about something like the patch below.
>> It is just a copy-paste of your fix, but implemented and called from
>> ixgbe_ethdev.c Konstantin
>>
>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>> b/lib/librte_pmd_ixgbe/ixgbe_et hdev.c index 7e068c2..5d8744a 100644
>> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>> @@ -561,6 +561,42 @@ ixgbe_dcb_init(struct ixgbe_hw *hw,struct
>> ixgbe_dcb_config
>> *dcb_config)
>>   }
>>}
>>
>> +static void
>> +ixgbe_swfw_lock_reset(struct ixgbe_hw *hw) {
>> +   uint16_t mask;
>> +
>> +   /* Get bus info */
>> +   hw->mac.ops.get_bus_info(hw);
>> +
>> +   /* Ensure that all locks are released before first NVM or PHY
>> + access */
>> +
>> +   /*
>> +* Phy lock should not fail in this early stage. If this is the case,
>> +* it is due to an improper exit of the application.
>> +* So force the release of the faulty lock. Release of common lock
>> +* is done automatically by swfw_sync function.
>> +*/
>> +   mask = IXGBE_GSSR_PHY0_SM << hw->bus.func;
>> +   if (hw->mac.ops.acquire_swfw_sync(hw, mask) < 0) {
>> +   DEBUGOUT1("SWFW phy%d lock released", hw->bus.func);
>> +   }
>> +   hw->mac.ops.release_swfw_sync(hw, mask);
>> +
>> +   /*
>> +* Those one are more tricky since they are common to all ports; but
>> +* swfw_sync

[dpdk-dev] [PATCH] timer: add lfence before TSC read

2014-01-27 Thread didier.pallard
Yes, i will add a new function that includes the lfence;

for the performance penalty, we did not see noticable performance impact 
on our full software, so we did not see any reason to use 2 functions, 
but it's certainly because we make a very limited number of calls to 
rdtsc and it's true that it is highly application dependant, so 2 
functions are probably better. But if using the unaccurate function, you 
may have some hard time the first time you want to debug or do some 
precise measures, since the measure is not always done when expected. 
And generally, especially when debugging, you're not focusing at first 
on the function you're using to debug...
i don't know how to do to be sure that people will be aware of the 
problem and do not lose time on the same problem, i will try to add some 
kind of warning in rte_rdtsc function itself.
But perhaps should it be better to use the precise version as default 
one and let the optimized version with another name to be use on purpose 
when accuracy is not important; By default, i think we generaly suppose 
a time reading function to be accurate...

thanks
didier

On 01/27/2014 10:57 AM, Thomas Monjalon wrote:
> 24/01/2014 12:42, Fran?ois-Fr?d?ric Ozog:
>> IMHO, adding the lfence for all cases is introducing an un-necessary
>> performance penalty.
>>
>> What about adding rte_rdtsc_sync() or rte_rdtsc_serial() with the comment
>> about the rdtsc instruction behavior so that developers can choose which
>> form they want?
> Yes it could be a good idea in some cases. Didier, could you try to add such
> function ?
>
> But in some debugging cases we need to have high precision for almost all
> timestamps. Here I don't know what is the smartest solution.
>
> Thank you for commenting. Hope we'll find a good fix.




[dpdk-dev] DMAR errors when running testpmd on kernel >= 3.15

2014-07-22 Thread didier.pallard
Hello,

I am testing testpmd on recent kernels (>= 3.15) and i have an IOMMU 
problem (there are several commits around IOMMU in v3.15, they may be 
related with this problem):

Hardware:
Platform: Intel S2600IP
CPU: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
BIOS version: SE5C600.86B.02.03.0003.041920141333
NICs:
 05:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit 
SFI/SFP+ Network Connection (rev 01)
 05:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit 
SFI/SFP+ Network Connection (rev 01)
 07:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit 
SFI/SFP+ Network Connection (rev 01)
 07:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit 
SFI/SFP+ Network Connection (rev 01)

Software:
Linux distribution: Fedora 20
Linux kernel: >= 3.15 (tested with 3.15.6-200 fc20 and 3.16.0-rc6 vanilla)
Linux boot line: BOOT_IMAGE=/boot/vmlinuz-3.15.6-200.fc20.x86_64 
root=/dev/sda5 console=tty1 console=ttyS0,115200n8 intel_iommu=on iommu=pt
DPDK version: 1.7.0 (also tested on master, with same result)

Test: I run the following commands to start the test:
echo 1024 >  /proc/sys/vm/nr_hugepages
mount -t hugetlbfs nodev /mnt/huge
modprobe uio
insmod dpdk/build/kmod/igb_uio.ko
dpdk/tools/dpdk_nic_bind.py --bind=igb_uio 05:00.0 05:00.1 07:00.0 07:00.1
dpdk/build/app/testpmd  --huge-dir /mnt/huge -n 4 -c 0x0001fe0001ff 
--socket-mem=512,0 -- --socket-num=0 -i --rxq=4 --txq=16 --portmask=0xf 
--rxd=128 --rxfreet=32 --rxpt=8 --rxht=8 --rxwt=0 --txd=512 --txfreet=32 
--txpt=32 --txht=0  --txwt=0 --txrst=32 --txqflags=0xF00 --burst=32 
--mbcache=250 --total-num-mbufs=64000 --coremask=0x0001fe0001fe

When i start the traffic, i have the following messages displayed on screen:
[ 1819.770528] dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr 
3c416
[ 1819.770528] DMAR:[fault reason 02] Present bit in context entry is clear
[ 1819.786243] dmar: DMAR:[DMA Read] Request device [07:00.1] fault addr 
3ce399000
[ 1819.786243] DMAR:[fault reason 02] Present bit in context entry is clear
[ 1819.801955] dmar: DMAR:[DMA Read] Request device [05:00.1] fault addr 
3ce119000
[ 1819.801955] DMAR:[fault reason 02] Present bit in context entry is clear
[ 1819.817668] dmar: DMAR:[DMA Read] Request device [07:00.0] fault addr 
3ce269000
[ 1819.817668] DMAR:[fault reason 02] Present bit in context entry is clear

and nothing flows through the testpmd ports.

The problem does not appear:
   - with kernel older than 3.15
   - if interfaces are bound using vfio-pci instead of igb_uio
   - if iommu is disabled in bootline (remove intel_iommu=on iommu=pt 
from kernel boot line)

Sometimes, only two ports instead of four have DMAR problems, the two 
others forward packets normally.

Does someone know what may happen?

thanks
didier




[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-09-11 Thread didier.pallard
On 08/25/2015 08:52 PM, Vlad Zolotarov wrote:
>
> Helin, the issue has been seen on x540 devices. Pls., see a chapter 
> 7.2.1.1 of x540 devices spec:
>
> A packet (or multiple packets in transmit segmentation) can span any 
> number of
> buffers (and their descriptors) up to a limit of 40 minus WTHRESH 
> minus 2 (see
> Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for 
> WTHRESH
> details). For best performance it is recommended to minimize the 
> number of buffers
> as possible.
>
> Could u, pls., clarify why do u think that the maximum number of data 
> buffers is limited by 8?
>
> thanks,
> vlad

Hi vlad,

Documentation states that a packet (or multiple packets in transmit 
segmentation) can span any number of
buffers (and their descriptors) up to a limit of 40 minus WTHRESH minus 2.

Shouldn't there be a test in transmit function that drops properly the 
mbufs with a too large number of
segments, while incrementing a statistic; otherwise transmit function 
may be locked by the faulty packet without
notification.

thanks
didier