>

>
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of
>>harish.patil at qlogic.com
>> Sent: Sunday, November 08, 2015 7:40 PM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH] l3fwd: Fix l3fwd crash due to unaligned
>>load/store intrinsics
>>
>> From: Harish Patil <harish.patil at qlogic.com>
>>
>> l3fwd app expects PMDs to return packets whose L2 header is
>> 16-byte aligned due to usage of _mm_load_si128()/_mm_store_si128()
>> intrinsics in the app. However, most of the protocol stacks expects
>> packets such that its IP/L3 header be aligned on a 16-byte boundary.
>>
>> Based on the recommendations received on dpdk-dev, we are changing
>> the l3fwd app to use _mm_loadu_si128()/_mm_loadu_si128() so that the
>> address need not be 16-byte aligned and thereby preventing crash.
>> We have tested that there is no performance impact due to this
>> change.
>>
>> Signed-off-by: Harish Patil <harish.patil at qlogic.com>
>> ---
>
>Acked-by: Konstantin Ananyev <konstantin.ananyev at intel.com>
>
>As a side notice:
>In fact with gcc build I do see a slight regression: ~1%
>for 4 ports over 1 core test-case.
>Though I think the problem is not in the patch itself.
>By some, unknown to me reason, gcc treats aligned and unaligned load/store
>instrincts in a different way (at least for that particular case).
>With aligned load/store in use it generates code that is pretty close to
>the source:
> 4 loads first, then 4 BLENDs, then 4  stores  (with some interfering
>scalar instructions of course).
>But with unaligned ones  gcc starts to mix loads and blends for the same
>register, so now it is:
>load x0; blend x0; load x1; blend x1; ..
>As if the source code was:
>
>te[0] = _mm_loadu_si128(p[0]);
>te[0] =  _mm_blend_epi16(te[0], ve[0], MASK_ETH);
>te[1] = _mm_loadu_si128(p[1]);
>te[1] =  _mm_blend_epi16(te[1], ve[1], MASK_ETH);
>...
>
>So load latency is not hidden any more.
>I tried it with different versions of  - same story for all of them.
>Clang doesn't have such issue and generates similar code for both
>aligned and unaligned instrincts.
>
>The only way to fix it I can think about  -  put rte_compiler_barrier()
>just before the first blend instinct.
>That helped, now there are no noticeable differences in generated code
>and results before and after the patch.
> So I suppose, I'll have to submit a patch after yours one to fix that
>problem.
>Konstantin
>
>

Sure, thanks for verifying and providing fix.

Harish


________________________________

This message and any attached documents contain information from the sending 
company or its parent company(s), subsidiaries, divisions or branch offices 
that may be confidential. If you are not the intended recipient, you may not 
read, copy, distribute, or use this information. If you have received this 
transmission in error, please notify the sender immediately by reply e-mail and 
then delete this message.

Reply via email to