Hi Neil,

> The ixgbe pmd currently can't be built without enabling sse instructions at
> compile time.

Actually it can, all you have to do is set RTE_IXGBE_INC_VECTOR=n in your 
config.

>  While sse extensions provide better performance, theres no reason
> that we can't still create builds to run on systems that don't support sse.  
> If
> we modify the ixgbe code to use the __builtin_shuffle and __builtin_popcountll
> functions, I've confirmed that the gcc compiler emits the appropriate sse
> instructions when the provided -march parameter indicates a machine that
> includes sse support, and emits generic code when see isn't available.

I don't think it is ok to blindly replace _mm_shuffle_epi8 with 
__builtin_shuffle.
They are not identical.
I tried your patch on IVB box (gcc 4.8.3, CONFIG_RTE_MACHINE="native").
The result is - ixgbe_recv_pkts_vec() functionality is broken.
See below for more details.
So my vote is NACK.
Konstantin 

1. Code changes:
uint16_t
ixgbe_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
                uint16_t nb_pkts)
{
        ...
        __m128i shuf_msk;
        ...
        /* mask to shuffle from desc. to mbuf */
        shuf_msk = _mm_set_epi8(
                7, 6, 5, 4,  /* octet 4~7, 32bits rss */
                0xFF, 0xFF,  /* skip high 16 bits vlan_macip, zero out */
                15, 14,      /* octet 14~15, low 16 bits vlan_macip */
                0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
                13, 12,      /* octet 12~13, low 16 bits pkt_len */
                0xFF, 0xFF,  /* skip nb_segs and in_port, zero out */
                13, 12       /* octet 12~13, 16 bits data_len */
                );
         ...
         for (...) {
                __m128i descs[RTE_IXGBE_DESCS_PER_LOOP];
                __m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
                ...
                descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); 
                ...
-               pkt_mb4 = _mm_shuffle_epi8(descs[3], shuf_msk);
+              pkt_mb4 = __builtin_shuffle(descs[3], shuf_msk);
...

2. Code generated before the patch (valid one):
...
vmovdqa 0x4978d(%rip),%xmm0           /* load shuf_msk */
...
vmovdqu 0x30(%rdx),%xmm4                /* load desc[3] */
....
vpshufb %xmm0,%xmm4,%xmm8
....

3. Code generated after the patch applied (broken one):
...
vmovdqu 0x30(%rdx),%xmm3
...
vpunpcklqdq %xmm3,%xmm3,%xmm3       /* !!! ERROR - should be vpshufb !!!! */

4. What happens here?
My understanding:

GCC treats __m128i as vector of two 64bit integers:
/lib/gcc/x86_64-redhat-linux/4.8.3/include/emmintrin.h:typedef long long 
__m128i __attribute__ ((__vector_size__ (16), __may_alias__));

Reply via email to