Hi Neil, > The ixgbe pmd currently can't be built without enabling sse instructions at > compile time.
Actually it can, all you have to do is set RTE_IXGBE_INC_VECTOR=n in your config. > While sse extensions provide better performance, theres no reason > that we can't still create builds to run on systems that don't support sse. > If > we modify the ixgbe code to use the __builtin_shuffle and __builtin_popcountll > functions, I've confirmed that the gcc compiler emits the appropriate sse > instructions when the provided -march parameter indicates a machine that > includes sse support, and emits generic code when see isn't available. I don't think it is ok to blindly replace _mm_shuffle_epi8 with __builtin_shuffle. They are not identical. I tried your patch on IVB box (gcc 4.8.3, CONFIG_RTE_MACHINE="native"). The result is - ixgbe_recv_pkts_vec() functionality is broken. See below for more details. So my vote is NACK. Konstantin 1. Code changes: uint16_t ixgbe_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) { ... __m128i shuf_msk; ... /* mask to shuffle from desc. to mbuf */ shuf_msk = _mm_set_epi8( 7, 6, 5, 4, /* octet 4~7, 32bits rss */ 0xFF, 0xFF, /* skip high 16 bits vlan_macip, zero out */ 15, 14, /* octet 14~15, low 16 bits vlan_macip */ 0xFF, 0xFF, /* skip high 16 bits pkt_len, zero out */ 13, 12, /* octet 12~13, low 16 bits pkt_len */ 0xFF, 0xFF, /* skip nb_segs and in_port, zero out */ 13, 12 /* octet 12~13, 16 bits data_len */ ); ... for (...) { __m128i descs[RTE_IXGBE_DESCS_PER_LOOP]; __m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4; ... descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); ... - pkt_mb4 = _mm_shuffle_epi8(descs[3], shuf_msk); + pkt_mb4 = __builtin_shuffle(descs[3], shuf_msk); ... 2. Code generated before the patch (valid one): ... vmovdqa 0x4978d(%rip),%xmm0 /* load shuf_msk */ ... vmovdqu 0x30(%rdx),%xmm4 /* load desc[3] */ .... vpshufb %xmm0,%xmm4,%xmm8 .... 3. Code generated after the patch applied (broken one): ... vmovdqu 0x30(%rdx),%xmm3 ... vpunpcklqdq %xmm3,%xmm3,%xmm3 /* !!! ERROR - should be vpshufb !!!! */ 4. What happens here? My understanding: GCC treats __m128i as vector of two 64bit integers: /lib/gcc/x86_64-redhat-linux/4.8.3/include/emmintrin.h:typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));