> -----Original Message-----
> From: Zhang, Qi Z
> Sent: Friday, August 30, 2019 7:32 AM
> To: Rong, Leyi <leyi.r...@intel.com>; Ye, Xiaolong <xiaolong...@intel.com>;
> Wang, Haiyue <haiyue.w...@intel.com>; Lu, Wenzhuo <wenzhuo...@intel.com>
> Cc: dev@dpdk.org
> Subject: RE: [PATCH v2 6/6] net/ice: switch to Rx flexible descriptor in AVX
> path
>
> >
> >          * take the two sets of status bits and merge to one
> > @@ -450,20 +452,22 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
> >                 /* get only flag/error bits we want */
> >                 const __m256i flag_bits =
> >                         _mm256_and_si256(status0_7, flags_mask);
> > -               /* set vlan and rss flags */
> > -               const __m256i vlan_flags =
> > -                       _mm256_shuffle_epi8(vlan_flags_shuf, flag_bits);
> > -               const __m256i rss_flags =
> > -                       _mm256_shuffle_epi8(rss_flags_shuf,
> > -                                           _mm256_srli_epi32(flag_bits, 11));
> >                 /**
> >                  * l3_l4_error flags, shuffle, then shift to correct adjustment
> >                  * of flags in flags_shuf, and finally mask out extra bits
> >                  */
> >                 __m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf,
> > -                               _mm256_srli_epi32(flag_bits, 22));
> > +                               _mm256_srli_epi32(flag_bits, 4));
> >                 l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1);
> >                 l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask);
> > +               /* set rss and vlan flags */
> > +               const __m256i rss_vlan_flag_bits =
> > +                       _mm256_srli_epi32(flag_bits, 12);
> > +               const __m256i rss_flags =
> > +                       _mm256_shuffle_epi8(rss_flags_shuf, rss_vlan_flag_bits);
> > +               const __m256i vlan_flags =
> > +                       _mm256_shuffle_epi8(vlan_flags_shuf,
> > +                                           rss_vlan_flag_bits);
>
> Seems we can "or" rss_flags_shuf and vlan_flags_shuf, so just need to do one
> shuffle here to save some CPU cycles?
>
That makes sense in principle; I will run some benchmarks for this adjustment :).

> >
> >                 /* merge flags */
> >                 const __m256i mbuf_flags = _mm256_or_si256(l3_l4_flags,
> > --
> > 2.17.1
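
For reference, a minimal scalar sketch of why the single-shuffle idea should give the same result. It models one 128-bit lane of _mm256_shuffle_epi8 as a per-byte table lookup and checks that shuffling the OR-ed table equals OR-ing the two separate shuffles. This assumes both shuffles use the same index (rss_vlan_flag_bits, as in the patch) and that rss_flags and vlan_flags are only OR-ed together afterwards. The function names and the table contents below are hypothetical placeholders, not the real rss_flags_shuf/vlan_flags_shuf values.

```c
#include <stdint.h>
#include <string.h>

#define LANE_BYTES 16

/* Scalar model of one 128-bit lane of a byte shuffle (pshufb semantics):
 * out[i] is 0 when bit 7 of idx[i] is set, else table[idx[i] & 0x0F]. */
static void shuffle_lane(const uint8_t table[LANE_BYTES],
                         const uint8_t idx[LANE_BYTES],
                         uint8_t out[LANE_BYTES])
{
    for (int i = 0; i < LANE_BYTES; i++)
        out[i] = (idx[i] & 0x80) ? 0 : table[idx[i] & 0x0F];
}

/* Returns 1 if one shuffle of the OR-ed table matches two shuffles OR-ed. */
static int one_shuffle_matches_two(const uint8_t rss_tbl[LANE_BYTES],
                                   const uint8_t vlan_tbl[LANE_BYTES],
                                   const uint8_t idx[LANE_BYTES])
{
    uint8_t rss[LANE_BYTES], vlan[LANE_BYTES];
    uint8_t merged_tbl[LANE_BYTES], two[LANE_BYTES], one[LANE_BYTES];

    /* Current approach: two lookups with the same index, then OR. */
    shuffle_lane(rss_tbl, idx, rss);
    shuffle_lane(vlan_tbl, idx, vlan);
    for (int i = 0; i < LANE_BYTES; i++) {
        two[i] = rss[i] | vlan[i];
        merged_tbl[i] = rss_tbl[i] | vlan_tbl[i];
    }

    /* Suggested approach: OR the tables once, then a single lookup. */
    shuffle_lane(merged_tbl, idx, one);
    return memcmp(one, two, LANE_BYTES) == 0;
}

/* Exercises every possible index byte value with placeholder tables. */
int check_all_indices(void)
{
    uint8_t rss_tbl[LANE_BYTES], vlan_tbl[LANE_BYTES], idx[LANE_BYTES];

    /* Hypothetical flag encodings, chosen only for illustration. */
    for (int i = 0; i < LANE_BYTES; i++) {
        rss_tbl[i] = (i & 1) ? 0x02 : 0x00;
        vlan_tbl[i] = (i & 2) ? 0x01 : 0x00;
    }

    for (int v = 0; v < 256; v++) {
        for (int i = 0; i < LANE_BYTES; i++)
            idx[i] = (uint8_t)(v + i);
        if (!one_shuffle_matches_two(rss_tbl, vlan_tbl, idx))
            return 0;
    }
    return 1;
}
```

Since the merged table is a compile-time constant, the change would drop one shuffle and one OR per iteration; whether that shows up in throughput is what the benchmark needs to confirm.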