https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104151
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
_1 1 times scalar_store costs 12 in body
_2 1 times scalar_store costs 12 in body
__builtin_bswap64 (_8) 1 times scalar_stmt costs 4 in body
__builtin_bswap64 (_10) 1 times scalar_stmt costs 4 in body
BIT_FIELD_REF <a_3(D), 64, 64> 1 times scalar_stmt costs 4 in body
BIT_FIELD_REF <a_3(D), 64, 0> 1 times scalar_stmt costs 4 in body
1<unknown> 1 times vec_perm costs 4 in body
__builtin_bswap64 (_8) 1 times vector_stmt costs 4 in prologue
__builtin_bswap64 (_8) 1 times vec_perm costs 4 in body
_1 1 times vector_store costs 16 in body
test.cc:9:10: note: Cost model analysis for part in loop 0:
  Vector cost: 28
  Scalar cost: 40

...

  <bb 2> [local count: 1073741824]:
  _8 = BIT_FIELD_REF <a_3(D), 64, 64>;
  _11 = VIEW_CONVERT_EXPR<vector(2) long unsigned int>(a_3(D));
  _13 = VIEW_CONVERT_EXPR<vector(2) long unsigned int>(a_3(D));
  _12 = VEC_PERM_EXPR <_11, _13, { 1, 0 }>;
  _14 = VIEW_CONVERT_EXPR<vector(16) char>(_12);
  _15 = VEC_PERM_EXPR <_14, _14, { 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 }>;
  _16 = VIEW_CONVERT_EXPR<vector(2) long unsigned int>(_15);
  _1 = __builtin_bswap64 (_8);
  _10 = BIT_FIELD_REF <a_3(D), 64, 0>;
  _2 = __builtin_bswap64 (_10);
  MEM <vector(2) long unsigned int> [(long unsigned int *)&y] = _16;
  _7 = MEM <uint128_t> [(char * {ref-all})&y];

1. According to the ABI, uint128 is passed in 2 GPRs, so there should be an
   extra cost for
     _11 = VIEW_CONVERT_EXPR<vector(2) long unsigned int>(a_3(D));

2. Why is there
     1<unknown> 1 times vec_perm costs 4 in body
   Shouldn't it be 2 times vec_perm costs?
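
For reference, the totals in the cost-model dump above can be cross-checked by summing the per-statement costs. The sketch below is only an arithmetic check: the `CostEntry` struct, the array names, and `total_cost` are hypothetical helpers made up for illustration, not anything from GCC, and each "costs N" value is taken as-is from the dump.

```cpp
#include <cassert>
#include <cstddef>

// One line of the dump: "<stmt> <count> times <kind> costs <cost> in <where>".
// Hypothetical helper type for this check; it is not a GCC data structure.
struct CostEntry {
    const char* stmt;
    int count;
    const char* kind;
    int cost;  // the "costs N" value exactly as printed in the dump
};

// Scalar side of the comparison (all entries are "in body").
constexpr CostEntry kScalar[] = {
    {"_1", 1, "scalar_store", 12},
    {"_2", 1, "scalar_store", 12},
    {"__builtin_bswap64 (_8)", 1, "scalar_stmt", 4},
    {"__builtin_bswap64 (_10)", 1, "scalar_stmt", 4},
    {"BIT_FIELD_REF <a_3(D), 64, 64>", 1, "scalar_stmt", 4},
    {"BIT_FIELD_REF <a_3(D), 64, 0>", 1, "scalar_stmt", 4},
};

// Vector side: three body entries plus one prologue entry.
constexpr CostEntry kVector[] = {
    {"1<unknown>", 1, "vec_perm", 4},                  // label as printed in the dump
    {"__builtin_bswap64 (_8)", 1, "vector_stmt", 4},   // in prologue
    {"__builtin_bswap64 (_8)", 1, "vec_perm", 4},
    {"_1", 1, "vector_store", 16},
};

// Sum the printed cost of every entry in a table.
template <std::size_t N>
constexpr int total_cost(const CostEntry (&entries)[N]) {
    int sum = 0;
    for (std::size_t i = 0; i < N; ++i) sum += entries[i].cost;
    return sum;
}

// Compile-time checks against the totals reported in the dump.
static_assert(total_cost(kScalar) == 40, "Scalar cost: 40");
static_assert(total_cost(kVector) == 28, "Vector cost: 28");
```

Compiling this as C++14 or later is enough to run the check. The sums match the reported totals (12 + 12 + 4 + 4 + 4 + 4 = 40 scalar, 4 + 4 + 4 + 16 = 28 vector), and the vector total contains no entry for the two VIEW_CONVERT_EXPRs of a_3(D), which is the missing cost that point 1 asks about.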