Hi,

I have recently added ARM support for __builtin_bswap16, which uses the rev16 instruction when dealing with an unsigned argument.
Considering:

unsigned short myfunc(unsigned short x) {
  return __builtin_bswap16(x);
}

gcc -O2 generates:

myfunc:
	rev16	r0, r0
	uxth	r0, r0
	bx	lr

I'd like to get rid of the zero extension, which is not needed since r0's upper 16 bits are zero on input. Note that rev16 actually operates on a 32-bit value and swaps the bytes in each halfword of a 32-bit register.

After discussions with Ulrich, I have changed the machine description of bswaphi2 to:

(define_insn "arm_rev16_new"
  [(set (match_operand:SI 0 "s_register_operand" "=l,l,r")
	(ior:SI (and:SI (ashift:SI (match_operand:SI 1 "s_register_operand" "l,l,r")
				   (const_int 8))
			(const_int 4278255360))
		(and:SI (lshiftrt:SI (match_dup 1)
				     (const_int 8))
			(const_int 16711935))))]
  "arm_arch6"
  "@
   rev16\t%0, %1
   rev16%?\t%0, %1
   rev16%?\t%0, %1"
  [(set_attr "arch" "t1,t2,32")
   (set_attr "length" "2,2,4")]
)

(define_expand "bswaphi2"
  [(set (match_operand:HI 0 "s_register_operand" "")
	(bswap:HI (match_operand:HI 1 "s_register_operand" "")))]
  "arm_arch6"
  {
    rtx in = gen_lowpart (SImode, operands[1]);
    rtx out = gen_lowpart (SImode, operands[0]);
    emit_insn (gen_arm_rev16_new (out, in));
    DONE;
  }
)

Now, this exposes the fact that rev16 also changes the upper 16 bits, but the generated code is still the same. I have been trying to understand why combine does not manage to infer that the zero extension is superfluous.
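For reference, the shift/mask RTL in the define_insn is the usual open-coded form of a per-halfword byte swap. A minimal C model of what the insn computes (rev16_model is just an illustrative name, not part of the patch):

```c
#include <stdint.h>

/* Model of the arm_rev16_new RTL pattern: swap the two bytes inside
   each 16-bit halfword of a 32-bit value, like the rev16 instruction.
   4278255360 == 0xff00ff00, 16711935 == 0x00ff00ff.  */
static uint32_t rev16_model(uint32_t x)
{
    return ((x << 8) & 0xff00ff00u)   /* low byte of each halfword moves up */
         | ((x >> 8) & 0x00ff00ffu);  /* high byte of each halfword moves down */
}
```

For example, rev16_model(0x11223344) is 0x22114433: each halfword has its bytes swapped.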
Before RTL, the gimple IR contains:

myfunc (short unsigned int x)
{
  short unsigned int _2;

  ;; basic block 2, loop depth 0
  ;;  pred:       ENTRY
  _2 = __builtin_bswap16 (x_1(D)); [tail call]
  return _2;
  ;;  succ:       EXIT
}

Before combine, the RTL is:

(note 4 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 4 3 2 (set (reg/v:SI 112 [ x ])
        (reg:SI 0 r0 [ x ])) rev16.c:11 636 {*arm_movsi_vfp}
     (expr_list:REG_DEAD (reg:SI 0 r0 [ x ])
        (nil)))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (subreg:SI (reg:HI 113) 0)
        (ior:SI (and:SI (ashift:SI (reg/v:SI 112 [ x ])
                    (const_int 8 [0x8]))
                (const_int 4278255360 [0xff00ff00]))
            (and:SI (lshiftrt:SI (reg/v:SI 112 [ x ])
                    (const_int 8 [0x8]))
                (const_int 16711935 [0xff00ff])))) rev16.c:17 354 {arm_rev16_new}
     (expr_list:REG_DEAD (reg/v:SI 112 [ x ])
        (nil)))
(insn 7 6 12 2 (set (reg:SI 110 [ D.4971 ])
        (zero_extend:SI (reg:HI 113))) rev16.c:17 166 {*arm_zero_extendhisi2_v6}
     (expr_list:REG_DEAD (reg:HI 113)
        (nil)))
(insn 12 7 15 2 (set (reg/i:SI 0 r0)
        (reg:SI 110 [ D.4971 ])) rev16.c:19 636 {*arm_movsi_vfp}
     (expr_list:REG_DEAD (reg:SI 110 [ D.4971 ])
        (nil)))
(insn 15 12 0 2 (use (reg/i:SI 0 r0)) rev16.c:19 -1
     (nil))

Stepping inside set_nonzero_bits_and_sign_copies() indicates that:
- insn 2 has nonzero_bits = 65535 and sign_bit_copies = 16
- insn 6 has nonzero_bits = 65535 and sign_bit_copies = 1
- insn 7 has nonzero_bits = 65535 and sign_bit_copies = 16

Any suggestion about how I could avoid generating this zero_extension?

Thanks,

Christophe.
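P.S. A quick host-side sanity check of the nonzero_bits observation: when the upper 16 bits of the input are already zero (as for an incoming unsigned short argument), the shift/mask combination in the pattern leaves the upper 16 bits of the result zero as well, so the zero_extend in insn 7 really is redundant. rev16_model and upper_half_stays_zero below are illustrative helpers, not part of the patch:

```c
#include <stdint.h>

/* Same computation as the arm_rev16_new pattern.  */
static uint32_t rev16_model(uint32_t x)
{
    return ((x << 8) & 0xff00ff00u) | ((x >> 8) & 0x00ff00ffu);
}

/* Exhaustively check that for every input whose upper 16 bits are zero,
   the result also has its upper 16 bits zero (nonzero_bits == 65535 for
   insn 6), i.e. the uxth after rev16 contributes nothing.  */
static int upper_half_stays_zero(void)
{
    for (uint32_t x = 0; x <= 0xffffu; x++)
        if (rev16_model(x) & 0xffff0000u)
            return 0;
    return 1;
}
```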