On 02/16/2014 08:21 AM, Paolo Bonzini wrote:
> On 31/01/2014 15:43, Richard Henderson wrote:
>> +    gen_shift_maybe_vex:
>> +        if (have_bmi2 && !const_args[2]) {
>> +            tcg_out_vex_modrm(s, vexop + rexw, args[0], args[2], args[1]);
>> +            break;
>> +        }
>> +        /* FALLTHRU */
>
> What if args[2] happens to be ECX?
I ran some measurements and, as I expected, this basically never happens. For 64-bit, I never saw it occur. For 32-bit, roughly 1 in 800 shifts had the count already in ecx.

For 64-bit, the use of shlx et al. is always a size win. The mov and shift, including their rex prefixes, are 3 bytes each (6 bytes total), while the shlx is 5 bytes.

For 32-bit, things are more complicated. The mov and shift are 2 bytes each, so the use of shlx is by itself a 1-byte size penalty. However, avoiding the mov sometimes results in fewer spills, and thus fewer bytes overall.

So overall I see a barely measurable (< 0.01%) size decrease across all TBs.

r~
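As a sanity check on the byte counts above, here is a minimal sketch that hand-encodes both sequences for register operands and returns their lengths. It is not QEMU's tcg_out_* code: the function names (encode_mov_shl, encode_shlx, emit_rex, modrm_reg) are hypothetical, it assumes register-direct operands and a shift done in place (dst is both source and destination in the legacy form), and it ignores the spill effects mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Register numbers follow the x86 encoding: 0 = eax/rax, 1 = ecx/rcx, ..., 15 = r15. */
enum { REG_ECX = 1 };

/* Emit a REX prefix only if needed: W = 64-bit operand size,
   R extends ModRM.reg, B extends ModRM.rm. Returns bytes written. */
static size_t emit_rex(uint8_t *p, bool w, int reg, int rm)
{
    uint8_t rex = (uint8_t)(0x40 | (w ? 0x08 : 0) | ((reg & 8) ? 0x04 : 0) | ((rm & 8) ? 0x01 : 0));
    if (rex == 0x40) {
        return 0;                            /* plain 32-bit op on low registers */
    }
    p[0] = rex;
    return 1;
}

/* Register-direct ModRM byte (mod = 11). */
static uint8_t modrm_reg(int reg, int rm)
{
    return (uint8_t)(0xC0 | ((reg & 7) << 3) | (rm & 7));
}

/* Legacy form: mov %count, %ecx; shl %cl, %dst.
   The mov is skipped when the count is already in ECX (the case asked about). */
static size_t encode_mov_shl(uint8_t *p, bool rexw, int dst, int count)
{
    size_t n = 0;
    /* dst must not be ECX, since the count is moved there. */
    assert(dst >= 0 && dst < 16 && count >= 0 && count < 16 && dst != REG_ECX);
    if (count != REG_ECX) {
        n += emit_rex(p + n, rexw, count, REG_ECX);
        p[n++] = 0x89;                       /* MOV r/m, r */
        p[n++] = modrm_reg(count, REG_ECX);
    }
    n += emit_rex(p + n, rexw, 0, dst);
    p[n++] = 0xD3;                           /* SHL r/m, CL is D3 /4 */
    p[n++] = modrm_reg(4, dst);
    return n;
}

/* BMI2 form: shlx dst, src, count = VEX.LZ.66.0F38.W0/W1 F7 /r.
   The 0F38 opcode map needs the 3-byte VEX prefix, so this is always 5 bytes. */
static size_t encode_shlx(uint8_t *p, bool rexw, int dst, int src, int count)
{
    assert(dst >= 0 && dst < 16 && src >= 0 && src < 16 && count >= 0 && count < 16);
    p[0] = 0xC4;
    /* Inverted R (dst), inverted X (no index), inverted B (src), map 00010 = 0F38. */
    p[1] = (uint8_t)(((dst & 8) ? 0 : 0x80) | 0x40 | ((src & 8) ? 0 : 0x20) | 0x02);
    /* W, inverted vvvv (count register), L = 0, pp = 01 (66 prefix). */
    p[2] = (uint8_t)((rexw ? 0x80 : 0) | ((~count & 0xF) << 3) | 0x01);
    p[3] = 0xF7;
    p[4] = modrm_reg(dst, src);
    return 5;
}
```

For example, encode_mov_shl(buf, true, 0, 6) emits 48 89 f1 48 d3 e0 (mov %rsi,%rcx; shl %cl,%rax), 6 bytes against 5 for shlx; with rexw false the legacy pair drops to 4 bytes, giving the 1-byte penalty. When the count is already in ECX, the legacy form is just the 2- or 3-byte shift.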