On Fri, Mar 6, 2015 at 11:24 PM, Andi Kleen <[email protected]> wrote:
> Denys Vlasenko <[email protected]> writes:
>
>> By the nature of TEST operation, it is often possible
>> to test a narrower part of the operand:
>>     "testl $3, mem"  ->  "testb $3, mem",
>>     "testq $3, %rcx" ->  "testb $3, %cl"
>> This results in shorter insns, because TEST insn has no
>> sign-extending byte-immediate forms unlike other ALU ops.
>
> It also results in expensive LCP stalls. Please don't do it.
> If you feel the need to change instructions around like this read
> the optimization manuals first.

Length-changing prefix (LCP) stalls result from the 0x66 prefix
(see https://software.intel.com/en-us/forums/topic/328256).

Basically, an LCP stall happens because adding a 0x66 byte before
this instruction:

    [test_opcode] [modrm] [imm32]

changes it to

    [66] [test_opcode] [modrm] [imm16]

where [imm16] now has a *different length*: 2 bytes instead of 4.
This confuses the decoder.

REX prefixes were carefully designed to almost never hit this case:
adding a REX prefix does not change instruction length, except for
the MOVABS and MOV [addr],RAX insns.

My patch does not add any optimizations which would use the 0x66 prefix.
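
To make the byte counts concrete, here is a minimal sketch that lists hand-assembled encodings of the forms discussed above. The (%rax) memory operand, the ENCODINGS table and the has_lcp() helper are illustrative assumptions, not part of the patch. has_lcp() only recognizes the single pattern described here (0x66 in front of the F7 /0 TEST-with-immediate form); it is not a general decoder.

```python
# Hand-assembled x86-64 encodings (AT&T syntax -> machine bytes).
# The (%rax) operand is an arbitrary choice made for this illustration.
ENCODINGS = {
    "testl $3, (%rax)": bytes([0xF7, 0x00, 0x03, 0x00, 0x00, 0x00]),        # F7 /0 imm32
    "testb $3, (%rax)": bytes([0xF6, 0x00, 0x03]),                          # F6 /0 imm8
    "testw $3, (%rax)": bytes([0x66, 0xF7, 0x00, 0x03, 0x00]),              # 66 F7 /0 imm16
    "testq $3, %rcx":   bytes([0x48, 0xF7, 0xC1, 0x03, 0x00, 0x00, 0x00]),  # REX.W F7 /0 imm32
    "testb $3, %cl":    bytes([0xF6, 0xC1, 0x03]),                          # F6 /0 imm8
}


def has_lcp(code: bytes) -> bool:
    """Hypothetical helper: True if 0x66 precedes TEST r/m, imm (F7 /0).

    In that case the prefix shrinks the immediate from 4 bytes to 2,
    which is the length change described above.
    """
    if len(code) < 3 or code[0] != 0x66 or code[1] != 0xF7:
        return False
    modrm_reg = (code[2] >> 3) & 0x7  # opcode-extension field of ModRM
    return modrm_reg == 0             # /0 selects TEST within the F7 group


if __name__ == "__main__":
    for insn, code in ENCODINGS.items():
        print(f"{insn:18} {code.hex(' '):22} {len(code)} bytes  LCP: {has_lcp(code)}")
```

Only the 16-bit form carries the 0x66 prefix; the byte-sized forms use a different opcode (F6) with no operand-size prefix at all, and they are shorter than the forms they would replace.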

