On Mon, Mar 15, 2021 at 06:04:41PM +0100, Sedat Dilek wrote:
> make V=1 -j4 LLVM=1 LLVM_IAS=1

So for giggles I checked: neither GCC nor LLVM seems to emit prefix NOPs
when building with -march=sandybridge; they always use NOPL.

Furthermore, the kernel explicitly sets -falign-jumps=1 -falign-loops=1,
which, when not specified, default to 16 or so. This means that your
userspace is *littered* with NOPL, even when you build your entire distro
from source with -march=sandybridge. (arch/gentoo FTW I suppose.)

(The only good news is that recent LLVM has a pass that uses alternative
instruction encodings to grow a basic block, which minimizes the number
of NOPs it needs to emit at the end to satisfy the jump/loop alignment.)

So if you *really* deeply care about NOP performance on your SNB, start
by teaching LLVM about prefix NOPs and rebuild your complete userspace.

At that point, you can do some trivial patches to the kernel to make it
use -march=sandybridge and prefix NOPs too.

Until that time, the vast majority of NOPs your CPU will execute will be
NOPL.
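
For anyone unsure what the two padding styles look like at the byte level,
here is a rough sketch. It is not what GCC, LLVM or the kernel actually do
internally. The NOPL encodings are the recommended multi-byte NOP forms from
the Intel SDM (the 1- and 2-byte entries are plain NOP and 66 NOP, since
NOPL proper starts at 3 bytes); the function names and the max_insn_len
parameter are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Recommended multi-byte NOP forms (0f 1f /0), indexed by length in bytes. */
static const unsigned char nopl_forms[9][8] = {
    {0},                                              /* unused */
    {0x90},                                           /* plain NOP */
    {0x66, 0x90},                                     /* 66 NOP */
    {0x0f, 0x1f, 0x00},
    {0x0f, 0x1f, 0x40, 0x00},
    {0x0f, 0x1f, 0x44, 0x00, 0x00},
    {0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00},
    {0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00},
    {0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
};

/* Bytes needed to reach the next 'align' boundary (align: power of two). */
size_t pad_bytes(unsigned long addr, unsigned long align)
{
    return (size_t)((align - (addr & (align - 1))) & (align - 1));
}

/* Fill 'len' bytes with NOPL-style padding; returns instruction count. */
size_t fill_nopl(unsigned char *buf, size_t len)
{
    size_t count = 0;

    while (len > 0) {
        size_t n = len > 8 ? 8 : len;

        memcpy(buf, nopl_forms[n], n);
        buf += n;
        len -= n;
        count++;
    }
    return count;
}

/*
 * Fill 'len' bytes with prefix NOPs: (n - 1) 0x66 prefixes followed by 0x90.
 * 'max_insn_len' caps each instruction's length; the value to pick is an
 * assumption left to the caller. Returns instruction count.
 */
size_t fill_prefix_nop(unsigned char *buf, size_t len, size_t max_insn_len)
{
    size_t count = 0;

    if (max_insn_len == 0)
        return 0;

    while (len > 0) {
        size_t n = len > max_insn_len ? max_insn_len : len;

        memset(buf, 0x66, n - 1);
        buf[n - 1] = 0x90;
        buf += n;
        len -= n;
        count++;
    }
    return count;
}
```

With the default 16-byte jump/loop alignment, a branch target that would
otherwise land at 0x1003 needs pad_bytes(0x1003, 16) = 13 bytes of padding.
With -falign-jumps=1 that same call returns 0, which is why the kernel
itself carries comparatively little of this padding.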