On 14 November 2017 at 00:54, Linus Torvalds <torva...@linux-foundation.org> wrote: > On Mon, Nov 13, 2017 at 3:30 PM, Linus Torvalds > <torva...@linux-foundation.org> wrote: >> >> So let's just rewrite that mnt_flags conversion that way, justr to get >> gcc to generate the obvious code. > > Oh wow. I tried to do the same thing in fs/namespace.c where it does > the reverse bit translation, and gcc makes a _horrible_ mess of it and > actually makes the code much worse because for some reason the pattern > doesn't trigger.
[trimming cc list] Can you be more specific? That's not what I see with gcc 7.1. I've found two blocks where I replaced the if's with the ternary expressions, one in clone_mnt, one in do_mount. In clone_mnt, gcc-7.1 seems to do (first instruction is the unconditional |= MNT_LOCK_ATIME) 10d7: 0d 00 00 04 00 or $0x40000,%eax 10dc: 89 43 30 mov %eax,0x30(%rbx) 10df: a8 02 test $0x2,%al 10e1: 74 08 je 10eb <clone_mnt+0x9b> 10e3: 0d 00 00 20 00 or $0x200000,%eax 10e8: 89 43 30 mov %eax,0x30(%rbx) 10eb: a8 01 test $0x1,%al 10ed: 74 08 je 10f7 <clone_mnt+0xa7> 10ef: 0d 00 00 10 00 or $0x100000,%eax 10f4: 89 43 30 mov %eax,0x30(%rbx) and after patching 10cc: 0d 00 00 04 00 or $0x40000,%eax 10d1: 89 c2 mov %eax,%edx 10d3: c1 e2 10 shl $0x10,%edx 10d6: 81 e2 00 00 40 00 and $0x400000,%edx 10dc: 09 d0 or %edx,%eax 10de: 89 c2 mov %eax,%edx 10e0: c1 e2 14 shl $0x14,%edx 10e3: 81 e2 00 00 20 00 and $0x200000,%edx 10e9: 09 c2 or %eax,%edx (with a final store of %eax to 0x30(%rbx)). Either way it's four instructions per flag, but I assume the one without the branches is preferable. Similarly, in do_mount, before we have 3429: 44 89 f8 mov %r15d,%eax 342c: 83 c8 01 or $0x1,%eax 342f: 40 f6 c5 02 test $0x2,%bpl 3433: 44 0f 45 f8 cmovne %eax,%r15d 3437: 44 89 f8 mov %r15d,%eax 343a: 83 c8 02 or $0x2,%eax 343d: 40 f6 c5 04 test $0x4,%bpl 3441: 44 0f 45 f8 cmovne %eax,%r15d but after patching it does something like 3425: 4d 89 fe mov %r15,%r14 3428: 48 c1 ea 07 shr $0x7,%rdx 342c: 49 d1 ee shr %r14 342f: 89 d0 mov %edx,%eax 3431: c1 e1 05 shl $0x5,%ecx 3434: 83 e0 08 and $0x8,%eax 3437: 41 83 e6 07 and $0x7,%r14d 343b: 41 09 c6 or %eax,%r14d 343e: 89 d0 mov %edx,%eax which actually makes use of MS_{NOSUID,NODEV,NOEXEC} all being 1 bit off from their MNT_ counterparts (witness the shr %r14 and the and with 0x7), so there we also cut 37 bytes according to bloat-o-meter. Now, it does seem that older (and not that old in absolute terms) compilers may generate worse code after the transformation - the 'replace with shift-mask-or' pattern doesn't exist until 7.1. E.g. with 5.4 we have $ scripts/bloat-o-meter fs/namespace.o.{0,1}-5.4 add/remove: 0/0 grow/shrink: 2/0 up/down: 50/0 (50) function old new delta do_mount 3153 3195 +42 clone_mnt 768 776 +8 5.4 compiles the "if() ..." construction roughly as 7.1, i.e. with cmovs, but the ternary expressions are transformed into this abomination 336f: 48 89 da mov %rbx,%rdx 3372: 83 e2 04 and $0x4,%edx 3375: 48 83 fa 01 cmp $0x1,%rdx 3379: 19 d2 sbb %edx,%edx 337b: f7 d2 not %edx 337d: 83 e2 02 and $0x2,%edx 3380: 09 d0 or %edx,%eax 3382: 48 89 da mov %rbx,%rdx 3385: 83 e2 08 and $0x8,%edx 3388: 48 83 fa 01 cmp $0x1,%rdx 338c: 19 d2 sbb %edx,%edx 338e: f7 d2 not %edx 3390: 83 e2 04 and $0x4,%edx 3393: 09 d0 or %edx,%eax Was it something like this you saw? Rasmus