Hi!

On Sun, Jun 22, 2025 at 06:13:51PM +0100, David Laight wrote:
> On Sun, 22 Jun 2025 11:52:43 +0200
> Christophe Leroy <christophe.le...@csgroup.eu> wrote:
> > e500 has the isel instruction which allows selecting one value or
> > the other without branch and that instruction is not speculative, so
> > use it. Although GCC usually generates code using that instruction,
> > it is safer to use inline assembly to be sure. The result is:

The instruction (which is a standard Power instruction since
architecture version 2.03, published in 2006) can in principle be
speculative, but there exist no Power implementations that do any data
speculation like this at all.

If you want any particular machine instructions to be generated you
have to write them manually, sure, in inline asm or preferably in actual
asm.  But you can be sure that GCC will generate isel or similar (like
the v3.1 set[n]bc[r] insns, best instructions ever!), whenever
appropriate, i.e. when it is a) allowed at all, and b) advantageous.

> >   14:   3d 20 bf fe     lis     r9,-16386
> >   18:   7c 03 48 40     cmplw   r3,r9
> >   1c:   7c 69 18 5e     iselgt  r3,r9,r3
> >
> > On other ones, when kernel space is over 0x80000000 and user space
> > is below, the logic in mask_user_address_simple() leads to a
> > 3 instruction sequence:
> >
> >   14:   7c 69 fe 70     srawi   r9,r3,31
> >   18:   7c 63 48 78     andc    r3,r3,r9
> >   1c:   51 23 00 00     rlwimi  r3,r9,0,0,0
> >
> > This is the default on powerpc 8xx.
> >
> > When the limit between user space and kernel space is not 0x80000000,
> > mask_user_address_32() is used and a 6 instructions sequence is
> > generated:
> >
> >   24:   54 69 7c 7e     srwi    r9,r3,17
> >   28:   21 29 57 ff     subfic  r9,r9,22527
> >   2c:   7d 29 fe 70     srawi   r9,r9,31
> >   30:   75 2a b0 00     andis.  r10,r9,45056
> >   34:   7c 63 48 78     andc    r3,r3,r9
> >   38:   7c 63 53 78     or      r3,r3,r10
> >
> > The constraint is that TASK_SIZE be aligned to 128K in order to get
> > the most optimal number of instructions.
> >
> > When CONFIG_PPC_BARRIER_NOSPEC is not defined, fallback on the
> > test-based masking as it is quicker than the 6 instructions sequence
> > but not necessarily quicker than the 3 instructions sequences above.
>
> Doesn't that depend on whether the branch is predicted correctly?
>
> I can't read ppc asm well enough to check the above.

[ PowerPC or Power (or Power Architecture, or Power ISA) ]

> And the C is also a bit tortuous.
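
For anyone who does not read Power asm, here is a rough C rendering of
what the three quoted sequences compute.  This is only a sketch derived
from the disassembly above, not the code from the patch: the function
names and the two limit macros are made up for illustration, and the
limits are simply the immediates visible in the listings.

```c
#include <stdint.h>

/* Hypothetical limits, read off the immediates in the quoted listings. */
#define SKETCH_LIMIT_ISEL 0xbffe0000u   /* lis r9,-16386 */
#define SKETCH_LIMIT_32   0xb0000000u   /* andis. r10,r9,45056 */

/* cmplw + iselgt: an unsigned clamp of the address to the limit. */
static inline uint32_t sketch_mask_isel(uint32_t addr)
{
    return addr > SKETCH_LIMIT_ISEL ? SKETCH_LIMIT_ISEL : addr;
}

/* 3-insn sequence, split at 0x80000000. */
static inline uint32_t sketch_mask_simple(uint32_t addr)
{
    /* srawi r9,r3,31: all ones if the top bit is set, else zero. */
    uint32_t mask = 0u - (addr >> 31);

    addr &= ~mask;                      /* andc r3,r3,r9 */
    /* rlwimi r3,r9,0,0,0: copy the top bit of the mask into the result. */
    return (addr & 0x7fffffffu) | (mask & 0x80000000u);
}

/* 6-insn sequence, limit at a multiple of 128K (here 0xb0000000). */
static inline uint32_t sketch_mask_32(uint32_t addr)
{
    /* srwi + subfic: goes negative once addr reaches the limit. */
    uint32_t diff = (SKETCH_LIMIT_32 >> 17) - 1u - (addr >> 17);
    /* srawi r9,r9,31: spread the sign bit over the whole word. */
    uint32_t mask = 0u - (diff >> 31);

    /* andis. + andc + or: keep addr below the limit, else the limit. */
    return (addr & ~mask) | (mask & SKETCH_LIMIT_32);
}
```

In all three cases a user address comes back unchanged and anything at
or above the limit is replaced by a fixed address, without a conditional
branch in the generated code.
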
I can read the code ;-)  All those instructions are normal simple
integer instructions.  Shifts, adds, logicals.

In general, correctly predicted non-taken branches cost absolutely
nothing.  Correctly predicted taken branches cost the same as any taken
branch, so a refetch, maybe resulting in a cycle or so of decode bubble.
And a mispredicted branch can be very expensive, say on the order of a
hundred cycles (but usually more like ten, which is still a lot of insns
worth).

So branches are great for predictable stuff, and "not so great" for not
so predictable stuff.


Segher
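
P.S. The branch cost reasoning above can be put into a toy model.  This
is only a back-of-the-envelope sketch: the function name and parameters
are made up for illustration, and real numbers depend entirely on the
core in question.

```c
/*
 * Expected cost, in cycles, of one conditional branch under a toy model:
 *  - correctly predicted and not taken: free
 *  - correctly predicted and taken: a small refetch bubble
 *  - mispredicted: the full misprediction penalty
 * All four parameters are assumptions supplied by the caller.
 */
static double sketch_branch_cost(double p_taken, double p_mispredict,
                                 double taken_bubble,
                                 double mispredict_penalty)
{
    double predicted = (1.0 - p_mispredict) * p_taken * taken_bubble;
    double mispredicted = p_mispredict * mispredict_penalty;

    return predicted + mispredicted;
}
```

With a ten-cycle penalty and a branch that is almost never taken, a 5%
misprediction rate averages about half a cycle per branch in this model,
while a 50% rate averages about five cycles, which is where a short
branchless sequence starts to look attractive.
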