Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm: > Hi! > > On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote: >> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am: >> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote: >> >> This one possibly the branches end up in predictors, whereas conditional >> >> trap is always just speculated not to hit. Branches may also have a >> >> throughput limit on execution whereas trap could be more (1 per cycle >> >> vs 4 per cycle on POWER9). >> > >> > I thought only *taken* branches are just one per cycle? >> >> Taken branches are fetched by the front end at one per cycle (assuming >> they hit the BTAC), but all branches have to be executed by BR at one >> per cycle > > This is not true. (Simple) predicted not-taken conditional branches are > just folded out, never hit the issue queues. And they are fetched as > many together as fit in a fetch group, can complete without limits as > well.
No, they are all dispatched and issue to the BRU for execution. It's trivial to construct a test of a lot of not taken branches in a row and time a loop of it to see it executes at 1 cycle per branch. > The BTAC is a frontend thing, used for target address prediction. It > does not limit execution. I didn't say it did. > Correctly predicted simple conditional branches just get their prediction > validated (and that is not done in the execution units). Incorrectly > predicted branches the same, but those cause a redirect and refetch. How could it validate prediction without issuing? It wouldn't know when sources are ready. > >> > Internally *all* traps are conditional, in GCC. It also can optimise >> > them quite well. There must be something in the kernel macros that >> > prevents good optimisation. >> >> I did take a look at it at one point. >> >> One problem is that the kernel needs the address of the trap instruction >> to create the entry for it. The other problem is that __builtin_trap >> does not return so it can't be used for WARN. LLVM at least seems to >> have a __builtin_debugtrap which does return. > > This is <https://gcc.gnu.org/PR99299>. Aha. >> The first problem seems like the show stopper though. AFAIKS it would >> need a special builtin support that does something to create the table >> entry, or a guarantee that we could put an inline asm right after the >> builtin as a recognized pattern and that would give us the instruction >> following the trap. > > I'm not quite sure what this means. Can't you always just put a > > bla: asm(""); > > in there, and use the address of "bla"? Not AFAIKS. Put it where? > If not, you need to say a lot > more about what you actually want to do :-/ We need to put the address (or relative offset) of the trap instruction into an entry in a different section. Basically this: __builtin_trap(); asm ("1: \n\t" " .section __bug_table,\"aw\" \n\t" "2: .4byte 1b - 2b - 4 \n\t" " .previous"); Where the 1: label must follow immediately after the trap instruction, and where the asm must be emitted even for the case of no-return traps (but anything following the asm could be omitted). Thanks, Nick