Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am:
> On Thu, Aug 26, 2021 at 11:57:52PM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
>> > On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
>> >> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
>> >> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
>> >> >> This one possibly the branches end up in predictors, whereas
>> >> >> conditional trap is always just speculated not to hit. Branches may
>> >> >> also have a throughput limit on execution whereas trap could be
>> >> >> more (1 per cycle vs 4 per cycle on POWER9).
>> >> >
>> >> > I thought only *taken* branches are just one per cycle?
>> >>
>> >> Taken branches are fetched by the front end at one per cycle (assuming
>> >> they hit the BTAC), but all branches have to be executed by BR at one
>> >> per cycle
>> >
>> > This is not true. (Simple) predicted not-taken conditional branches are
>> > just folded out, never hit the issue queues. And they are fetched as
>> > many together as fit in a fetch group, can complete without limits as
>> > well.
>>
>> No, they are all dispatched and issue to the BRU for execution. It's
>> trivial to construct a test of a lot of not taken branches in a row
>> and time a loop of it to see it executes at 1 cycle per branch.
>
> (s/dispatched/issued/)

?

>
> Huh, this was true on p8 already. Sorry for my confusion. In my
> defence, this doesn't matter for performance on "real code".
>
>> > Correctly predicted simple conditional branches just get their
>> > prediction validated (and that is not done in the execution units).
>> > Incorrectly predicted branches the same, but those cause a redirect
>> > and refetch.
>>
>> How could it validate prediction without issuing? It wouldn't know when
>> sources are ready.
>
> In the backend. But that is just how it worked on older cores :-/

Okay. I don't know about older cores than POWER9. Backend would normally
include execution though. The only other place you could do it if you
don't issue/exec would be after it goes back in order, like completion.
But that would be horrible for mispredict penalty.
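For what it's worth, the branch test I mentioned above would look
something like the sketch below. It is untested and the details are
arbitrary choices on my part (256 compare+branch pairs per asm block,
timing via mftb); the timebase ticks would still need scaling by the
core-to-timebase frequency ratio before you can read off cycles per
branch.

#include <stdio.h>

#define FOUR(x)		x x x x
#define SIXTEEN(x)	FOUR(FOUR(x))
/* the compare always gives "not equal", so the beq is never taken */
#define NOT_TAKEN	"cmpdi %0,0\n\t" "beq 1f\n\t"

static inline unsigned long mftb(void)
{
	unsigned long tb;

	asm volatile("mftb %0" : "=r" (tb));
	return tb;
}

int main(void)
{
	unsigned long i, iters = 1000000, t0, t1;
	long one = 1;

	t0 = mftb();
	for (i = 0; i < iters; i++) {
		/* 256 never-taken conditional branches per iteration */
		asm volatile(SIXTEEN(SIXTEEN(NOT_TAKEN)) "1:"
			     : : "r" (one) : "cr0");
	}
	t1 = mftb();

	printf("%lu timebase ticks for %lu not-taken branches\n",
	       t1 - t0, iters * 256);

	return 0;
}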
>> >> The first problem seems like the show stopper though. AFAIKS it would
>> >> need special builtin support that does something to create the table
>> >> entry, or a guarantee that we could put an inline asm right after the
>> >> builtin as a recognized pattern and that would give us the instruction
>> >> following the trap.
>> >
>> > I'm not quite sure what this means. Can't you always just put a
>> >
>> > bla: asm("");
>> >
>> > in there, and use the address of "bla"?
>>
>> Not AFAIKS. Put it where?
>
> After wherever you want to know the address after. You will have to
> make sure they stay together somehow.

I still don't follow.

> It is much easier to get the address of something, not the address after
> it. If you know it is just one insn anyway, that isn't hard to handle
> either (even if prefixed insns exist).
>
>> > If not, you need to say a lot
>> > more about what you actually want to do :-/
>>
>> We need to put the address (or relative offset) of the trap instruction
>> into an entry in a different section. Basically this:
>>
>> __builtin_trap();
>> asm ("1: \n\t"
>>      " .section __bug_table,\"aw\" \n\t"
>>      "2: .4byte 1b - 2b - 4 \n\t"
>>      " .previous");
>>
>> Where the 1: label must follow immediately after the trap instruction,
>> and where the asm must be emitted even for the case of no-return traps
>> (but anything following the asm could be omitted).
>
> The address *after* the trap insn? That is much much harder than the
> address *of* the trap insn.

No, the address of the trap instruction, hence the -4. I have the label
after the trap because those are the semantics I have from inline asm. If
you could give me a builtin that puts a label at the address of the trap
instruction, usable later by inline asm, then that could work too:

__builtin_labeled_trap("1:");
asm (" .section __bug_table,\"aw\" \n\t"
     "2: .4byte 1b - 2b \n\t"
     " .previous");
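To make the shape of the thing concrete, this is roughly the pattern we
would like to be able to write, as a stand-alone sketch. MY_WARN is a
made-up name and the bug_entry struct is cut down to just the address
field; it only does the right thing if the compiler gives the two
guarantees above, i.e. keeps the asm immediately after the trap
instruction and emits it even though __builtin_trap() does not return.

#include <stdint.h>

/* Cut-down table entry: offset of the trap, relative to the entry itself */
struct bug_entry {
	int32_t bug_addr_disp;
};

/*
 * Made-up wrapper. "1b - 2b - 4" is the address of the trap instruction
 * relative to the table entry, assuming the trap is the 4-byte instruction
 * immediately before the "1:" label -- which is exactly the guarantee
 * being asked for from the compiler.
 */
#define MY_WARN()						\
do {								\
	__builtin_trap();					\
	asm volatile("1:\n\t"					\
		     ".section __bug_table,\"aw\"\n\t"		\
		     "2:\t.4byte 1b - 2b - 4\n\t"		\
		     ".previous");				\
} while (0)

void check(long x)
{
	if (x < 0)
		MY_WARN();
}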