[Sorry, I wanted to reply earlier, but it stayed in my drafts folder for a month]

On Sat, Feb 01, 2025 at 12:22:51PM +1100, Paul Mackerras wrote:
[snipped]
> > 603 was a looong time ago, I don't recall the details.
>
> Regarding broadcast TLBIEs, the protocols and mechanisms for doing
> that are known to be complex and slow in the IBM Power processors (ask
> Derek Williams about that :).  Anton found that in fact doing only
> local TLBIEs and using IPIs gave *better* performance on IBM Power
> systems than using hardware broadcast TLBIEs in many cases (the reason
> being that software knows which other CPUs might have a given TLB
> entry, often quite a small set, whereas hardware doesn't, and has to
> send the invalidation to every CPU and wait for a response from every
> CPU).  Add to that, that most other SMP-capable CPU architectures
> don't do broadcast TLB invalidations, Intel x86 for example.

Actually it's coming to x86, at least on the AMD side:

https://lore.kernel.org/all/20250206044346.3810242-1-r...@surriel.com/

with performance numbers which look rather good. I don't know what it
looks like at the level of the hardware protocol, but implementing it
on a single chip/socket is likely relatively simple.

Gabriel

>
> > > the kernel already has code to deal with this.  One of the patches in
> > > this series provides a config option to allow platforms to select
> > > unconditionally the behaviour where cross-CPU TLB invalidations are
> > > handled using inter-processor interrupts.
> >
> > Are there plans to broadcast the (SMP cache invalidation) messages?
>
> Cache (i.e. instruction and data cache) - yes, they *are* coherent.
> More precisely, the D caches are write-through, and all I and D caches
> snoop writes to memory (including DMA writes) and invalidate any cache
> lines being written to.
>
> > Will uwatt support some real bus protocol, for example?
>
> "Real" meaning using tri-state bus drivers, like we did in the 90s? :)
>
> > Again, congrats on this great milestone!  Does this floating point
> > support do square roots as well (aka "gpopt"; does it do "gfxopt" for
> > that matter, fsel?)  fsqrt is kinda tricky to get to work fully
> > correctly :-)
>
> Yes, fsqrt and fsel are implemented in hardware, and are accurate to
> the last bit.  Also, the FPU handles denormalized values in hardware
> (both input and output) and implements all exception handling as per
> the ISA, including the trap-enabled overflow cases.  Feel free to run
> whatever tests you like and report bugs.  But we're getting a bit
> off-topic from the kernel patches. :)
>
> Paul.
>
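
To make the quoted point about local TLBIEs plus IPIs versus hardware
broadcast a bit more concrete, here is a toy cost model in C. It is only a
sketch under stated assumptions: the cpumask_t typedef, the function names
and the "count the CPUs that have to respond" metric are all made up for
illustration. None of it is kernel or Microwatt code, and it ignores that a
single IPI is presumably more expensive than a single bus message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit CPU mask: bit n set means CPU n may hold the entry. */
typedef uint64_t cpumask_t;

/* Count the set bits in the mask (portable, no compiler builtins). */
unsigned int mask_weight(cpumask_t mask)
{
    unsigned int n = 0;

    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Toy cost of a hardware broadcast: every CPU other than the initiator
 * receives the invalidation and has to acknowledge it.
 */
unsigned int broadcast_targets(unsigned int ncpus)
{
    assert(ncpus >= 1 && ncpus <= 64);
    return ncpus - 1;
}

/*
 * Toy cost of the IPI scheme: the initiator invalidates locally and
 * interrupts only the other CPUs recorded in the mask.
 */
unsigned int ipi_targets(cpumask_t may_have_entry, unsigned int self)
{
    assert(self < 64);
    return mask_weight(may_have_entry & ~((cpumask_t)1 << self));
}
```

For example, on a 16-CPU system where an address space has only ever run
on CPUs 0, 2 and 3, broadcast_targets(16) gives 15 CPUs to wait for, while
ipi_targets(0xd, 0) gives 2.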