[Sorry, I wanted to reply earlier, but this message sat in my drafts folder
for a month]

On Sat, Feb 01, 2025 at 12:22:51PM +1100, Paul Mackerras wrote:
[snipped]
> 
> 603 was a looong time ago, I don't recall the details.
> 
> Regarding broadcast TLBIEs, the protocols and mechanisms for doing
> that are known to be complex and slow in the IBM Power processors (ask
> Derek Williams about that :).  Anton found that in fact doing only
> local TLBIEs and using IPIs gave *better* performance on IBM Power
> systems than using hardware broadcast TLBIEs in many cases (the reason
> being that software knows which other CPUs might have a given TLB
> entry, often quite a small set, whereas hardware doesn't, and has to
> send the invalidation to every CPU and wait for a response from every
> CPU).  Add to that, that most other SMP-capable CPU architectures
> don't do broadcast TLB invalidations, Intel x86 for example.

Actually it's coming to x86, at least on the AMD side:

https://lore.kernel.org/all/20250206044346.3810242-1-r...@surriel.com/

with performance numbers which look rather good.

I don't know what it looks like at the level of the hardware protocol,
but implementing it on a single chip/socket is likely relatively simple.
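
For reference, the software approach Paul describes in the quoted text (a
local TLBIE plus IPIs sent only to the CPUs that might hold the entry) can
be sketched roughly as below. This is only an illustration of the counting
argument, not kernel code: NR_CPUS, struct mm_sketch, and all function names
are made up for the example, and the per-address-space CPU mask is assumed
to be a plain 64-bit bitmap.

```c
#include <stdint.h>

#define NR_CPUS 64  /* hypothetical system size, chosen for the example */

/* Hypothetical per-address-space state: one bit per CPU that may cache
 * translations for this address space. */
struct mm_sketch {
    uint64_t cpumask;
};

/* Count the set bits in a mask, i.e. the CPUs that must be notified. */
int count_bits(uint64_t mask)
{
    int n = 0;
    while (mask) {
        mask &= mask - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Software path: invalidate locally, then IPI only the other CPUs
 * recorded in the mask. Returns how many CPUs must be waited for. */
int ipi_targets(const struct mm_sketch *mm, int self)
{
    uint64_t others = mm->cpumask & ~(UINT64_C(1) << self);
    return count_bits(others);
}

/* Hardware broadcast: the invalidation goes to every other CPU, and
 * each one has to respond, regardless of what it has cached. */
int broadcast_targets(int self)
{
    (void)self;  /* the count does not depend on the sender */
    return NR_CPUS - 1;
}
```

With an address space that has only run on CPUs 0, 3 and 5 (mask 0x29), an
invalidation issued from CPU 0 waits for 2 CPUs on the IPI path, against 63
for a broadcast on this hypothetical 64-CPU system.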

Gabriel

> 
> > > the kernel already has code to deal with this.  One of the patches in
> > > this series provides a config option to allow platforms to select
> > > unconditionally the behaviour where cross-CPU TLB invalidations are
> > > handled using inter-processor interrupts.
> > 
> > Are there plans to broadcast the (SMP cache invalidation) messages?
> 
> Cache (i.e. instruction and data cache) - yes, they *are* coherent.
> More precisely, the D caches are write-through, and all I and D caches
> snoop writes to memory (including DMA writes) and invalidate any cache
> lines being written to.
> 
> > Will uwatt support some real bus protocol, for example?
> 
> "Real" meaning using tri-state bus drivers, like we did in the 90s? :)
> 
> > Again, congrats on this great milestone!  Does this floating point
> > support do square roots as well (aka "gpopt"; does it do "gfxopt" for
> > that matter, fsel?)  fsqrt is kinda tricky to get to work fully
> > correctly :-)
> 
> Yes, fsqrt and fsel are implemented in hardware, and are accurate to
> the last bit.  Also, the FPU handles denormalized values in hardware
> (both input and output) and implements all exception handling as per
> the ISA, including the trap-enabled overflow cases.  Feel free to run
> whatever tests you like and report bugs.  But we're getting a bit
> off-topic from the kernel patches. :)
> 
> Paul.
> 
 

