On 7/8/24 6:58 AM, Manolis Tsamis wrote:
> This is still hard to tell. In some cases I have observed either
> improvements or regressions in benchmarks, which are highly susceptible
> to costing and the specific store-forwarding penalties of the CPU.
> I have seen cases where the store-forwarding instance is profitable to
> avoid but we get bad code generation for other reasons (usually
> store_bit_field lowering not being good enough) and hence a
> regression.
> So I believe more time and testing are needed to really evaluate the
> speedups that can be achieved.
Definitely agree across the board. I'm hoping to have some good data
from the hardware team on this shortly. I suspect that on RISC-V the
lack of good bitfield manipulation capabilities is going to limit the
effectiveness somewhat. But that's speculation.
The first step is to get the signaling data on when our design is stalling
due to a narrow store feeding a wider load. We *might* have that data as of
this morning, but an IT issue is getting in the way of verification.
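
For anyone following along, here is a rough source-level sketch of the
"narrow store feeding wider load" shape, next to the register-only merge
that avoiding it amounts to. The function names are made up for
illustration, the byte position assumes a little-endian target, and a
compiler is free to fold the first routine away entirely, so treat this
as a picture of the access pattern rather than a benchmark.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: a 1-byte store immediately followed by an
   8-byte load that overlaps it. If the two accesses reach the hardware
   in this form, the load may not be able to forward from the store. */
uint64_t narrow_store_wide_load(uint64_t *p, uint8_t b)
{
    unsigned char *bytes = (unsigned char *)p;
    uint64_t v;

    bytes[0] = b;             /* narrow (1-byte) store */
    memcpy(&v, p, sizeof v);  /* wide (8-byte) load over the same bytes */
    return v;
}

/* Register-only alternative: insert the byte with mask/or instead of
   going through memory. Assumes little-endian, so bytes[0] above is the
   low 8 bits of the word. */
uint64_t merge_in_register(uint64_t old, uint8_t b)
{
    return (old & ~(uint64_t)0xff) | (uint64_t)b;
}

/* Sanity check that both routes agree on a little-endian target. */
void check_equivalent(uint64_t word, uint8_t b)
{
    uint64_t via_memory = word;
    assert(narrow_store_wide_load(&via_memory, b) == merge_in_register(word, b));
}
```

The second routine is where bitfield-insert quality matters: the merge is
only a win if the mask/or sequence is cheaper than the stall it replaces.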
jeff