On 9/5/24 12:46 PM, Palmer Dabbelt wrote:
On Thu, 05 Sep 2024 11:03:18 PDT (-0700), jeffreya...@gmail.com wrote:
So the first patch failed the pre-commit CI; it didn't fail in my
testing because I'm using --with-arch to set a default configuration
that includes things like zicond to ensure that's always tested. And
the failing test is skipped when zicond is enabled by default.
The failing test is designed to ensure that we don't miss an
if-conversion due to costing issues around the extension that was
typically done in an sCC sequence (which is why it's only run when
zicond is off).
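As a rough illustration of the kind of transformation that test guards, the sketch below shows a branchy function next to a branchless form built from a set-on-condition (sCC) result. The function names are hypothetical and this is not the testcase from the testsuite. Whether GCC actually picks the branchless form depends on the costing discussed here.

```c
#include <stdint.h>

/* Hypothetical example, not the actual failing testcase.
   Branchy form: a candidate for if-conversion.  */
int64_t add_if_less_branchy(int64_t a, int64_t b, int64_t x, int64_t y)
{
    if (a < b)
        x += y;
    return x;
}

/* Branchless form an if-converter might produce without zicond:
   the comparison is materialized as 0/1 by an sCC instruction,
   turned into an all-zeros/all-ones mask, and used to select y or 0.  */
int64_t add_if_less_branchless(int64_t a, int64_t b, int64_t x, int64_t y)
{
    int64_t mask = -(int64_t)(a < b);   /* 0 when false, -1 when true */
    return x + (y & mask);
}
```

Both functions compute the same result; the second simply avoids the branch at the cost of a few extra ALU operations, which is why the cost model matters.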
The test failed because we have a little routine that is highly
dependent on the code generated by the sCC expander and will adjust the
costing to account for expansion quirks that usually go away in register
allocation.
That code needs to be enhanced to work after the sCC expansion change.
Essentially it needs to account for the subreg extraction that shows up
in the sequence as well as being a bit looser on mode checking.
I kept the code working for the old sequences -- in theory a user could
conjure up the old sequences, so handling them seems useful.
This also drops the testsuite changes. Palmer's change makes them
unnecessary.
OK, so we'll just go with that one assuming it passes the tests?
That's the plan. I pushed your change last night, so I just need a
clean run on my change now (fingers crossed).
I don't really care a ton either way, I was mostly just interested in the
sign extension stuff as we've had so many issues there that I don't know
how to solve. So I figured I'd poke around to see if there was anything
interesting going on, but it was pretty boring.
There's still "stuff" in this space, but it's less and less of a concern.
Extensions are typically less than 1% of our dynamic instruction stream
for specint these days. The worst cases are 502.gcc where extensions
vary from 1% - 1.3% of the dynamic stream and 557.xz where they range
from 1.2% - 1.4% of the dynamic instruction stream.
If it weren't for the measurable real performance regression we saw
internally on x264 I wouldn't have been looking in this space at all.
Finding the nugget for sCC expansion was just a bit of frosting from
that effort.
As far as "stuff" goes, there's probably on the order of 2 billion unnecessary
extensions in 541.leela. I haven't chased that down yet -- it
represents a tiny fraction of the dynamic count. Whatever it is, it was
caught by the TARGET_MODE_REP_EXTENDED bits from VRULL and isn't by any of the
other mechanisms we have in place right now.
Jeff