https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

--- Comment #5 from H. Peter Anvin <hpa at zytor dot com> ---
I don't think source code modifications are a huge problem, but at this point
they require tracking down each individual bit.

As far as trapping implementations are concerned:

1. In deeply embedded implementations, it is entirely possible that
firmware/microcode might be *more* expensive than logic. Although memory arrays
are, of course, very dense, they are still extremely general and RISC-V isn't a
very sparse instruction set.

2. It seems like it would almost require an implementation-specific performance
model. Now, one can validly argue that by setting the cost of unimplemented
instructions to a (near-)infinite value, such instructions should never be
generated even if they are "enabled". That might also be a possible avenue for
achieving this.
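
To make the cost idea concrete, here is a minimal sketch in C. It is not how
GCC's cost hooks are structured; the instruction classes, the sentinel value
and the helper names are all hypothetical, and the instruction counts in the
usage example are made-up illustrative numbers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical instruction classes; not GCC identifiers. */
enum insn_class { BASE_ALU, ZBB_CPOP, ZBB_CLZ, N_CLASSES };

/* Sentinel for "enabled in the ISA string, but not implemented by this core". */
#define COST_UNIMPLEMENTED INT_MAX

/* One way to expand an operation: `count` instructions of class `cls`. */
struct expansion {
    enum insn_class cls;
    int count;
};

/* Cost of an expansion under a per-core table, saturating at the sentinel. */
static int expansion_cost(const int table[N_CLASSES], struct expansion e)
{
    int unit = table[e.cls];
    assert(unit >= 0 && e.count > 0);
    if (unit == COST_UNIMPLEMENTED || unit > INT_MAX / e.count)
        return COST_UNIMPLEMENTED;
    return unit * e.count;
}

/* Index of the cheapest usable expansion, or -1 if none is usable. */
static int pick_expansion(const int table[N_CLASSES],
                          const struct expansion *alts, size_t n)
{
    int best = -1;
    int best_cost = COST_UNIMPLEMENTED;
    for (size_t i = 0; i < n; i++) {
        int c = expansion_cost(table, alts[i]);
        if (c < best_cost) {
            best = (int)i;
            best_cost = c;
        }
    }
    return best;
}
```

Given two alternatives for a population count, say one cpop instruction or
(hypothetically) twelve base ALU instructions, a table that marks ZBB_CPOP as
COST_UNIMPLEMENTED makes pick_expansion() choose the base sequence, while a
table with a unit cost for ZBB_CPOP picks the single instruction.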

As far as an explosion of subsets is concerned, yes, that is really what this
means. Bloating a tiny on-chip control processor both in area and timing to
implement instructions that never actually appear in the code is at best painful.

That being said, I do intend to submit a proposal to the RISC-V ISA folks to
subset the Zbb subset. It is worth noting that there are overlaps between the
Zb* and Zbk* subsets, but the individual intersection sets do not have their
own names.
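
As an illustration of such an unnamed intersection, the sketch below models
Zbb and Zbkb as bit sets and computes what they share. The mnemonic lists are
the RV32 ones as I understand the ratified specifications and should be checked
against them; the enum, macro and function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* One enumerator per RV32 mnemonic; the I_ prefix is illustrative only. */
enum insn {
    I_ANDN, I_ORN, I_XNOR, I_CLZ, I_CTZ, I_CPOP,
    I_MAX, I_MAXU, I_MIN, I_MINU,
    I_SEXT_B, I_SEXT_H, I_ZEXT_H,
    I_ROL, I_ROR, I_RORI, I_ORC_B, I_REV8,
    I_PACK, I_PACKH, I_BREV8, I_ZIP, I_UNZIP,
    N_INSNS
};

#define BIT(i) (UINT32_C(1) << (i))

static const uint32_t ZBB =
    BIT(I_ANDN) | BIT(I_ORN) | BIT(I_XNOR) | BIT(I_CLZ) | BIT(I_CTZ) |
    BIT(I_CPOP) | BIT(I_MAX) | BIT(I_MAXU) | BIT(I_MIN) | BIT(I_MINU) |
    BIT(I_SEXT_B) | BIT(I_SEXT_H) | BIT(I_ZEXT_H) |
    BIT(I_ROL) | BIT(I_ROR) | BIT(I_RORI) | BIT(I_ORC_B) | BIT(I_REV8);

static const uint32_t ZBKB =
    BIT(I_ANDN) | BIT(I_ORN) | BIT(I_XNOR) |
    BIT(I_ROL) | BIT(I_ROR) | BIT(I_RORI) | BIT(I_REV8) |
    BIT(I_PACK) | BIT(I_PACKH) | BIT(I_BREV8) | BIT(I_ZIP) | BIT(I_UNZIP);

/* Instructions present in both extensions: a set with no name of its own. */
static uint32_t zbb_zbkb_shared(void)
{
    /* Sanity check: every listed mnemonic belongs to at least one set. */
    assert((ZBB | ZBKB) == BIT(N_INSNS) - 1);
    return ZBB & ZBKB;
}

/* Number of members in a set. */
static unsigned set_size(uint32_t s)
{
    unsigned n = 0;
    for (; s; s &= s - 1)
        n++;
    return n;
}
```

Under these assumptions the shared set is andn, orn, xnor, rol, ror, rori and
rev8: the logical-with-negate group, the rotates and byte-reverse.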

The Zbb instruction set is particularly noxious (and this is indeed an ISA
definition problem), because it implements multiple things that are, from an
implementation point of view, completely separate and require separate code
paths in the ALU:

§ 1.2.1 Logical with negate
        - Minimal cost; in fact in some implementations it might have zero or
even negative cost due to decoder simplification.
        - Extremely common in embedded operations.

§ 1.2.2 Count leading/trailing zero bits
        - Requires dedicated logic.
        - ctz and clz have very different uses.
        - Typically clz and ctz will not be able to share logic, either,
requiring *two* dedicated units.

§ 1.2.3 Count population
        - Requires dedicated logic.
        - May be useless depending on what the processor needs.

§ 1.2.4 Integer minimum/maximum
        - May be cheap or expensive, depending on whether an existing
comparator can be leveraged.
        - Quite possibly free or almost free if the AMO instruction set is
already supported in its entirety, as that requires max/min already.

§ 1.2.5 Sign- and zero-extension
§ 1.2.6 Bitwise rotation
        - May be very cheap or quite expensive, depending on the implementation
of the shift instructions.

§ 1.2.7 OR combine
        - Requires dedicated logic.
        - Virtually useless in control processors that do not process text.

§ 1.2.8 Byte-reverse
        - Requires dedicated logic.
        - These, and some other instructions, are special cases of a bit swap
extension that was part of the original bitmanip proposal but was not included,
even as a separate set.
        - Virtually useless in control processors that do not need to
interface with cross-endian data.
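
For reference, the semantics of the groups above can be modelled in plain C.
This is only a sketch for XLEN=32 with one or two representative operations
per group; the function names are chosen for this example and the loops are
written for clarity, not as a statement about how hardware or a compiler
fallback would implement them.

```c
#include <assert.h>
#include <stdint.h>

/* 1.2.1 Logical with negate (andn; orn and xnor are analogous) */
static uint32_t andn(uint32_t a, uint32_t b) { return a & ~b; }

/* 1.2.2 Count leading/trailing zero bits; both yield 32 for x == 0 */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0) return 32;
    while (!(x & UINT32_C(0x80000000))) { x <<= 1; n++; }
    return n;
}

static unsigned ctz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0) return 32;
    while (!(x & 1u)) { x >>= 1; n++; }
    return n;
}

/* 1.2.3 Count population: clear the lowest set bit until none remain */
static unsigned cpop32(uint32_t x)
{
    unsigned n = 0;
    for (; x; x &= x - 1) n++;
    return n;
}

/* 1.2.4 Integer minimum/maximum (signed variants) */
static int32_t min32(int32_t a, int32_t b) { return a < b ? a : b; }
static int32_t max32(int32_t a, int32_t b) { return a > b ? a : b; }

/* 1.2.5 Sign- and zero-extension (sext.b, zext.h) */
static int32_t sext_b(uint32_t x) { return (int32_t)((x & 0xffu) ^ 0x80u) - 0x80; }
static uint32_t zext_h(uint32_t x) { return x & 0xffffu; }

/* 1.2.6 Bitwise rotation; the shift amount is taken modulo 32 */
static uint32_t rol32(uint32_t x, unsigned s)
{
    s &= 31;
    return (x << s) | (x >> ((32 - s) & 31));
}

static uint32_t ror32(uint32_t x, unsigned s)
{
    s &= 31;
    return (x >> s) | (x << ((32 - s) & 31));
}

/* 1.2.7 OR combine (orc.b): each nonzero byte becomes 0xff, zero stays 0 */
static uint32_t orc_b(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i += 8)
        if ((x >> i) & 0xffu)
            r |= UINT32_C(0xff) << i;
    return r;
}

/* 1.2.8 Byte-reverse (rev8) */
static uint32_t rev8(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0xff00u) |
           ((x << 8) & 0xff0000u) | (x << 24);
}
```

Even in this form the groups have little in common: andn is a single gate
level on top of AND, clz32() and ctz32() scan from opposite ends, and orc_b()
and rev8() are byte-lane operations unrelated to either.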


These 8 groups really ought to be given separate names.

Is this going to happen again? Quite likely.

It seems unlikely, as you say, that the public ISA would be chopped to pieces
to support every single use case.

It really comes down to: out of multiple suboptimal options (forced hardware
bloat, custom subsets, extremely fine-grained public subsets, vendor-hacked
trees that lag behind and/or diverge from upstream), which one is the least
bad?
