https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020
--- Comment #5 from H. Peter Anvin <hpa at zytor dot com> ---
I don't think source code modifications are a huge problem, but at this point they require tracking down each individual bit.

As far as trapping implementations are concerned:

1. In deeply embedded implementations, it is entirely possible that firmware/microcode might be *more* expensive than logic. Although memory arrays are, of course, very dense, they are still extremely general and RISC-V isn't a very sparse instruction set.

2. It seems like it almost would require an implementation-specific performance model.

Now, one can validly argue that by setting the cost of unimplemented instructions to a (near-)infinite value such instructions should never be generated even if they are "enabled". That might also be a possible avenue for achieving this.

As far as an explosion of subsets, yes, this is really what this means. Bloating a tiny on-chip control processor both in area and timing to implement instructions that never actually appear in the code is at best painful.

That being said, I do intend to submit a proposal to the RISC-V ISA folks to subset the Zbb subset. It is worth noting that there are overlaps between the Zb* and Zbk* subsets, but the individual intersection sets do not have their own names.

The Zbb instruction set is particularly noxious (and this is indeed an ISA definition problem), because it implements multiple things that are, from an implementation point of view, completely separate and require separate code paths in the ALU:

§ 1.2.1 Logical with negate
- Minimal cost; in fact in some implementations it might have zero or even negative cost due to decoder simplification.
- Extremely common in embedded operations.

§ 1.2.2 Count leading/trailing zero bits
- Requires dedicated logic.
- ctz and clz have very different uses.
- Typically clz and ctz will not be able to share logic, either, requiring *two* dedicated units.

§ 1.2.3 Count population
- Requires dedicated logic.
- May be useless depending on what the processor needs.

§ 1.2.4 Integer minimum/maximum
- May be cheap or expensive, depending on whether an existing comparator can be leveraged.
- Quite possibly free or almost free if the AMO instruction set is already supported in its entirety, as that requires max/min already.

§ 1.2.5 Sign- and zero-extension

§ 1.2.6 Bitwise rotation
- May be very cheap or quite expensive, depending on the implementation of the shift instructions.

§ 1.2.7 OR combine
- Requires dedicated logic.
- Virtually useless in control processors that do not process text.

§ 1.2.8 Byte-reverse
- Requires dedicated logic.
- These, and some other instructions, are special cases of a bit swap extension proposed in the original bitmanip proposal, but it was not included even as a separate set.
- Virtually useless in control processors that do not need to interface with cross-endian data.

These 8 groups really ought to be given separate names.

Is this going to happen again? Quite likely. As you say, chopping the public ISA to pieces to support every single use case seems unlikely. It really comes down to: out of multiple suboptimal cases (forced hardware bloat, custom subsets, extremely fine grained public subsets, vendor-hacked trees that lag behind and/or diverge from upstream), which option has the least amount of badness?
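
For reference, here is a rough sketch in portable C of one representative operation from each of the groups above, assuming RV32 (XLEN = 32). The `zbb_*` function names are made up for illustration and are not compiler builtins or anything taken from GCC; the bodies are plain software models of what the instructions compute, which may help show how little the groups have in common from a datapath point of view.

```c
#include <stdint.h>

/* 1.2.1 Logical with negate: andn (orn and xnor are analogous). */
static uint32_t zbb_andn(uint32_t a, uint32_t b) { return a & ~b; }

/* 1.2.2 Count leading zeros; the result is XLEN (32) for a zero input. */
static uint32_t zbb_clz(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) { x <<= 1; n++; }
    return n;
}

/* 1.2.2 Count trailing zeros; the result is XLEN (32) for a zero input. */
static uint32_t zbb_ctz(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0)
        return 32;
    while (!(x & 1u)) { x >>= 1; n++; }
    return n;
}

/* 1.2.3 Count population: number of set bits. */
static uint32_t zbb_cpop(uint32_t x)
{
    uint32_t n = 0;
    while (x) { x &= x - 1; n++; }   /* clear the lowest set bit */
    return n;
}

/* 1.2.4 Integer maximum, signed (min, minu and maxu are analogous). */
static int32_t zbb_max(int32_t a, int32_t b) { return a > b ? a : b; }

/* 1.2.5 Sign-extend the low byte (sext.b). */
static int32_t zbb_sext_b(uint32_t x)
{
    return (int32_t)((x & 0xffu) ^ 0x80u) - 0x80;
}

/* 1.2.6 Rotate right; only the low 5 bits of the shift amount are used. */
static uint32_t zbb_ror(uint32_t x, uint32_t s)
{
    s &= 31;
    return (x >> s) | (x << ((32 - s) & 31));
}

/* 1.2.7 OR combine (orc.b): each byte becomes 0xff if any of its bits
 * is set, and 0x00 otherwise. */
static uint32_t zbb_orc_b(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i += 8)
        if ((x >> i) & 0xffu)
            r |= 0xffu << i;
    return r;
}

/* 1.2.8 Byte-reverse (rev8). */
static uint32_t zbb_rev8(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0xff00u) |
           ((x << 8) & 0xff0000u) | (x << 24);
}
```

As an example of the "text" use case for OR combine: `zbb_orc_b(w) != 0xffffffffu` is true exactly when the word `w` contains a zero byte, which is the kind of test a word-at-a-time strlen() wants.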